Use AWS Lambda Layers to package a Headless Chrome Browser for use in Lambda Functions
February 15, 2021
A common struggle when using Lambda Functions is figuring out how best to package binaries that are not included by default in the Lambda runtime. In this post we’ll package a headless Chrome binary in a Lambda Layer, which can then be attached to multiple Lambda Functions. We’ll then use it to browse a website on a schedule and check for the presence of a DOM element.
SAM Tooling
To simplify our deployment to AWS, we will be using SAM (Serverless Application Model), a CloudFormation transform coupled with a CLI that handles building and local simulation of the cloud environment.
What We’re Building
Our goal is to build a Lambda Function that can launch a headless browser to visit a URL and determine whether some specific content is present (let’s call this Lambda Application sam-browser). So that future Lambda Functions that also need headless browser support can reuse them, we will include the headless browser binary and the Puppeteer module/APIs in a Lambda Layer.
CloudFormation Template
Let’s start by writing our CloudFormation template to define the AWS resources that we will be creating. To declare this template as a SAM Serverless template we must include this header:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: >
  sam-browser
  SAM Template for sam-browser Lambda Application
And now let’s list our resources - we’ll need a Lambda Function and a Lambda Layer:
Resources:
  ChromePuppeteerLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      Description: Serverless Chrome/Puppeteer Layer
      ContentUri: layers/puppeteer/
      CompatibleRuntimes:
        - nodejs14.x
    Metadata:
      BuildMethod: nodejs14.x
  WebsiteFinderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Find Web Page DOM Element Function
      CodeUri: functions/website-finder/
      Handler: app.lambdaHandler
      Timeout: 30
      MemorySize: 2048 # Chrome will require higher memory
      Runtime: nodejs14.x
      Layers:
        - !Ref ChromePuppeteerLayer # Attach Our Chrome Layer
Building the Layer
Let’s focus on the Layer first. The BuildMethod tells SAM that during a build it should install the NPM packages defined in a package.json file. For our project we are going to leverage the awesome chrome-aws-lambda project and npm package, which ships with the appropriate binary for Puppeteer. Let’s include both chrome-aws-lambda and puppeteer-core in our dependencies for the layer:
{
  "name": "chrome-puppeteer-layer",
  "version": "1.0.0",
  "description": "Dependencies for the Chrome/Puppeteer Lambda Layer",
  "main": "app.js",
  "author": "SAM CLI",
  "license": "MIT",
  "dependencies": {
    "chrome-aws-lambda": "7.0.0",
    "puppeteer-core": "7.0.x"
  }
}
That package.json file is all we need for our layer! It will live in the following directory structure, as referenced by the ContentUri of the layer resource in our template.yaml:
layers/
  puppeteer/
    package.json
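To see why a bare package.json is enough, it helps to know where the packages end up at runtime. Lambda extracts layers under /opt, and the Node.js runtime includes /opt/nodejs/node_modules on its module search path, so a plain require('chrome-aws-lambda') in function code resolves into the layer. The following is a minimal diagnostic sketch, not part of the original project; resolveModule and layerReport are hypothetical helper names.

```javascript
// Hypothetical diagnostic helpers (not from the original post).
// resolveModule returns the location a module would load from
// (or the bare name for built-in modules), or null when Node
// cannot find it on any search path.
function resolveModule(name) {
  try {
    return require.resolve(name);
  } catch (err) {
    if (err && err.code === 'MODULE_NOT_FOUND') {
      return null;
    }
    throw err; // anything else is unexpected, so surface it
  }
}

// Build a small report for a list of module names.
function layerReport(names) {
  return names.map((name) => ({ name, resolvedPath: resolveModule(name) }));
}
```

Logging layerReport(['chrome-aws-lambda', 'puppeteer-core']) from inside a function with the layer attached should show paths under /opt; a null entry would suggest the layer is missing or was built for a different runtime.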
Consuming the Layer
Now let’s build the Lambda Function that will consume the dependencies provided by our Layer. This Node.js code expects an event with the following structure:
{
  "url": "https://hackerrdave.com",
  "querySelector": "#home",
  "inverse": false
}
Let’s keep a copy of this sample event for later testing, in an events/ directory:
events/
  website-finder-event.json
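The function reads url, querySelector, and inverse straight off the event, so a malformed payload would otherwise only surface later as a Puppeteer error. Here is a minimal sketch of a guard that could run before the browser is launched. parseFinderEvent is a hypothetical helper, not part of the original project, and restricting URLs to http/https is an assumption.

```javascript
// Hypothetical guard for the event shape; not part of the original handler.
function parseFinderEvent(event) {
  if (event === null || typeof event !== 'object') {
    throw new TypeError('event must be an object');
  }
  const { url, querySelector, inverse = false } = event;

  // new URL() throws a TypeError for strings that are not absolute URLs.
  const parsed = new URL(String(url));
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError(`unsupported protocol: ${parsed.protocol}`);
  }
  if (typeof querySelector !== 'string' || querySelector.trim() === '') {
    throw new TypeError('querySelector must be a non-empty string');
  }
  if (typeof inverse !== 'boolean') {
    throw new TypeError('inverse must be a boolean');
  }
  return { url: parsed.href, querySelector, inverse };
}
```

Failing fast here avoids paying for a Chrome launch on an event that could never succeed.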
We can pass our function the URL to visit and a query selector to find a DOM element, and we can choose whether we care about the presence or absence of that element by providing an inverse boolean. The handler, app.js, looks like this:
const chromium = require('chrome-aws-lambda');
const puppeteer = chromium.puppeteer; // puppeteer-core, as exposed by chrome-aws-lambda

exports.lambdaHandler = async (event, context) => {
  const url = event["url"];
  const querySelector = event["querySelector"];
  const inverse = event["inverse"];

  // Launch the Chromium binary provided by the layer
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
    ignoreHTTPSErrors: true,
  });

  let foundItem = false;
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // page.$ resolves to null when nothing matches the selector
    const result = await page.$(querySelector);
    // With inverse set, success means the element is absent
    foundItem = inverse ? !result : !!result;
  } catch (e) {
    console.log(e);
  } finally {
    // Always close the browser, even if navigation failed
    await browser.close();
  }

  if (foundItem) {
    console.log(`Found Item: ${querySelector}`);
  }
  return {
    "success": foundItem
  };
};
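One thing to watch with the handler above: the template sets the function Timeout to 30 seconds, and Puppeteer’s default navigation timeout is also 30 seconds, so a slow page could cause Lambda to end the invocation before the finally block gets a chance to close the browser. One way to handle this is to derive the page.goto timeout from context.getRemainingTimeInMillis(). The sketch below assumes a hypothetical helper navigationTimeoutMs; the 5 second safety margin and 1 second floor are arbitrary illustrative values.

```javascript
// Hypothetical helper; the 5000 ms margin and 1000 ms floor are illustrative.
// It leaves some of the remaining invocation time for cleanup after navigation.
function navigationTimeoutMs(context, safetyMarginMs = 5000, floorMs = 1000) {
  const remaining = context.getRemainingTimeInMillis();
  return Math.max(floorMs, remaining - safetyMarginMs);
}

// Inside the handler, the value could be passed to page.goto, for example:
//   await page.goto(url, {
//     waitUntil: 'networkidle2',
//     timeout: navigationTimeoutMs(context),
//   });
```

With this in place a slow page would raise a Puppeteer timeout error inside the try block, which the existing catch logs, and the browser would still be closed before the function returns.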
The directory structure of our Lambda code will be the following, as referenced by the CodeUri in the template.yaml:
functions/
  website-finder/
    app.js
That’s all we need to define our Lambda Function and Lambda Layer! We can test this Function locally using the SAM CLI, passing the function’s logical ID from the template:
sam build
sam local invoke WebsiteFinderFunction --event events/website-finder-event.json
To deploy both the Lambda Function and Lambda Layer to AWS we can run:
sam deploy --guided
I’d recommend using the --guided flag the first time, so you can review the CloudFormation changes before accepting them.
Optionally, we can configure this Lambda to run on a regular schedule by adding a CloudWatch Events schedule. In the Events section of the function’s resource in the template, we can add:
WebsiteFinderFunction:
  Properties:
    Events:
      CheckSchedule1:
        Type: Schedule
        Properties:
          Description: Check Website Every Minute
          Enabled: True
          Schedule: "rate(1 minute)" # choose the frequency
          Input: '{"url": "https://hackerrdave.com", "querySelector": "#main", "inverse": false}'
If we deploy again, our Lambda Function will run every minute with the Input payload as the event.
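The Input value is a JSON string embedded in YAML, so a quoting mistake there only shows up at deploy or invoke time. This small sketch generates the string from an object so that it is valid JSON by construction. toScheduleInput is a hypothetical helper and is not part of the original project.

```javascript
// Hypothetical helper for producing the schedule's Input string.
// JSON.stringify guarantees valid JSON with the keys the handler expects.
function toScheduleInput({ url, querySelector, inverse = false }) {
  return JSON.stringify({ url, querySelector, inverse });
}

const input = toScheduleInput({
  url: 'https://hackerrdave.com',
  querySelector: '#main',
  inverse: false,
});

// Paste the printed line into the template's Input property, wrapped in single quotes.
console.log(input);
```

Parsing the printed string with JSON.parse gives back the same object the handler will receive as its event.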
That’s it! You can find all the code for this blog post in the sam-browser GitHub repo. The next post will focus on how we can incorporate this Lambda Function as one step in a Step Functions State Machine.