Dave Kerr Software

Use AWS Lambda Layers to package a Headless Chrome Browser for use in Lambda Functions

February 15, 2021

A common struggle when using Lambda Functions is figuring out how best to package binaries that are not included by default in the Lambda runtime. In this post we’ll look at how to package a headless Chrome binary in a Lambda Layer, which can then be attached to multiple Lambda Functions. We’ll use it to browse a website on a schedule and check for the presence of a DOM element.

SAM Tooling

To simplify our deployment to AWS, we will be using SAM (Serverless Application Model), which is a CloudFormation transform coupled with a CLI that manages builds and local simulation of the cloud environment.

What We’re Building

Our goal is to build a Lambda Function that can launch a headless browser (let’s call this Lambda Application sam-browser) to visit a URL and determine whether some specific content is present. To make it easy to build future Lambda Functions that also need headless browser support, we will include the headless browser binary and the Puppeteer module and APIs in a Lambda Layer.

CloudFormation Template

Let’s start by writing our CloudFormation template to define the AWS resources that we will be creating. To declare this template as a SAM Serverless template we must include this header:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: >
  sam-browser

  SAM Template for sam-browser Lambda Application

And now let’s list our resources - we’ll need a Lambda Function and a Lambda Layer:

Resources:
  ChromePuppeteerLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      Description: Serverless Chrome/Puppeteer Layer
      ContentUri: layers/puppeteer/
      CompatibleRuntimes:
        - nodejs14.x
    Metadata:
      BuildMethod: nodejs14.x

  WebsiteFinderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: Find Web Page DOM Element Function
      CodeUri: functions/website-finder/
      Handler: app.lambdaHandler
      Timeout: 30
      MemorySize: 2048 # Chrome will require higher memory
      Runtime: nodejs14.x
      Layers:
        - !Ref ChromePuppeteerLayer # Attach Our Chrome Layer

Building the Layer

Let’s focus on the Layer first. The BuildMethod tells SAM that during a build it should install the NPM packages defined in a package.json file. For our project we are going to leverage the awesome chrome-aws-lambda project and npm package, which ships with a Chromium binary suited to the Lambda environment for use with Puppeteer. Let’s include both chrome-aws-lambda and puppeteer-core in our dependencies for the layer:

{
  "name": "chrome-puppeteer-layer",
  "version": "1.0.0",
  "description": "Dependencies for the Chrome/Puppeteer Lambda Layer",
  "main": "app.js",
  "author": "SAM CLI",
  "license": "MIT",
  "dependencies": {
    "chrome-aws-lambda": "7.0.0",
    "puppeteer-core": "7.0.x"
  }
}

That package.json file is all we need for our layer! It will live in the following directory structure, as referenced in our template.yaml resource:

layers/
  puppeteer/
    package.json
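
It may help to see why the function can later call require('chrome-aws-lambda') without a package.json of its own. As a minimal sketch, assuming Lambda extracts layers under /opt and the Node.js runtime includes /opt/nodejs/node_modules on its module search path, the packages from our layer would be expected at paths like the ones below. The helper names layerModulePath and isFromLayer are hypothetical and only serve the illustration:

```javascript
const path = require('path');

// Assumption: layers are extracted under /opt, and the Node.js runtime
// searches /opt/nodejs/node_modules when resolving require() calls.
const LAYER_NODE_MODULES = '/opt/nodejs/node_modules';

// Hypothetical helper: where a layer-provided package is expected to live.
function layerModulePath(packageName) {
  return path.posix.join(LAYER_NODE_MODULES, packageName);
}

// Hypothetical helper: true when a resolved file path comes from the layer
// rather than from the function's own deployment package.
function isFromLayer(resolvedPath) {
  return resolvedPath.startsWith(LAYER_NODE_MODULES + '/');
}

console.log(layerModulePath('chrome-aws-lambda'));
```

Inside a deployed function, something like isFromLayer(require.resolve('chrome-aws-lambda')) could be logged to confirm that the dependency really is being served by the layer.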

Consuming the Layer

Now let’s build the Lambda Function that will consume the dependencies provided by our Layer. This Node.js code will expect an event of the following structure:

{
    "url": "https://hackerrdave.com",
    "querySelector": "#home",
    "inverse": false
}

Let’s keep a copy of this sample event for later testing, in an events/ directory:

events/
  website-finder-event.json

We can pass our function the URL to visit and a query selector to find a DOM element. The inverse boolean decides whether success means the element is present (false) or absent (true).

const chromium = require('chrome-aws-lambda');
const puppeteer = chromium.puppeteer;

exports.lambdaHandler = async (event, context) => {
    const url = event["url"];
    const querySelector = event["querySelector"];
    const inverse = event["inverse"];

    // chrome-aws-lambda supplies launch settings suited to the Lambda environment
    const browser = await puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
        ignoreHTTPSErrors: true,
    });

    let foundItem = false;

    try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle2' });

        // page.$ resolves to an element handle, or null when nothing matches
        const result = await page.$(querySelector);

        // With inverse set, success means the element is absent
        foundItem = inverse ? !result : !!result;
    } catch (e) {
        // Navigation or lookup errors are logged and reported as success: false
        console.log(e);
    } finally {
        // Always close the browser so Chrome does not linger in the container
        await browser.close();
    }

    if (foundItem) {
        console.log(`Found Item: ${querySelector}`);
    }

    return {
        "success": foundItem
    };
};
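
The handler reads the event fields directly, so a malformed event only surfaces once Chrome is already running. As a minimal sketch of checking the input up front, the hypothetical helpers parseEvent and isMatch below (neither is part of the original code) validate the three fields and restate the presence/absence rule as a pure function. Defaulting inverse to false is an assumption for the illustration:

```javascript
// Hypothetical helper: validate and normalize the incoming event before any
// browser is launched. Throws on missing or malformed fields.
function parseEvent(event) {
  const { url, querySelector, inverse = false } = event || {};

  if (typeof url !== 'string' || !/^https?:\/\//.test(url)) {
    throw new Error('event.url must be an http(s) URL');
  }
  if (typeof querySelector !== 'string' || querySelector.trim() === '') {
    throw new Error('event.querySelector must be a non-empty string');
  }

  return { url, querySelector, inverse: Boolean(inverse) };
}

// Hypothetical helper: the same rule the handler applies to the value that
// page.$() resolves to (an element handle, or null when nothing matches).
function isMatch(element, inverse) {
  const present = element !== null && element !== undefined;
  return inverse ? !present : present;
}
```

Calling parseEvent(event) at the top of the handler would let bad input fail fast, before paying the cost of launching Chrome.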

The directory structure of our Lambda code will be the following, as referenced by the CodeUri in the template.yaml:

functions/
  website-finder/
    app.js

That’s all we need to define our Lambda Function and Lambda Layer! We can test this Function locally with the SAM CLI:

sam build
sam local invoke WebsiteFinderFunction --event events/website-finder-event.json
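
A local invoke starts a container and a real Chrome, which can be slow when we only want to check the presence/absence logic. One possible approach, sketched below under the assumption that the handler is restructured to accept its launch function as a parameter, is to run the same steps against a stub browser. The names makeHandler and fakeLaunch are hypothetical, and the stub implements only the few methods the handler calls:

```javascript
// Hypothetical factory: builds a handler around any launch function, so the
// page-checking logic can run without Chrome. In production, `launch` would
// stand in for a function that calls puppeteer.launch with the layer's settings.
function makeHandler(launch) {
  return async (event) => {
    let browser;
    let foundItem = false;
    try {
      browser = await launch();
      const page = await browser.newPage();
      await page.goto(event.url, { waitUntil: 'networkidle2' });
      const result = await page.$(event.querySelector);
      foundItem = event.inverse ? !result : !!result;
    } catch (e) {
      console.log(e);
    } finally {
      if (browser) {
        await browser.close();
      }
    }
    return { success: foundItem };
  };
}

// Stub browser whose page "contains" only the selectors listed in `present`.
function fakeLaunch(present) {
  return async () => ({
    newPage: async () => ({
      goto: async () => {},
      $: async (selector) => (present.includes(selector) ? {} : null),
    }),
    close: async () => {},
  });
}

const handler = makeHandler(fakeLaunch(['#home']));
handler({ url: 'https://hackerrdave.com', querySelector: '#home', inverse: false })
  .then((result) => console.log(result));
```

This only exercises the decision logic. The sam local invoke command above remains the way to check that the layer and the Chromium binary actually work together.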

To deploy both the Lambda Function and Lambda Layer to AWS we can run:

sam deploy --guided

I’d recommend using the --guided flag the first time, so you can preview the CloudFormation changes before accepting them.

Optionally, we can configure this Lambda to run on a regular schedule by adding a CloudWatch Events schedule. In the Events section of the function’s resource in the template, we can add:

WebsiteFinderFunction:
  Properties:
    Events:
      CheckSchedule1:
        Type: Schedule
        Properties:
          Description: Check Website Every Minute
          Enabled: True
          Schedule: "rate(1 minute)" # choose the frequency
          Input: '{"url": "https://hackerrdave.com", "querySelector": "#main", "inverse": false}'

If we deploy again, our Lambda Function will now be running every minute with the Input payload as the event.
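
Two details of that schedule are easy to get wrong by hand: rate expressions use the singular unit for a value of 1 and the plural otherwise, and the Input value must be a valid JSON string. As a minimal sketch, the hypothetical helpers rateExpression and scheduleInput below (not part of SAM or the original project) generate both values so they can be pasted into the template:

```javascript
// Hypothetical helper: build a rate() schedule expression, using the
// singular unit for 1 and the plural unit for any larger value.
function rateExpression(value, unit) {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error('value must be a positive integer');
  }
  if (!['minute', 'hour', 'day'].includes(unit)) {
    throw new Error('unit must be minute, hour, or day');
  }
  return `rate(${value} ${value === 1 ? unit : unit + 's'})`;
}

// Hypothetical helper: serialize the event payload for the Input property,
// so the JSON is produced by JSON.stringify instead of being typed by hand.
function scheduleInput(event) {
  return JSON.stringify(event);
}

console.log(rateExpression(1, 'minute'));
console.log(scheduleInput({ url: 'https://hackerrdave.com', querySelector: '#main', inverse: false }));
```

For a less frequent check, rateExpression(5, 'minute') would produce the plural form rate(5 minutes).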

That’s it! You can find all the code for this blog post in the sam-browser GitHub repo. The next post will focus on how we can incorporate this Lambda Function as one step in a Step Functions State Machine.