Deploy a Gatsby Static Site to AWS Cloudfront CDN using Lambda@Edge

February 01, 2020

Static site generators like Gatsby.js are growing in popularity, but don’t work out of the box with AWS Amplify/Cloudfront. We’re going to dive into how you can leverage Lambda@Edge to customize your Gatsby.js deployment in AWS without relying on a more highly managed service (like Amplify or Netlify).

Gatsby via Cloudfront - what’s missing?

In the following scenario, we will be working with a Cloudfront distribution whose default behavior’s origin is an S3 bucket that stores the Gatsby build artifacts.

Gatsby generates HTML files with the following path structure:

/my-blog-post-name/index.html

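We only want to include /my-blog-post-name in the request for the blog post, so how do we tell Cloudfront to fetch the index.html file? Without additional configuration, a request to /my-blog-post-name fails because no object exists at that key in the bucket: the response is a 404, or a 403 if the bucket’s permissions do not let Cloudfront list its contents. The distribution’s default root object setting does not help either, since it only applies to the root URL and not to subdirectories. Cloudfront distributions do have settings to define error handling behavior, but they are for defining static responses (such as a 404.html file) rather than dynamic routing of the requests - this is where Lambda@Edge comes in.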

Lambda@Edge Triggers

We can leverage Lambda@Edge to rewrite the incoming request so that it points at the Origin (S3 bucket) resource that we want to fetch. This is a rewrite rather than an HTTP redirect: the visitor’s URL stays the same. AWS describes the benefits of Lambda@Edge as:

With Lambda@Edge, you can enrich your web applications by making them globally distributed and improving their performance — all with zero server administration. Lambda@Edge runs your code in response to events generated by the Amazon CloudFront content delivery network (CDN). Just upload your code to AWS Lambda, which takes care of everything required to run and scale your code with high availability at an AWS location closest to your end user.

You can think of Lambda@Edge as functions that execute on the Cloudfront edge node closest to your end-user, as a way to provide a more feature-rich CDN. We are going to keep it simple in our example, and have our Lambda@Edge function just repoint our resource request to the correct S3 bucket object. The AWS blog has a good summary of the available Lambda@Edge triggers, which correspond to 4 different Cloudfront events on any given request: Viewer Request, Origin Request, Origin Response, and Viewer Response.

Since we are looking to change which resource is fetched from the origin server, we will choose Origin Request as our Lambda@Edge trigger.
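To make the trigger choice a bit more concrete, here is a trimmed-down sketch of what an Origin Request event looks like when it reaches the function. The field names follow the Cloudfront event structure as we understand it; the values (distribution domain, bucket host) are made-up placeholders, not real resources.

```javascript
// Trimmed sketch of a Cloudfront origin-request event.
// All values below are placeholders for illustration only.
const sampleOriginRequestEvent = {
  Records: [{
    cf: {
      config: {
        distributionDomainName: 'd111111abcdef8.cloudfront.net', // placeholder
        eventType: 'origin-request',
      },
      request: {
        method: 'GET',
        uri: '/my-blog-post-name',
        querystring: '',
        headers: {
          // Header names are lowercase keys; each maps to a list of key/value pairs
          host: [{ key: 'Host', value: 'example-bucket.s3.amazonaws.com' }], // placeholder
        },
      },
    },
  }],
};

// A handler only needs the request object, which it can modify and hand back.
const { config, request } = sampleOriginRequestEvent.Records[0].cf;
console.log(config.eventType, request.uri); // prints "origin-request /my-blog-post-name"
```

An object like this can also serve as a test fixture when running a handler locally, before attaching it to the distribution.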

Lambda Code - Redirect Origin Resource

Lambda@Edge functions have more restrictions than regular Lambdas and currently only support a few runtimes, including Python 3.7 and Node.js 10.x. We’ll use Node.js 10.x for this example, and keep the logic intentionally simplified.

// Attached to: ORIGIN REQUEST
exports.lambda_handler = (event, context, callback) => {
    // Extract the request from the Cloudfront Origin Request event
    let { request } = event.Records[0].cf;
    
    // If no "." in URI, assume document request and append index.html to request.uri
    if (request.uri.match(/^[^.]*$/)) {
        if (request.uri[request.uri.length - 1] === '/') {
            request.uri += 'index.html';
        } else {
            request.uri += '/index.html';
        }
    }
    // Return to CloudFront Origin Request
    return callback(null, request);
};

This function will examine the request URI and append index.html to the request if the URI doesn’t already have a file extension. The result will be that the request to the Cloudfront domain will be mutated just prior to being forwarded to the S3 bucket so that it grabs the file /my-blog-post-name/index.html.
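A quick way to sanity-check the rewrite rule is to pull it into a plain function and run a few sample URIs through it locally. The sketch below mirrors the logic in the handler above; `rewriteUri` and the sample paths are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper that mirrors the rewrite rule used in the handler.
function rewriteUri(uri) {
  // No "." anywhere in the URI: treat it as a document request.
  if (/^[^.]*$/.test(uri)) {
    return uri.endsWith('/') ? `${uri}index.html` : `${uri}/index.html`;
  }
  // The URI contains a ".", so assume it already names a file.
  return uri;
}

const samples = [
  '/',
  '/my-blog-post-name',
  '/my-blog-post-name/',
  '/styles.css',
  '/posts/v1.2-notes', // a slug that happens to contain a "."
];

for (const uri of samples) {
  console.log(`${uri} -> ${rewriteUri(uri)}`);
}
```

The first three samples resolve to an index.html path and /styles.css passes through untouched. The last sample shows a limitation of the simplified check: a page slug containing a "." is treated as a file and left unchanged, so sites with such slugs would need a stricter test (for example, inspecting only the final path segment).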

Lambda Code - Caching Rules

Another use case for Lambda@Edge is implementing caching rules. Gatsby has specific recommended caching rules that aren’t implemented with Cloudfront/S3 out of the box. We could configure the Cloudfront distribution to rely on the Cache-Control metadata attached to each S3 object, but that metadata has to be set correctly on every upload. The rules can be applied more consistently with a Lambda@Edge function triggered by the Origin Response event. That way we can set the caching headers for any object retrieved from the origin, and let Cloudfront know which objects to keep in the cache (avoiding future trips to the origin).

// Attached to: ORIGIN RESPONSE
exports.lambda_handler = (event, context, callback) => {
  // Extract the request and response from the Cloudfront Origin Response event
  let { request, response } = event.Records[0].cf;

  const headerCacheControl = 'Cache-Control';
  const headerContentType = 'Content-Type';
  const defaultTimeToLive = 60 * 60 * 24 * 365; // 365 days
  const fileExts = [
    '.js',
    '.css',
    '.json',
    '.woff',
    '.woff2',
    '.ttf',
    '.otf',
    '.eot',
    '.jpg',
    '.jpeg',
    '.png',
    '.gif',
    '.svg',
    '.ico'
  ];

  if (response.status === '200') {
    if (!response.headers[headerCacheControl.toLowerCase()]) {
      // Only cache above-defined list of file types
      if (fileExts.findIndex((fileExt) => request.uri.endsWith(fileExt)) >= 0) {
        response.headers[headerCacheControl.toLowerCase()] = [{
          key: headerCacheControl,
          value: `public, max-age=${defaultTimeToLive}`,
        }];
      } else {
        response.headers[headerCacheControl.toLowerCase()] = [{
          key: headerCacheControl,
          value: 'public, max-age=0, must-revalidate',
        }];
      }
    }
  }
  // Return to Cloudfront Origin Response event
  callback(null, response);
};

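In this Lambda code we list the file types that we wish to keep in the cache for a long period (1 year); every other file type gets max-age=0, so it is revalidated on each request. The header is only set for 200 responses that don’t already carry a Cache-Control header from the origin. Gatsby recommends never caching HTML files, and caching for 1 year the generated assets that have their names uniquely linked to the build. This ensures that users will always get the latest pointers to the assets, and doesn’t require any manual cache invalidation at either the Cloudfront or client level. One caveat: matching on extension alone also catches files whose names are not tied to the build, such as Gatsby’s page-data JSON files and sw.js, which should not be cached long-term.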
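As a rough sketch of how the extension check could be tightened, the helper below carves out the files that Gatsby’s caching guidance, as we read it, treats as must-revalidate (/sw.js and everything under /page-data/) before falling back to the extension list. `cacheControlFor` is a hypothetical name, and the exceptions are an assumption you should verify against the Gatsby version you deploy.

```javascript
const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365;

// Same extension list as the handler above.
const LONG_LIVED_EXTS = [
  '.js', '.css', '.json', '.woff', '.woff2', '.ttf', '.otf', '.eot',
  '.jpg', '.jpeg', '.png', '.gif', '.svg', '.ico',
];

// Hypothetical helper: choose a Cache-Control value for a request URI.
function cacheControlFor(uri) {
  const revalidate = 'public, max-age=0, must-revalidate';

  // Assumed exceptions: these paths are not content-hashed, so a long
  // max-age could serve stale content after a new build.
  if (uri === '/sw.js' || uri.startsWith('/page-data/')) {
    return revalidate;
  }

  // Build-fingerprinted assets can be cached for a long time.
  if (LONG_LIVED_EXTS.some((ext) => uri.endsWith(ext))) {
    return `public, max-age=${ONE_YEAR_SECONDS}, immutable`;
  }

  // Everything else (HTML in particular) is revalidated on every request.
  return revalidate;
}

console.log(cacheControlFor('/app-abc123.js'));
console.log(cacheControlFor('/page-data/index/page-data.json'));
console.log(cacheControlFor('/my-blog-post-name/index.html'));
```

Inside the Origin Response handler, the returned string would take the place of the two hard-coded header values, keyed on request.uri.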

Cost Comparison with Managed Service

In the next post, we’ll perform a cost comparison between this manual implementation using Cloudfront+Lambda@Edge versus using a managed service such as Netlify.