What you’ll learn
  • which main components can be tuned
  • what things you should consider
  • best practices

About

Webiny is built on top of serverless infrastructure, so it takes advantage of the auto-scaling capabilities that many serverless services provide out of the box, saving you the time and stress of managing scaling yourself. However, not all components scale equally well, and sometimes it’s the interaction between components that needs adjusting. This article lists the things you should consider before putting large workloads on Webiny, to ensure your users have a great experience.

Things to Consider

When we stress-tested Webiny by inserting millions of records while serving thousands of parallel user requests, the main bottleneck we found was the communication between DynamoDB and the OpenSearch cluster, which should be tuned to match your traffic patterns. We say patterns because the right tuning depends on whether your workload is read-heavy or write-heavy, and on record size and write frequency: many small-record writes behave differently from many large-record writes.

In addition to the database side of things, one more item to consider is the burst capacity of Lambda functions, which is an AWS hard limit and cannot be changed.

In this article, we’ll focus on tuning the OpenSearch cluster, the DynamoDB Stream configuration and the Lambda burst capacity. These are the only components you should need to tweak; all others scale to your load automatically.

Lambda Burst Capacity

It’s important to familiarize yourself with how Lambda functions scale. There are two parts to it. The first is the “burst capacity”, which is the most important because it cannot be changed: it has a fixed value that depends on the AWS region where you deployed Webiny. The burst capacity determines the maximum number of concurrent Lambda function executions during the initial period. Think of it as your 0 to 60 speed.

The key here is to pick the right AWS region as the burst capacity differs drastically between some regions. Here’s the list of the current burst capacity values per AWS region:

  • 3000 – US West (Oregon), US East (N. Virginia), Europe (Ireland)
  • 1000 – Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio)
  • 500 – Other Regions

Once you reach the burst capacity, your functions will continue to scale in concurrency, but at a slower pace of 500 additional instances per minute. This is why it’s important to pick a region with a higher burst capacity, so that you have more room before you hit this 500-instances-per-minute threshold.

The last important factor to mention about scaling Lambda functions is that every AWS account gets a limit of 1,000 concurrent Lambda executions by default. This is counted across all of your Lambda functions inside a single AWS region. This limit can be increased by submitting a request in your AWS Service Quotas Dashboard.
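
To make the interplay between burst capacity, the per-minute ramp and the account quota concrete, here is a minimal TypeScript sketch. It is a simplified model of the scaling behavior described in this section, not an AWS API: the function name `maxConcurrencyAfter` and its parameters are hypothetical, and the numbers are the ones quoted above.

```typescript
// Simplified model of Lambda concurrency scaling as described in this section.
// All names are illustrative; nothing here is part of Webiny or the AWS SDK.
interface ScalingParams {
  burstCapacity: number; // 3000, 1000 or 500, depending on the region
  accountLimit: number; // regional concurrency quota (1,000 by default)
  rampPerMinute?: number; // extra instances per minute once the burst is used up
}

function maxConcurrencyAfter(minutes: number, params: ScalingParams): number {
  const ramp = params.rampPerMinute ?? 500;
  const wholeMinutes = Math.max(0, Math.floor(minutes));
  // Burst capacity is available immediately, then concurrency grows linearly.
  const uncapped = params.burstCapacity + ramp * wholeMinutes;
  // The account-level quota caps the total, whatever the region allows.
  return Math.min(uncapped, params.accountLimit);
}

// A 500-burst region with a raised quota of 10,000, four minutes into a spike.
console.log(maxConcurrencyAfter(4, { burstCapacity: 500, accountLimit: 10000 }));
```

Under this model, the default quota of 1,000 is the binding constraint in the 3000-burst regions, so a quota increase is needed before the higher burst capacity has any effect.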


DynamoDB Stream

By default, all API write requests in Webiny only hit DynamoDB, regardless of whether you’re using the DynamoDB-only setup or the one with OpenSearch. This is because OpenSearch doesn’t scale well when there’s a big spike in write requests. Using DynamoDB to shield the OpenSearch cluster is a design decision that helps us scale Webiny to very large amounts of traffic.

Once the data records are written to DynamoDB, they are copied to OpenSearch in the background through a DynamoDB Stream. Essentially, DynamoDB triggers a Lambda function and passes it the most recent records that were written to DynamoDB, and that Lambda function then sends a bulk-write request to OpenSearch.

This mechanism triggers with a delay, which is intentional. The delay is determined by configuring two parameters on DynamoDB Stream:

  • Batch size – The number of records to send to the function in each batch, up to 10,000. Lambda passes all of the records in the batch to the function in a single call, as long as the total size of the events doesn’t exceed the payload limit for synchronous invocation (6 MB).
  • Batch window – Specify the maximum amount of time to gather records before invoking the function, in seconds.

Whichever of the two parameters meets its condition first triggers the Stream, which passes the data to the Lambda function that syncs it to OpenSearch.

The thing to consider here is how quickly you need to read these records after they’ve been written. If you have a low volume of writes, the batch will fill slowly, so you might need to wait until the Batch window expires before the sync process kicks in. On the other hand, if you have a small Batch size but a lot of writes, the sync process might kick in too often and overwhelm your OpenSearch cluster. The right values depend on your traffic patterns and requirements.
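
The trade-off between the two parameters can be explored with a rough back-of-the-envelope model. The TypeScript sketch below is only an illustration: `estimateSync` and its types are hypothetical names, and the model assumes a steady write rate, a single stream shard, a Batch window greater than zero, and batches that stay under the 6 MB payload limit.

```typescript
// Rough model of when the stream trigger fires; names are illustrative only.
interface StreamSettings {
  batchSize: number; // records per batch
  batchWindowSeconds: number; // maximum time to gather records (assumed > 0)
}

interface SyncEstimate {
  trigger: "batchSize" | "batchWindow"; // which condition is met first
  delaySeconds: number; // worst-case wait before a record is synced
  recordsPerInvocation: number;
  invocationsPerMinute: number; // bulk-write requests hitting OpenSearch
}

function estimateSync(writesPerSecond: number, s: StreamSettings): SyncEstimate {
  if (writesPerSecond <= 0 || s.batchWindowSeconds <= 0) {
    throw new Error("write rate and batch window must be positive");
  }
  const secondsToFillBatch = s.batchSize / writesPerSecond;
  const sizeFirst = secondsToFillBatch <= s.batchWindowSeconds;
  const recordsPerInvocation = sizeFirst
    ? s.batchSize
    : writesPerSecond * s.batchWindowSeconds;
  return {
    trigger: sizeFirst ? "batchSize" : "batchWindow",
    delaySeconds: sizeFirst ? secondsToFillBatch : s.batchWindowSeconds,
    recordsPerInvocation,
    invocationsPerMinute: (writesPerSecond * 60) / recordsPerInvocation
  };
}

// 5 writes per second with a Batch size of 500 and a 30-second Batch window.
console.log(estimateSync(5, { batchSize: 500, batchWindowSeconds: 30 }));
```

In this model, a larger Batch window lowers the number of bulk writes reaching OpenSearch at the cost of a longer wait before new records become searchable, which is the trade-off described above.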

By default, Webiny sets the Batch size to 1,000 records and the Batch window to 1 second. For most use cases this is enough. If you wish to change these values, here’s an example:

apps/core/webiny.application.ts

import * as aws from "@pulumi/aws";
import { createCoreApp } from "@webiny/serverless-cms-aws";
import { isResourceOfType } from "@webiny/pulumi";

export default createCoreApp({
  pulumi: app => {
    app.onResource(resource => {
      if (isResourceOfType(resource, aws.lambda.EventSourceMapping)) {
        // Change the batch size and the batching window (in seconds).
        resource.config.batchSize(500);
        resource.config.maximumBatchingWindowInSeconds(30);
      }
    });
  }
});


OpenSearch Cluster

The last component to tune is the OpenSearch cluster. The first part is sizing the cluster correctly. For production instances, we highly recommend a multi-zone deployment. By default, if you deploy Webiny with the --env=prod flag, Webiny automatically creates a cluster across 2 availability zones, with one t3.medium.search instance in each zone. This should be suitable for small to medium-sized projects. For all other environments, we deploy a single t3.small.elasticsearch instance, which is only suitable for development purposes and shouldn’t be used in production.

For any larger projects, we recommend you scale up your OpenSearch cluster to something with more capacity. You can look at our Headless CMS benchmark where we compare several different cluster configurations to get a sense of what might best suit your needs.

To update your OpenSearch cluster configuration, you can follow this example:

apps/core/webiny.application.ts
import * as aws from "@pulumi/aws";
import { createCoreApp } from "@webiny/serverless-cms-aws";
import { isResourceOfType } from "@webiny/pulumi";

export default createCoreApp({
  elasticSearch: true,

  pulumi: app => {
    app.onResource(resource => {
      if (isResourceOfType(resource, aws.elasticsearch.Domain)) {
        // Set the instance type and number of instances 
        resource.config.clusterConfig(() => {
          return {
            instanceType: "t3.small.elasticsearch",
            instanceCount: 2
          };
        });

        // Set Elasticsearch (OpenSearch) version.
        resource.config.elasticsearchVersion("7.7");

        // Change advanced options.
        resource.config.advancedOptions({
          override_main_response_version: "false",
          "rest.action.multi.allow_explicit_index": "true"
        });

        resource.opts.ignoreChanges = ["advancedOptions", "tags"];
      }
    });
  }
});

The second part of tuning OpenSearch is tweaking certain configuration parameters of the service itself. Here are a couple of parameters we recommend adjusting:

Increase the refresh interval

Refresh is an operation that makes newly indexed records available for searching. By default, it is triggered every second, but only on indices that received at least one search request in the last 30 seconds. This operation is very costly, especially if you have ongoing indexing activity on your cluster, and it will hurt your indexing speed. We recommend increasing the interval to 30s and then testing whether you need to raise or lower it further. To make the change, modify the index.refresh_interval parameter.

PUT https://domain-name.region.es.amazonaws.com/source-index/_settings
{
  "settings": {
    "index.refresh_interval": "30s"
  }
}
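
If you prefer to script this change, the request can be assembled programmatically. The TypeScript sketch below only builds the method, URL and JSON body; it does not send anything. The helper name `buildRefreshIntervalRequest` is hypothetical, the endpoint and index name are the placeholders used above, and how the request is authenticated depends on your domain’s access policy, which is not shown here.

```typescript
// Builds (but does not send) an index settings update for the refresh interval.
// The helper and its types are illustrative, not part of Webiny or any SDK.
interface SettingsRequest {
  method: "PUT";
  url: string;
  body: string;
}

function buildRefreshIntervalRequest(
  endpoint: string, // e.g. the placeholder domain endpoint used above
  index: string, // name of the index to update
  interval: string // e.g. "30s"
): SettingsRequest {
  const base = endpoint.replace(/\/+$/, ""); // drop any trailing slashes
  return {
    method: "PUT",
    url: `${base}/${encodeURIComponent(index)}/_settings`,
    body: JSON.stringify({ index: { refresh_interval: interval } })
  };
}

const request = buildRefreshIntervalRequest(
  "https://domain-name.region.es.amazonaws.com",
  "source-index",
  "30s"
);
console.log(request.method, request.url, request.body);
```

The same helper can be reused with a different interval while you test which value works best for your workload.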

Increase indexing buffer size

By default, the indexing buffer size is set to 10% of the JVM heap allocated to each node. This means that a node with a 10GB heap gets a 1GB indexing buffer. This is usually enough. However, when you’re importing large amounts of data into Webiny, you might want to temporarily increase it to something like 20% by adjusting the indices.memory.index_buffer_size parameter.

