Case Study

Custom Horizontal Scaling for WebSocket in AWS

ITGix Team
Passionate DevOps & Cloud Engineers
02.04.2024
Reading time: 11 mins.
Last Updated: 02.04.2024


As a dedicated DevOps consulting company, we are committed to delivering optimal solutions to our clients. Today, we are delighted to showcase a use case from our collaboration with a client on deploying, optimizing, and scaling WebSocket-enabled servers.

The WebSocket protocol is a stateful, full-duplex protocol used for client-server communication. The connection between the client and the server is kept alive until it is terminated by either side, and once terminated it is closed on both ends. For example, a browser (the client) sends an upgrade request to the server; the server completes the handshake and opens a new connection that stays alive until either the browser or the server closes it.
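
To make the flow concrete, here is a minimal client sketch in Python, assuming the third-party websockets library (pip install websockets) and a hypothetical wss://example.com/socket endpoint:

# Minimal WebSocket client sketch: one handshake, then a persistent
# full-duplex connection that stays open until one side closes it.
import asyncio
import websockets

async def main():
    # The HTTP handshake upgrades the connection; afterwards the same TCP
    # connection is reused for messages in both directions.
    async with websockets.connect("wss://example.com/socket") as ws:
        await ws.send("hello")          # client -> server
        reply = await ws.recv()         # server -> client, same connection
        print(reply)
    # Leaving the block closes the connection, and it is closed on both ends.

asyncio.run(main())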

Protocol   | HTTP                                            | WebSocket
Type       | Request-Response                                | Full-Duplex
Usage      | Stateless                                       | Stateful
Overhead   | Higher                                          | Lower
Header     | Heavy (sent with every request and response)    | Light (small)
Real-time  | Not suitable for real-time                      | Suitable for real-time applications
Use Cases  | Web browsing, REST APIs, one-way data transfer  | Real-time apps, online games, chat applications

Due to the way the WebSocket protocol works, it generates a lot of connections to the servers, and every server can handle only a finite number of connections before reaching its limit. In our use case, when we reached this limit the servers showed neither high CPU nor high memory utilization, so we needed to look into a different way to scale them out.

We decided that the best solution would be to scale out the servers based on the average number of connections that they have. Having decided on the scaling method, we moved on to implementing and testing it.

For the implementation, we needed to take the following steps:

  1. Send a custom metric to CloudWatch with the average number of connections of the servers in the autoscaling group.
  2. Review the Server, ALB, and Target Group configurations for adjustments.
  3. Run load tests to determine the number of connections each server can handle.
  4. Create the custom autoscaling policy with CloudWatch alarms.
  5. Run load tests to confirm that the custom autoscaling policy scales correctly.

AWS's CloudWatch agent can already collect a metric that tracks the number of established TCP connections: tcp_established, published to CloudWatch as a custom metric. In the CloudWatch agent configuration file (amazon-cloudwatch-agent.json) you can add the following sample:

{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "namespace": "TCP_Connections",
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "aggregation_dimensions": [
      [
        "AutoScalingGroupName"
      ]
    ],
    "metrics_collected": {
      "netstat": {
        "measurement": [
          {
            "name": "tcp_established",
            "rename": "TCPconnections",
            "unit": "Count"
          }
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}

After that, restart the CloudWatch agent. If the metric is sent correctly, you should see the new TCPconnections metric and a graph of its data points in the CloudWatch console.

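If you prefer to verify from a script rather than from the console, a quick boto3 check might look like the sketch below (the namespace and metric name follow the agent configuration above; the region and autoscaling group name are placeholders):

# Sketch: confirm the custom TCPconnections metric is arriving in CloudWatch.
# The region and AutoScalingGroupName value are placeholders.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.datetime.now(datetime.timezone.utc)

response = cloudwatch.get_metric_statistics(
    Namespace="TCP_Connections",
    MetricName="TCPconnections",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-websocket-asg"}],
    StartTime=now - datetime.timedelta(minutes=30),
    EndTime=now,
    Period=60,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])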

After the custom metric is ready, we need to review the configurations of our Servers, Application Load Balancer, and the Target Group. 

  1. Let’s start with the Server Configurations: 

Increase the number of open files in the system. Linux has a default limit on the maximum number of open files: 

ulimit -n

1024

You can change the maximum number of open files by editing the limits configuration in /etc/security/limits.d/ (or /etc/security/limits.conf). At the end of the file, add the new limits, for example:

* soft nofile 1048576

* hard nofile 1048576

After that, restart your session and you will see the new limit:

ulimit -n

1048576
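
As a quick sanity check, the limit that a running process actually sees can also be read from Python (a small sketch, Linux only):

# Print the open-file limits visible to the current process.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")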

The next server setting to adjust is the ephemeral port range. Create a new file in /etc/sysctl.d/ called net.ipv4.ip_local_port_range.conf and add the following:

net.ipv4.ip_local_port_range = 10000 65535

We use 10000 as the lower limit to avoid including ports that are already in use. Apply the change with sysctl --system or a reboot.
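
The effective range can then be verified from Python with a small sketch like this (Linux only):

# Read the effective ephemeral port range from procfs.
with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
    low, high = map(int, f.read().split())
print(f"ephemeral port range: {low}-{high} ({high - low + 1} usable ports)")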

Lastly, enable keep-alive on the web server. In our case we use Nginx, so we added the following to the Nginx configuration:

keepalive_timeout 65;

  2. ALB Configuration:

The only thing that needs to be adjusted here is the idle timeout. Keep in mind to set the Nginx keep-alive a bit higher than the ALB idle timeout, to make sure that Nginx won't close the connections before the ALB does.

During our load tests we didn't notice a significant difference in the application's behavior when the idle timeout was higher than 60 seconds, so we left the default configuration and only adjusted the keep-alive setting of Nginx.
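
For reference, the idle timeout is a load balancer attribute and can also be set programmatically; a boto3 sketch, with a placeholder load balancer ARN:

# Sketch: set the ALB idle timeout (we kept the default of 60 seconds).
# The load balancer ARN is a placeholder.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/websocket-alb/0123456789abcdef",
    Attributes=[
        {"Key": "idle_timeout.timeout_seconds", "Value": "60"},
    ],
)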

  3. Target Group Configuration

First, we need to choose the method that will be used for load balancing the connections. To ensure that each server receives a similar number of connections, we decided to use the round-robin load balancing method.

After that, we need to create a health check that keeps track of the number of connections on each server and marks the server as unhealthy when it reaches its maximum number of connections; the threshold will be determined later during the tests.
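
The implementation of that connection-aware health check is not shown here; one possible approach, sketched below under the assumption of a small standalone health endpoint on each instance (the port and threshold are illustrative), is to count established TCP connections and return 503 once the limit is reached:

# Hypothetical sketch of a connection-aware health endpoint (Linux only).
# The target group health check would point at this port; the threshold is
# the per-instance limit determined during load testing.
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_CONNECTIONS = 28000  # assumed per-instance limit
HEALTH_PORT = 8080       # hypothetical health-check port

def established_connections() -> int:
    """Count ESTABLISHED TCP connections by parsing /proc/net/tcp*."""
    count = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                for line in f:
                    if line.split()[3] == "01":  # state 01 = ESTABLISHED
                        count += 1
        except FileNotFoundError:
            pass
    return count

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        current = established_connections()
        status = 200 if current < MAX_CONNECTIONS else 503
        body = f"connections={current}\n".encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", HEALTH_PORT), HealthHandler).serve_forever()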

The last adjustment to the target group was to enable stickiness. Sticky sessions are a good fit for the WebSocket protocol, as they ensure that each request from a user is sent to the same target. Keep in mind that sticky sessions require cookies; in our case we decided to use a load-balancer-generated cookie instead of an application cookie.
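
The same target group settings can also be applied with boto3; a sketch, with a placeholder target group ARN:

# Sketch: round robin plus load-balancer-generated cookie stickiness.
# The target group ARN and cookie duration are placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/websocket-tg/0123456789abcdef",
    Attributes=[
        # Round robin so each instance receives a similar number of connections.
        {"Key": "load_balancing.algorithm.type", "Value": "round_robin"},
        # Stickiness with a load-balancer-generated cookie.
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)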

After the above adjustments were applied we started with the first batch of load tests.

We used our first load tests to determine how many connections one server can handle. We used AWS's Distributed Load Testing solution for the load tests, and the developers of the application wrote the test scenarios. We used the following method for testing:

  • Start the test.
  • Monitor the number of connections on the server using sudo netstat -apn | wc -l.
  • At the same time, check the application to make sure that it is working correctly.

There are a few things to note here. In the first test we set a limit of 18,000 connections in the health check, in order to observe the behavior of the server once it is marked unhealthy by the target group. This is how we noticed that the ALB continues to send traffic to unhealthy instances when there are no healthy targets left, which is the default (fail-open) behavior of the ALB.

Once we noticed the above behavior, we scaled to two instances so we could test the health check and make sure that traffic is no longer sent to an instance once it is marked unhealthy.

Once we confirmed that traffic was not sent to an instance marked unhealthy, we started increasing the health check threshold: first to 20,000 connections, while continuing to monitor the application, which kept working fine. After a few more gradual increases we were able to set the limit to 28,000 connections per instance.

Using what we found in our load tests and the custom metric TCPconnections that we set up above, we can now create the custom CloudWatch alarms and autoscaling policies that we are going to use for scaling out and scaling in. 

First, you need to create the CloudWatch alarms that will trigger the autoscaling policies. The CloudWatch alarms are called:

  • High-tcp-connections
  • Low-tcp-connections 

The alarms will be used in the two policies that we will create next. We named the two policies: 

  • low_tcp_connections_policy
  • high_tcp_connections_policy 
  1. low_tcp_connections_policy:

The low TCP connections policy is used for scaling in and uses simple scaling. The policy removes one instance when the average TCP connections across all instances in the ASG is equal to or lower than 9,500 for 3 consecutive periods of 300 seconds (5 minutes) each; in other words, if the number of connections stays below 9,500 for 15 minutes, we remove one instance. We chose a longer period because of our inconsistent traffic: spikes caused the group to scale in and then scale out again a few minutes later. This constant scaling in and out was a problem for us, because the remaining instances received the TCP connections drained from the removed instance, which pushed them to their maximum number of connections and caused latency in the application.

  2. high_tcp_connections_policy:

The high TCP connections policy is used for scaling out. Here we decided to use step scaling instead of simple scaling, so that we can keep up with traffic spikes. One thing to note about step scaling is that it uses lower and upper bounds: when the alarm threshold is breached, the metric value is compared against the lower and upper bounds of each step to determine which step should be executed (see https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html#as-scaling-steps).

This also means that the lower bound of the first step starts at the alarm threshold; in our case the threshold is 20,000, so the steps look as follows:

  • Add two instances when the average connections are between 20,000 and 25,000
  • Add two instances when the average connections are between 25,000 and 30,000
  • Add three instances when the average connections are above 30,000

We haven't had a case that hit the second step; however, if you have larger traffic spikes it is possible to scale through all steps. One note here: the third step is added only to make sure that we scale fast enough in the event of the instances being overloaded.
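
To make the relationship between the alarm threshold and the step bounds explicit, here is the arithmetic for the three steps as a small sketch (the bounds are relative to the 20,000 threshold):

# Map the relative step-scaling bounds onto absolute connection counts.
threshold = 20_000
steps = [
    # (lower_bound, upper_bound, instances_to_add); None means unbounded
    (0, 5_000, 2),       # 20,000 - 25,000 -> add 2
    (5_000, 10_000, 2),  # 25,000 - 30,000 -> add 2
    (10_000, None, 3),   # 30,000+         -> add 3
]
for lower, upper, add in steps:
    low = threshold + lower
    high = "infinity" if upper is None else f"{threshold + upper:,}"
    print(f"average connections {low:,} - {high}: add {add} instance(s)")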

  3. Terraform code for the alarms:
locals {
  create_high_tcp_connections = var.cloudwatch_ec2_checks_enabled || var.create_high_tcp_connections
  create_low_tcp_connections  = var.cloudwatch_ec2_checks_enabled || var.create_low_tcp_connections
}

#######################################################
################# High TCP connections ################
#######################################################

resource "aws_cloudwatch_metric_alarm" "high_tcp_connections" {
  count               = local.create_high_tcp_connections ? 1 : 0
  alarm_name          = "${var.env}-${var.application_name}-high-tcp-connections"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = var.high_tcp_evaluation_periods //"5"
  datapoints_to_alarm = var.high_tcp_datapoint_to_alarm //"5"
  metric_name         = "TCPconnections"
  namespace           = "ASG_Memory" // must match the namespace in the CloudWatch agent config
  period              = var.high_tcp_period //"60"
  statistic           = "Average"
  threshold           = var.high_tcp_threshold //"60"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.application_asg.0.name
  }

  alarm_description  = "Trigger an alert when AutoScaling Group instances ${aws_autoscaling_group.application_asg.0.name} have reached high TCP connections"
  alarm_actions      = [var.internal_sns_topic, aws_autoscaling_policy.high_tcp_connections_policy[count.index].arn]
  ok_actions         = [var.internal_sns_topic]
  treat_missing_data = "breaching"

  depends_on = [
    aws_autoscaling_policy.high_tcp_connections_policy,
  ]
}

#######################################################
################# Low TCP connections #################
#######################################################

resource "aws_cloudwatch_metric_alarm" "low_tcp_connections" {
  count               = local.create_low_tcp_connections ? 1 : 0
  alarm_name          = "${var.env}-${var.application_name}-low-tcp-connections"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = var.low_tcp_evaluation_periods //"5"
  datapoints_to_alarm = var.low_tcp_datapoint_to_alarm //"5"
  metric_name         = "TCPconnections"
  namespace           = "ASG_Memory" // must match the namespace in the CloudWatch agent config
  period              = var.low_tcp_period //"60"
  statistic           = "Average"
  threshold           = var.low_tcp_threshold //"60"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.application_asg.0.name
  }

  alarm_description  = "Trigger an alert when AutoScaling Group instances ${aws_autoscaling_group.application_asg.0.name} have reached low TCP connections"
  alarm_actions      = [var.internal_sns_topic, aws_autoscaling_policy.low_tcp_connections_policy[count.index].arn]
  ok_actions         = [var.internal_sns_topic]
  treat_missing_data = "breaching"

  depends_on = [
    aws_autoscaling_policy.low_tcp_connections_policy,
  ]
}
  4. Terraform code for the autoscaling policies:
resource "aws_autoscaling_policy" "high_tcp_connections_policy" {

  count                     = var.high_tcp_policy_enabled ? 1 : 0

  name                      = "high_tcp_connections_policy"

  policy_type               = "StepScaling"

  adjustment_type           = "ChangeInCapacity"

  estimated_instance_warmup = 150 // default 300

  autoscaling_group_name    = aws_autoscaling_group.application_asg[count.index].name

  step_adjustment {

    scaling_adjustment          = var.scaling_adjustment_1

    metric_interval_lower_bound = var.metric_interval_lower_1

    metric_interval_upper_bound = var.metric_interval_upper_1

  }

  step_adjustment {

    scaling_adjustment          = var.scaling_adjustment_2

    metric_interval_lower_bound = var.metric_interval_lower_2

    metric_interval_upper_bound = var.metric_interval_upper_2

  }

  step_adjustment {

    scaling_adjustment          = var.scaling_adjustment_3

    metric_interval_lower_bound = var.metric_interval_lower_3

    metric_interval_upper_bound = var.metric_interval_upper_3

  }

}

resource "aws_autoscaling_policy" "low_tcp_connections_policy" {

  count                  = var.low_tcp_policy_enabled ? 1 : 0

  name                   = "low_tcp_connections_policy"

  policy_type            = "SimpleScaling"

  adjustment_type        = "ChangeInCapacity"

  scaling_adjustment     = var.low_tcp_scaling_adjustment

  autoscaling_group_name = aws_autoscaling_group.application_asg[count.index].name

}

After the custom scaling policies were created, we performed load tests again to confirm that the policies scale correctly.

The default algorithm used by the ALB is round robin, which is the one we used, so we also needed to test whether the connections are distributed evenly. For that, we performed another load test and used the following script to monitor the number of connections on each server in the autoscaling group:

import boto3
import paramiko

def get_autoscaling_group_instances_private_ips(autoscaling_group_name, region='us-east-1'):
    # Create an EC2 client
    ec2_client = boto3.client('ec2', region_name=region)
    # Create an Auto Scaling client
    autoscaling_client = boto3.client('autoscaling', region_name=region)
    # Get the Auto Scaling Group details
    response = autoscaling_client.describe_auto_scaling_groups(AutoScalingGroupNames=[autoscaling_group_name])
    if 'AutoScalingGroups' not in response or not response['AutoScalingGroups']:
        print(f"Auto Scaling Group '{autoscaling_group_name}' not found.")
        return []
    # Extract instance IDs from the Auto Scaling Group
    instance_ids = [instance['InstanceId'] for instance in response['AutoScalingGroups'][0]['Instances']]
    # Get the private IP addresses of instances
    response = ec2_client.describe_instances(InstanceIds=instance_ids)
    private_ips = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            private_ips.append(instance['PrivateIpAddress'])
    return private_ips

def ssh_and_run_command(instance_ip, username, key_path, command):
    # Connect to the instance using paramiko
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Load the private key file
    private_key = paramiko.RSAKey(filename=key_path)
    try:
        client.connect(instance_ip, username=username, pkey=private_key)
        # Run the command
        stdin, stdout, stderr = client.exec_command(command)
        print(f"Output from {instance_ip}:")
        print(stdout.read().decode('utf-8'))
    except Exception as e:
        print(f"Error connecting to {instance_ip}: {e}")
    finally:
        # Close the SSH connection
        client.close()

# Example usage
autoscaling_group_name = 'AUTOSCALING GROUP'
region = 'REGION'
username = 'USERNAME'
key_path = 'KEY'
command_to_run = 'sudo netstat -apn | wc -l'

instance_private_ips = get_autoscaling_group_instances_private_ips(autoscaling_group_name, region)
for private_ip in instance_private_ips:
    ssh_and_run_command(private_ip, username, key_path, command_to_run)

After the script completes successfully, you should see output similar to the following:

Output from IP Address:

27908

Output from IP Address:

27784

Output from IP Address:

25366

If there is an error, the output will instead be “Error connecting to IP Address” together with the error message.

This was the final test that we performed before deploying the solution. A different approach that could be tested is to switch the algorithm from round robin to least outstanding requests; this can potentially be useful during scaling events in which many new servers are added to the autoscaling group.

After we started using the solution in production, we noticed that when there are only one or two instances in the autoscaling group and the traffic starts to increase, the autoscaling policy struggles to scale fast enough, so we implemented two additional measures.

  • The first was to create a Lambda function that adds extra instances to the autoscaling group before the expected increase in traffic. In our case the traffic is predictable: we know when the spikes happen and when the application is no longer heavily used. If the traffic is not easily predictable, this is harder to solve. The Lambda was later replaced with scheduled actions executed directly from the autoscaling group.
  • The second was to add a warm pool to the autoscaling group. The warm pool keeps a certain number of instances stopped; when a scaling event is triggered, the autoscaling group only has to start them, which speeds up the scaling. (A boto3 sketch of both measures follows below.)
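
Both measures can be expressed with boto3 as well; a sketch, with placeholder autoscaling group name, schedule, and sizes (not the production values):

# Sketch: pre-scale on a schedule and keep a warm pool of stopped instances.
# The ASG name, recurrence, and sizes are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scheduled action: raise the capacity shortly before the known traffic spike.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="websocket-asg",
    ScheduledActionName="pre-scale-before-peak",
    Recurrence="30 7 * * MON-FRI",  # cron expression, UTC
    MinSize=4,
    MaxSize=10,
    DesiredCapacity=4,
)

# Warm pool: keep stopped instances ready so a scale-out only has to start them.
autoscaling.put_warm_pool(
    AutoScalingGroupName="websocket-asg",
    MinSize=2,
    PoolState="Stopped",
)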

Two things still need ongoing attention: the scaling thresholds have to be adjusted from time to time as traffic grows or spike patterns change, and TCP connections that are not closed properly can keep a lot of instances running when they are not needed.

In conclusion, the custom solution works well for what it was designed for and for the case in which it is used, where the traffic is known well enough and traffic spikes can be easily predicted. In cases where the traffic is not predictable, further adjustments to the autoscaling policies, thresholds, warm pools, etc. may be needed.
