
Improving IT infrastructure availability with Amazon Elastic Container Service (ECS)

ABOUT THE PROJECT

A company in the Marketing and Advertising industry is looking to improve the availability of its infrastructure, making it more reliable and reducing outages and performance issues.
The company's main concerns are high availability and autoscaling. The ability to build, configure and deploy rapidly and consistently enables both of these extremely important patterns:
• High Availability (HA) means that an application operates in such a way that it can withstand severe conditions that might bring down an application not built with HA in mind.
• Autoscaling is all about efficiency. It is a mechanism by which a set of application nodes is scaled horizontally up and down, in real time, in response to a particular stimulus such as CPU load or request volume.
The HA and autoscaling patterns greatly improve the experience of end users by making applications more reliable, of operators by reducing outages and performance issues, and of application owners by greatly reducing costs.
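To make the autoscaling idea concrete, here is a minimal sketch that computes a new task count with a simple proportional rule, similar in spirit to target tracking: if the average metric (for example CPU utilization) is above the target, the count grows in proportion, and the result is clamped to a configured minimum and maximum. The function name, parameters and numbers are illustrative assumptions, not the company's configuration, and AWS's own target tracking also applies cooldowns and other safeguards that are not modelled here.

```python
import math


def desired_count(current: int, metric: float, target: float,
                  min_count: int, max_count: int) -> int:
    """Proportional scaling rule (illustrative, not AWS's exact algorithm).

    current: number of running tasks/nodes (must be >= 1)
    metric:  observed average value, e.g. CPU utilization in percent
    target:  value the metric should be kept close to
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Scale the fleet by the ratio between observed and desired load.
    proposed = math.ceil(current * metric / target)
    # Never go below the floor (availability) or above the ceiling (cost).
    return max(min_count, min(max_count, proposed))


# 4 tasks running at 90% CPU with a 60% target -> scale out to 6 tasks.
print(desired_count(4, 90.0, 60.0, min_count=2, max_count=10))
```

The minimum keeps enough nodes running for high availability even at low load, while the maximum caps spending; the same rule scales in when load drops (4 tasks at 30% CPU would become 2).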

THE CHALLENGE

• Cron jobs: cron is a time-based job scheduler in Unix-like operating systems. Users who set up and maintain software environments use it to run a command or script automatically and periodically at fixed times, dates, or intervals. A cron job is the scheduled task itself, and cron jobs are very useful for automating repetitive tasks.
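The five-field schedule format of classic Unix cron (minute, hour, day of month, month, day of week) can be illustrated with a small matcher. This is a simplified sketch for illustration only: it supports "*", single values, ranges, comma lists and "*/n" steps, and it requires all five fields to match, whereas real cron treats day of month and day of week as alternatives when both are restricted. CloudWatch/EventBridge schedules use a related but different six-field syntax.

```python
from datetime import datetime

# Valid ranges for the five standard cron fields:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', '5', '1-5', '*/15', '0,30') into a set of values."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_str = part.split("/")
            step = int(step_str)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start_str, end_str = part.split("-")
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields")
    # Python's weekday() uses Monday = 0; cron uses Sunday = 0.
    actual = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(
        value in expand_field(field, lo, hi)
        for field, (lo, hi), value in zip(fields, FIELD_RANGES, actual)
    )
```

For example, "0 8 * * 1" (08:00 every Monday) matches 1 January 2024 at 08:00, which was a Monday, but not the following day.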
The company needed statistics reports delivered by email. Each report is essentially a simple SQL query that gathers the details from a database and sends them to the client by email. CloudWatch scheduled (cron) rules are used for this purpose: on each schedule they start an Amazon Elastic Container Service (ECS) task. There are plans to replace these with Lambda functions triggered by messages published to an Amazon Simple Notification Service (SNS) topic. All logs from the cron jobs are sent to a central Graylog host, where they are stored in case something needs to be checked.
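A minimal sketch of such a report job, assuming a hypothetical campaign_stats table and placeholder email addresses (the actual query, schema and mail setup are not described in this case study). It uses SQLite from the Python standard library so that it runs anywhere; the production job would instead query the company's own database from inside the scheduled ECS task.

```python
import smtplib
import sqlite3
from email.message import EmailMessage

# Hypothetical query; the real report's schema is not described in the case study.
REPORT_SQL = """
    SELECT campaign, SUM(impressions), SUM(clicks)
    FROM campaign_stats
    WHERE client_id = ?
    GROUP BY campaign
    ORDER BY campaign
"""


def build_report(conn: sqlite3.Connection, client_id: int, recipient: str) -> EmailMessage:
    """Run the statistics query and format the result as a plain-text email."""
    rows = conn.execute(REPORT_SQL, (client_id,)).fetchall()
    lines = [f"{campaign}: {impressions} impressions, {clicks} clicks"
             for campaign, impressions, clicks in rows]
    msg = EmailMessage()
    msg["Subject"] = f"Statistics report for client {client_id}"
    msg["From"] = "reports@example.com"  # placeholder sender address
    msg["To"] = recipient
    msg.set_content("\n".join(lines) or "No data for this period.")
    return msg


def send_report(msg: EmailMessage, smtp_host: str) -> None:
    """Deliver the report through an SMTP relay (the host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Inside the scheduled task, an entry point would open a connection to the real database, call build_report for each client and pass the result to send_report. Keeping query and delivery separate makes the same code reusable if the job later moves to a Lambda function.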
• Dynamic multiple sites:
As its customer base changes, the company needs to quickly create multiple sites that share the same base image, each with customized code on top. Separate Jenkins pipelines and Terraform make this easy: spinning up a new site for a customer takes a single click.
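One way such a pipeline could parametrize each new site is to generate a Terraform variables file per customer and pass it to terraform apply with -var-file. This sketch assumes hypothetical variable names (site_name, base_image, app_image, domain) and a hypothetical image naming scheme; the real pipeline's inputs are not described in this case study.

```python
import json
import re
from pathlib import Path

# Hypothetical shared base image that every customer site builds on.
BASE_IMAGE = "registry.example.com/base-site:1.0"

SITE_NAME = re.compile(r"[a-z0-9-]{3,32}")


def site_vars(site_name: str, code_tag: str, domain: str) -> dict[str, str]:
    """Build the Terraform input variables for one customer site."""
    if not SITE_NAME.fullmatch(site_name):
        raise ValueError(f"invalid site name: {site_name!r}")
    return {
        "site_name": site_name,
        "base_image": BASE_IMAGE,
        # Customer-specific code layered on top of the shared base image.
        "app_image": f"registry.example.com/sites/{site_name}:{code_tag}",
        "domain": domain,
    }


def write_tfvars(variables: dict[str, str], out_dir: Path) -> Path:
    """Write <site_name>.tfvars.json, which Terraform reads via -var-file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{variables['site_name']}.tfvars.json"
    path.write_text(json.dumps(variables, indent=2) + "\n")
    return path
```

A Jenkins job could then call write_tfvars(site_vars("acme", "v1.4.2", "acme.example.com"), Path("sites")) and run terraform apply -var-file=sites/acme.tfvars.json. Because only the variables differ between sites, every site is built from the same Terraform code, which is what makes the "single click" possible.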
• Monitoring:
Icinga is used to monitor the ECS cluster dynamically: when the cluster scales up or down, Icinga is updated accordingly, adding new hosts and removing terminated ones. Additionally, URL monitoring is in place for the most critical sites.
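The core of keeping Icinga in step with a scaling cluster is a set difference between the instances currently in the cluster and the hosts Icinga already monitors. This minimal sketch computes that plan and hands each change to caller-supplied functions; how the instance list is fetched and how hosts are registered in Icinga are assumptions left to the hypothetical add_host and remove_host callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SyncPlan:
    to_add: frozenset[str]     # instances the cluster gained (scale-up)
    to_remove: frozenset[str]  # hosts still monitored but no longer in the cluster (scale-down)


def plan_sync(cluster_hosts: Iterable[str], monitored_hosts: Iterable[str]) -> SyncPlan:
    """Compare the cluster's current instances with Icinga's host list."""
    cluster, monitored = set(cluster_hosts), set(monitored_hosts)
    return SyncPlan(frozenset(cluster - monitored), frozenset(monitored - cluster))


def apply_sync(plan: SyncPlan,
               add_host: Callable[[str], None],
               remove_host: Callable[[str], None]) -> None:
    """Apply the plan in a stable order using the supplied callbacks."""
    for host in sorted(plan.to_add):
        add_host(host)
    for host in sorted(plan.to_remove):
        remove_host(host)
```

Running this on every scaling event (or periodically) is idempotent: when both lists already agree, the plan is empty and nothing changes.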

THE SOLUTION

• ECS, Terraform, Jenkins
ECS is a fully managed container orchestration service. It is a good fit because of its security, reliability, and scalability. It also integrates natively with other AWS services such as Amazon Route 53, AWS Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch, providing a familiar experience for deploying and scaling containers.
A separate ECS cluster is used for each of the company's environments: one for testing and one for production.
Everything is created and updated via Terraform scripts, so in theory every infrastructure environment could be re-created with a single Jenkins pipeline. Jenkins is the tool for Continuous Integration and Continuous Deployment (CI/CD), and each developer is responsible for building the image with the code they are working on. The crucial ECS tasks use autoscaling, and autoscaling is also enabled on the underlying EC2 instances, so there is always EC2 capacity available on which to place those tasks.
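Why instance autoscaling matters alongside task autoscaling can be shown with a rough capacity estimate: every task reserves CPU and memory, so the number of EC2 instances needed grows with the task count. This sketch assumes identical tasks, ignores memory reserved for the operating system and the ECS agent, and treats target_capacity_percent as spare headroom in the spirit of the target capacity setting of ECS managed scaling; all numbers are illustrative.

```python
import math


def instances_needed(tasks: int, task_cpu: int, task_mem_mib: int,
                     instance_cpu: int, instance_mem_mib: int,
                     target_capacity_percent: int = 100) -> int:
    """Estimate how many EC2 instances are needed to place `tasks` identical tasks.

    CPU is in ECS CPU units (1024 = one vCPU); memory is in MiB.
    """
    if not 0 < target_capacity_percent <= 100:
        raise ValueError("target_capacity_percent must be in (0, 100]")
    # A task fits only if both its CPU and its memory reservation fit.
    per_instance = min(instance_cpu // task_cpu, instance_mem_mib // task_mem_mib)
    if per_instance == 0:
        raise ValueError("a single task does not fit on this instance type")
    required = math.ceil(tasks / per_instance)
    # A target below 100% keeps spare instances so new tasks can start immediately.
    return math.ceil(required * 100 / target_capacity_percent)


# 10 tasks of 0.5 vCPU / 1 GiB on 2 vCPU / 4 GiB instances: 4 per instance -> 3 instances.
print(instances_needed(10, 512, 1024, 2048, 4096))
```

With a target of 75%, the same workload would keep 4 instances, trading a little cost for room to absorb the next scale-out without waiting for a new instance to boot.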
• PoC for Lambda
AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.
The idea is to have the jobs currently started by CloudWatch cron rules triggered by events instead, and to run other custom tasks the same way. The first phase of the proof of concept was completed successfully. The general setup is: a message is published to an SNS topic; an SQS queue subscribed to that topic triggers the Lambda function; if something fails, a retry topic is available, which can be monitored in case messages start to pile up there. Running PHP code in a Lambda function is a bit of a challenge because PHP is not natively supported, but it is doable with custom libraries. Moving to Lambda functions is also expected to lower the monthly costs.
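The SNS to SQS to Lambda flow can be sketched with a Python handler; the PoC itself used PHP through custom libraries, so Python is used here only for illustration. With SNS delivering to SQS and raw message delivery disabled, each SQS message body is an SNS notification whose Message field holds the published payload. The handler reports failed messages using the partial batch response format, which requires ReportBatchItemFailures to be enabled on the event source mapping; SQS then makes those messages visible again for retry, and a redrive policy (or the retry topic described above) can catch messages that keep failing. process_job and the "job" field are hypothetical.

```python
import json
from typing import Any


def process_job(job: dict[str, Any]) -> None:
    """Hypothetical task runner, e.g. generate and mail a statistics report."""
    if "job" not in job:
        raise ValueError("payload has no 'job' field")
    # ... run the actual task for job["job"] here ...


def handler(event: dict[str, Any], context: Any) -> dict[str, list[dict[str, str]]]:
    """Lambda entry point for an SQS event source fed by an SNS topic."""
    failures: list[dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            notification = json.loads(record["body"])      # SNS envelope
            payload = json.loads(notification["Message"])  # what was published
            process_job(payload)
        except Exception:
            # Mark only this message as failed so SQS retries it;
            # successfully processed messages are removed from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message avoids re-running the whole batch when a single job fails, which matters when each job sends an email.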

THE CONCLUSION

Thanks to the chosen combination of technologies, the customer's demands are met: ECS provides high availability and autoscaling, while Jenkins and Terraform make it easy to provision new setups on demand.

Extremely knowledgeable in AWS technology. The team I am working with is very honest and realistic in their project estimations. They have proven to be very responsive to monitored alerts and are quick at solving problems. Collaboration and transparency on project hurdles/solutions has also been a very positive experience, and I am never left feeling that a decision was made based on personal bias rather than on what is best for our business. I have worked with multiple outsourcing groups and would rank ITGIX at the top of my list!