Kubernetes Backup and Restore for Elasticsearch 7.17 in S3: Tutorial and tips for DevOps Teams

ITGix Team
Passionate DevOps & Cloud Engineers
17.03.2023
Reading time: 5 mins.
Last Updated: 12.02.2024

As an experienced DevOps consulting services company, we understand the importance of Elasticsearch in powering mission-critical applications and services. Users may store, search, and analyze massive amounts of data in real time using Elasticsearch, an open-source distributed search and analytics engine. Built on top of the Lucene search engine library, Elasticsearch is designed to be scalable, resilient, and highly available.

However, with increased reliance on Elasticsearch, it’s essential to have robust backup and restore procedures in place. In a DevOps context, where continuous delivery and continuous deployment are key principles, having reliable backups of Elasticsearch data is critical for efficient recovery from data loss or system failure. This helps minimize downtime and data loss while contributing to faster recovery and improved business continuity.

Today’s tutorial demonstrates how to back up and restore your Elasticsearch data using AWS S3 as the snapshot repository. Elasticsearch 7.17 does not come with the S3 option enabled by default, so we need to install the repository-s3 plugin directly into the Elasticsearch Docker image and then deploy that image in Kubernetes using the official Elasticsearch Helm chart.

Prerequisites

Before building the Docker image, we need to fulfill a few prerequisites. We require an AWS IAM user with access keys that can be passed to the build as build arguments and set as environment variables in the Docker image. Follow IAM best practices and apply least-privilege permissions, granting only the S3 permissions needed for the bucket where you intend to store the snapshots. You also need to have that S3 bucket set up.
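To make the least-privilege idea concrete, here is a minimal sketch that generates an IAM policy document scoped to a single snapshot bucket. The action list follows the permissions the repository-s3 plugin documentation recommends. The function name is hypothetical, and the bucket name is only the example used in this tutorial, so adjust both to your environment.

```python
import json


def snapshot_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document limited to one S3 snapshot bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListBucketVersions",
                ],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Example bucket name taken from this tutorial; replace with your own.
    policy = snapshot_bucket_policy("elasticsearch-snapshots-s3-integration")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the IAM user as an inline or managed policy.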

This is an example of what our Dockerfile will look like:

# Set the Elasticsearch version as a build argument
ARG elasticsearch_version

# Use the official Elasticsearch Docker image as the base
FROM docker.elastic.co/elasticsearch/elasticsearch:${elasticsearch_version}

# Define the AWS access key ID and secret access key as build arguments
ARG ENV_VAR_AWS_ACCESS_KEY_ID
ARG ENV_VAR_AWS_SECRET_ACCESS_KEY

# Set the AWS access key ID and secret access key as environment variables
ENV AWS_ACCESS_KEY_ID=${ENV_VAR_AWS_ACCESS_KEY_ID}
ENV AWS_SECRET_ACCESS_KEY=${ENV_VAR_AWS_SECRET_ACCESS_KEY}

# Install the S3 repository plugin and create a new Elasticsearch keystore
RUN bin/elasticsearch-plugin install --batch repository-s3
RUN /usr/share/elasticsearch/bin/elasticsearch-keystore create

# Add the AWS access key ID and secret access key to the Elasticsearch keystore
RUN echo "$AWS_ACCESS_KEY_ID" | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.access_key
RUN echo "$AWS_SECRET_ACCESS_KEY" | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.secret_key

This Dockerfile sets the Elasticsearch version as a build argument and uses the official Elasticsearch Docker image as the base. It then defines the AWS access key ID and secret access key as build arguments and sets them as environment variables. The S3 repository plugin is installed and a new Elasticsearch keystore is created. Finally, the AWS access key ID and secret access key are added to the Elasticsearch keystore. Because the credentials end up stored in the image (in its environment variables and in the keystore), keep the resulting image in a private registry.

To build the Docker image from the Dockerfile, run the following command:

docker build -t elasticsearch:7.17.3 \
--build-arg ENV_VAR_AWS_ACCESS_KEY_ID={Your access key} \
--build-arg ENV_VAR_AWS_SECRET_ACCESS_KEY={Your secret access key} \
--build-arg elasticsearch_version=7.17.3 \
.

This command builds the Docker image with the tag elasticsearch:7.17.3 using the build arguments ENV_VAR_AWS_ACCESS_KEY_ID and ENV_VAR_AWS_SECRET_ACCESS_KEY, which contain your AWS access key and secret access key respectively. The Elasticsearch version is also specified as a build argument and set to 7.17.3. The “.” at the end of the command specifies the build context, which is the current directory.

If you do not have Elasticsearch and Kibana deployed in Kubernetes, you can use the official Helm charts from https://artifacthub.io to deploy them. In the values.yaml file for Elasticsearch, you need to specify the Docker image that you have built. If you have built the image locally, you will need to push it to an AWS ECR repository, for example, so that it can be downloaded from there when you deploy the Helm chart.

We strongly recommend deploying Kibana because it provides a user-friendly UI that is very helpful for managing Elasticsearch. With Kibana, you can easily visualize and analyze data in Elasticsearch, create and share dashboards, and perform various administrative tasks.

After everything is set up properly, let’s dive into Kibana Management.

[Screenshot: the S3 plugin is enabled successfully]

To start making snapshots, we first need to set up the S3 snapshot repository. Assign a repository name and then specify the following attributes on the next page:

  • Bucket name: elasticsearch-snapshots-s3-integration
  • Base path: This can be used to specify sub-folders if you have multiple Elasticsearch clusters and want a central location for all snapshots.
  • There are also other attributes that you can adjust based on your workflow. We skip them here.
  • Click Register.

[Screenshot: set up the S3 snapshot repository]

After setting up the S3 repository, it’s important to validate that it can connect to S3. You can do this by clicking on the snapshot repository and verifying it.

[Screenshot: the S3 repository validates that it has a connection to S3]
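The same registration and verification can also be done through the Elasticsearch snapshot API instead of the Kibana UI. The sketch below only builds the two HTTP requests (PUT /_snapshot/<name> and POST /_snapshot/<name>/_verify) without sending them. The cluster URL, the repository name s3-backups, and the base path cluster-a are assumptions for illustration, and authentication and TLS are left out.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable here, e.g. via kubectl port-forward.
ES_URL = "http://localhost:9200"


def register_s3_repository(name, bucket, base_path=None):
    """Build the request that registers an S3 snapshot repository."""
    settings = {"bucket": bucket}
    if base_path:
        # Optional sub-folder, useful when several clusters share one bucket.
        settings["base_path"] = base_path
    body = {"type": "s3", "settings": settings}
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def verify_repository(name):
    """Build the request that asks the nodes to verify repository access."""
    return urllib.request.Request(f"{ES_URL}/_snapshot/{name}/_verify", method="POST")


if __name__ == "__main__":
    # "s3-backups" and "cluster-a" are hypothetical names.
    requests_to_send = [
        register_s3_repository(
            "s3-backups", "elasticsearch-snapshots-s3-integration", "cluster-a"
        ),
        verify_repository("s3-backups"),
    ]
    for req in requests_to_send:
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```

Building the requests separately from sending them keeps the example safe to run without a cluster and makes the payload easy to inspect.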

Backup procedure

Once the S3 repository is registered correctly, the next step is to set up an automated way of generating snapshots. This is done with snapshot lifecycle management (SLM) policies, found under Policies in Kibana.

[Screenshot: Snapshot and Restore, create first snapshot policy]
[Screenshot: create policy, logistics]

We need to set the policy name, the snapshot name, the repository, and the schedule. You can even create your own cron expression. 

[Screenshot: create policy, snapshot settings]

For the indices, you can either include all indices, both system and regular ones, or select only the ones that you would like to back up.

[Screenshot: create policy, snapshot retention]

Choose the snapshot retention settings: how long to keep snapshots before they expire, and the minimum and maximum number of snapshots to retain.

[Screenshot: create policy, review]

This is the final view of the policy. Once it is saved, you can validate that the snapshots are being created.
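The policy created in the wizard corresponds to a single call to the SLM API (PUT /_slm/policy/<policy_id>). The sketch below builds that call as a method, path, and JSON body. The policy ID, repository name, snapshot name pattern, schedule, and retention values are all illustrative assumptions rather than the values shown in the screenshots. Elasticsearch cron expressions start with a seconds field, so "0 30 1 * * ?" means every day at 01:30 UTC.

```python
import json


def build_slm_policy(
    policy_id,
    repository,
    indices,
    schedule="0 30 1 * * ?",  # seconds minutes hours day-of-month month day-of-week
    expire_after="30d",
    min_count=5,
    max_count=50,
):
    """Return (method, path, body) for creating or updating an SLM policy."""
    body = {
        "schedule": schedule,
        # Date math in the name gives each snapshot a dated, unique name.
        "name": "<daily-snap-{now/d}>",
        "repository": repository,
        "config": {
            "indices": indices,
            "include_global_state": False,
        },
        "retention": {
            "expire_after": expire_after,
            "min_count": min_count,
            "max_count": max_count,
        },
    }
    return "PUT", f"/_slm/policy/{policy_id}", body


if __name__ == "__main__":
    # Hypothetical policy and repository names; "*" selects all indices.
    method, path, body = build_slm_policy("daily-snapshots", "s3-backups", ["*"])
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed method, path, and body can be pasted into the Kibana console, or sent with any HTTP client.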

Restore operation

Now, if we find ourselves in a situation where we need to restore data quickly from the latest possible snapshot, we should follow these steps:

  1. In the Snapshots section, find the latest snapshot created by your policy.

[Screenshot: Snapshot and Restore, latest snapshot from the policy]

  2. Click on it to see the following options:

[Screenshot: restore, logistics]

  3. You have the option to choose which indices to restore. You can either fully restore everything or choose specific indices to restore. However, it’s important to keep in mind that existing indices you want to restore must be closed first. This is because no search or index operations can be performed on them while the restore is in progress.

  4. Kibana has a built-in console for executing commands:

[Screenshot: Elastic console]

     • GET /_cat/indices: list all available indices
     • POST /my-index-000001/_close: close the indices that you would like to restore

  5. After a successful restore, the restored indices are opened automatically.

Closing indices is only necessary when you restore onto the same cluster, where the indices already exist. If you want to restore indices to another cluster, you don’t need to perform the close operation, since the indices won’t exist there.
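For a same-cluster restore, the console steps above boil down to closing each target index and then calling the restore API. The following sketch assembles those calls in order as (method, path, body) tuples. The repository and snapshot names are hypothetical, and the index name is the one from the console example.

```python
import json
from urllib.parse import quote


def build_restore_steps(repository, snapshot, indices):
    """Return the ordered API calls for restoring indices on the same cluster."""
    # 1. Close every index that will be overwritten by the restore.
    steps = [("POST", f"/{quote(index, safe='')}/_close", None) for index in indices]
    # 2. Restore only the selected indices, leaving the cluster state untouched.
    restore_body = {
        "indices": ",".join(indices),
        "include_global_state": False,
    }
    restore_path = (
        f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}/_restore"
    )
    steps.append(("POST", restore_path, restore_body))
    return steps


if __name__ == "__main__":
    # Hypothetical repository and snapshot names.
    for method, path, body in build_restore_steps(
        "s3-backups", "daily-snap-2023.03.17", ["my-index-000001"]
    ):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

When restoring into a different cluster, the close steps can simply be dropped and only the final restore call is needed.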

Moreover, one of the benefits of using an S3 repository is that when you register it into another Elasticsearch cluster, the available snapshots are automatically populated. This makes it easy to quickly restore a snapshot on the new cluster.

During our exploration of Elasticsearch backup and restore procedures, we learned the manual approach for performing these operations. However, for advanced use cases where a prompt recovery is critical, automating the backup and restore process is highly recommended.
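As one small building block for such automation, a script needs to pick the snapshot to restore from. The sketch below selects the most recent successful snapshot from the JSON returned by GET /_snapshot/<repository>/_all, assuming the response carries a top-level snapshots list whose entries have snapshot, state, and start_time_in_millis fields. The sample data is made up for illustration.

```python
def latest_successful_snapshot(response):
    """Return the name of the newest snapshot with state SUCCESS, or None."""
    candidates = [
        snap for snap in response.get("snapshots", []) if snap.get("state") == "SUCCESS"
    ]
    if not candidates:
        return None
    # The newest snapshot is the one that started last.
    newest = max(candidates, key=lambda snap: snap["start_time_in_millis"])
    return newest["snapshot"]


if __name__ == "__main__":
    # Made-up response in the assumed shape of the get-snapshot API.
    sample = {
        "snapshots": [
            {"snapshot": "snap-a", "state": "SUCCESS", "start_time_in_millis": 1000},
            {"snapshot": "snap-b", "state": "FAILED", "start_time_in_millis": 3000},
            {"snapshot": "snap-c", "state": "SUCCESS", "start_time_in_millis": 2000},
        ]
    }
    print(latest_successful_snapshot(sample))
```

Filtering on the state before sorting avoids restoring from a failed or partial snapshot that merely happens to be the newest.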

Conclusion

In conclusion, having a solid backup and restore strategy for your Elasticsearch data is essential for maintaining uptime, data integrity, and business continuity. At ITGix, we understand the importance of reliable Elasticsearch restore procedures, and our DevOps consulting services company is here to help you design, implement, and maintain the right solutions for your organization’s needs.
