As an experienced DevOps consulting services company, we understand the importance of Elasticsearch in powering mission-critical applications and services. Elasticsearch is an open-source distributed search and analytics engine that lets users store, search, and analyze massive amounts of data in near real time. Built on top of the Apache Lucene search library, it is designed to be scalable, resilient, and highly available.
However, with increased reliance on Elasticsearch, it’s essential to have robust backup and restore procedures in place. In a DevOps context, where continuous delivery and continuous deployment are key principles, having reliable backups of Elasticsearch data is critical for efficient recovery from data loss or system failure. This helps minimize downtime and data loss while contributing to faster recovery and improved business continuity.
Today’s tutorial will demonstrate how to back up and restore your Elasticsearch data using AWS S3 as the snapshot repository. However, Elasticsearch does not come with S3 support enabled by default. To enable this feature, we need to install the repository-s3 plugin directly into the Elasticsearch Docker image and then deploy it in Kubernetes using the official Elasticsearch Helm chart.
Prerequisites
Before building the Dockerfile, there are some prerequisites that we need to fulfill. We require an AWS IAM user with access keys that can be injected as environment variables into the Docker image. It is important to follow IAM best practices and grant least-privilege permissions, assigning only the S3 permissions needed for the bucket where you intend to store the snapshots (see the example policy below). You also need to have an S3 bucket set up.
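To keep the permissions tight, the IAM policy attached to that user can be scoped to the snapshot bucket only. The following is a minimal sketch based on the permissions listed in the repository-s3 plugin documentation; the exact set of actions may vary between Elasticsearch versions, so treat it as a starting point and adjust the bucket name to your own:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-s3-integration"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-snapshots-s3-integration/*"
    }
  ]
}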
This is an example of what our Dockerfile will look like:
# Set the Elasticsearch version as a build argument
ARG elasticsearch_version
# Use the official Elasticsearch Docker image as the base
FROM docker.elastic.co/elasticsearch/elasticsearch:${elasticsearch_version}
# Define the AWS access key ID and secret access key as build arguments
ARG ENV_VAR_AWS_ACCESS_KEY_ID
ARG ENV_VAR_AWS_SECRET_ACCESS_KEY
# Set the AWS access key ID and secret access key as environment variables
ENV AWS_ACCESS_KEY_ID=${ENV_VAR_AWS_ACCESS_KEY_ID}
ENV AWS_SECRET_ACCESS_KEY=${ENV_VAR_AWS_SECRET_ACCESS_KEY}
# Install the S3 repository plugin and create a new Elasticsearch keystore
RUN bin/elasticsearch-plugin install --batch repository-s3
RUN /usr/share/elasticsearch/bin/elasticsearch-keystore create
# Add the AWS access key ID and secret access key to the Elasticsearch keystore
RUN echo $AWS_ACCESS_KEY_ID | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.access_key
RUN echo $AWS_SECRET_ACCESS_KEY | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.secret_key
This Dockerfile sets the Elasticsearch version as a build argument and uses the official Elasticsearch Docker image as the base. It then defines the AWS access key ID and secret access key as build arguments and sets them as environment variables. The S3 repository plugin is installed and a new Elasticsearch keystore is created. Finally, the AWS access key ID and secret access key are added to the Elasticsearch keystore.
To build the Docker image from the Dockerfile, run the following command:
docker build -t elasticsearch:7.17.3 \
--build-arg ENV_VAR_AWS_ACCESS_KEY_ID={Your access key} \
--build-arg ENV_VAR_AWS_SECRET_ACCESS_KEY={Your secret access key} \
--build-arg elasticsearch_version=7.17.3 \
.
This command builds the Docker image with the tag elasticsearch:7.17.3, using the build arguments ENV_VAR_AWS_ACCESS_KEY_ID and ENV_VAR_AWS_SECRET_ACCESS_KEY to pass in your AWS access key ID and secret access key. The Elasticsearch version is also specified as a build argument and set to 7.17.3. The “.” at the end of the command specifies the build context, which is the current directory.
If you do not have Elasticsearch and Kibana deployed in Kubernetes, you can use the official Helm charts from https://artifacthub.io to deploy them. In the values.yaml file for Elasticsearch, you need to specify the Docker image that you have built. If you have built the image locally, you will need to push it to a registry such as AWS ECR so that it can be pulled from there when you deploy the Helm chart, as in the sketch below.
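As an illustration, assuming a hypothetical ECR repository (the account ID and region below are placeholders) and the official elastic/elasticsearch chart, pushing the image and pointing the chart at it could look roughly like this; other charts may expose the image settings under different value keys:
# Push the locally built image to ECR (hypothetical account ID and region)
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
docker tag elasticsearch:7.17.3 123456789012.dkr.ecr.eu-west-1.amazonaws.com/elasticsearch:7.17.3
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/elasticsearch:7.17.3
# Deploy the chart, overriding the image with the custom one
helm repo add elastic https://helm.elastic.co
helm upgrade --install elasticsearch elastic/elasticsearch \
  --set image=123456789012.dkr.ecr.eu-west-1.amazonaws.com/elasticsearch \
  --set imageTag=7.17.3
The same overrides can of course live in values.yaml instead of --set flags, which is usually easier to maintain under version control.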
We strongly recommend deploying Kibana because it provides a user-friendly UI that is very helpful for managing Elasticsearch. With Kibana, you can easily visualize and analyze data in Elasticsearch, create and share dashboards, and perform various administrative tasks.
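If you take the same route for Kibana, a minimal sketch with the elastic/kibana chart could look like this (the service name below assumes the chart's default naming with a release called kibana):
helm upgrade --install kibana elastic/kibana
# Temporarily access the Kibana UI from your workstation while testing
kubectl port-forward svc/kibana-kibana 5601:5601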
After everything is set up properly, let’s dive into Kibana’s Stack Management section, specifically Snapshot and Restore.
- To start making snapshots, we first need to set up the S3 snapshot repository. To do this, we need to assign a repository name and then specify the following attributes on the next page:
- Bucket name: elasticsearch-snapshots-s3-integration
- Base path: This can be used to specify sub-folders if you have multiple Elasticsearch clusters and you want to have a central location for all snapshots.
- There are also other attributes that you can adjust based on your workflow; we will skip them here.
- Click Register
- After setting up the S3 repository, it’s important to validate that it can connect to S3. You can do this by clicking on the snapshot repository and verifying it, or from the console as shown in the sketch below.
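If you prefer the Dev Tools console over the UI, registering and verifying the repository can also be done with two requests. This is a sketch only: the repository name and base_path below are examples, while the bucket matches the one used above.
PUT _snapshot/my-s3-repository
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-snapshots-s3-integration",
    "base_path": "production-cluster"
  }
}

POST _snapshot/my-s3-repository/_verify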
Backup procedure
Once we have registered the S3 repository correctly, the next step is to set up an automated way of generating snapshots. This can be accomplished through the use of Policies.
We need to set the policy name, the snapshot name, the repository, and the schedule. You can even create your own cron expression.
For the indices, you can either include all indices (both system and regular indices) or select only the ones you would like to snapshot.
Then choose the snapshot retention: the expiration time and how many snapshots to retain.
Review the final view of the policy, then validate that snapshots are being created on schedule. The equivalent console request is sketched below.
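Under the hood, the Policies screen uses the snapshot lifecycle management (SLM) API, so the same policy can be created from the console. The schedule, names, and retention values below are purely illustrative:
PUT _slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["*"],
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

POST _slm/policy/daily-snapshots/_execute
Triggering the policy once with _execute is a quick way to confirm that snapshots actually land in the S3 bucket.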
Restore operation
Now, if we find ourselves in a situation where we need to restore data quickly from the latest possible snapshot, we should follow these steps:
1. In the Snapshots section, find the latest snapshot created by your policy.
2. Click on it and you will see the following options:
3. You have the option to choose which indices to restore. You can either restore everything or pick specific indices. However, it’s important to keep in mind that the indices you want to restore must be closed, because no search or indexing operations can be performed while the restore is in progress (see the console sketch after this list).
4. There is a built-in console for executing different commands:
- GET /_cat/indices -> list all available indices
- POST /my-index-000001/_close -> close the indices that you would like to restore
5. After a successful restore, the restored indices will be opened automatically.
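Steps 3 to 5 can also be performed entirely from the console. In the sketch below, the snapshot name is a placeholder for whichever snapshot you picked, and my-index-000001 stands in for the index you want back:
POST /my-index-000001/_close

POST _snapshot/my-s3-repository/daily-snap-2024.05.01-abc123/_restore
{
  "indices": "my-index-000001",
  "include_global_state": false
}

GET /my-index-000001/_recovery
The _recovery API lets you track the progress of the restore for each shard.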
In Elasticsearch, this close requirement only applies if you’re performing the restore on the same cluster, where indices with the same names already exist. If you want to restore indices to another cluster, you don’t need to perform the close operation, since the indices won’t exist there.
Moreover, one of the benefits of using an S3 repository is that when you register it into another Elasticsearch cluster, the available snapshots are automatically populated. This makes it easy to quickly restore a snapshot on the new cluster.
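As a sketch, on the second cluster you would register the same bucket (marking it read-only is a common precaution so that only one cluster writes to the repository) and then list the snapshots it already contains; the names are the same examples used above:
PUT _snapshot/my-s3-repository
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-snapshots-s3-integration",
    "base_path": "production-cluster",
    "readonly": true
  }
}

GET _cat/snapshots/my-s3-repository?v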
During our exploration of Elasticsearch backup and restore procedures, we learned the manual approach for performing these operations. However, for advanced use cases where a prompt recovery is critical, automating the backup and restore process is highly recommended.
Conclusion
In conclusion, having a solid backup and restore strategy for your Elasticsearch data is essential for maintaining uptime, data integrity, and business continuity. At ITGix, we understand the importance of reliable Elasticsearch restore procedures, and our DevOps consulting services company is here to help you design, implement, and maintain the right solutions for your organization’s needs.