Why etcd Backups Are Critical
In modern distributed systems, etcd plays a crucial role as a reliable and fast key-value store that serves as the backbone for storing critical configuration and state data. From Kubernetes to other large-scale systems, etcd often acts as the “heart” that ensures clusters operate smoothly.
But what happens if this vital database is compromised, deleted, or corrupted? Data loss in etcd can lead to severe disruptions, loss of state, or even complete service outages. This is why proper backup management and recovery are essential for administrators and engineers alike.
Storing etcd Backups in an S3 Bucket
To ensure your etcd backups are secure and accessible, storing them in an S3 bucket is a reliable option. S3 provides durability, availability, and the ability to automate backup uploads.
Set Up AWS CLI
Install and configure the AWS CLI by following the official AWS documentation.
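Once the CLI is installed, configuring credentials typically looks like this (the interactive prompt asks for your Access Key ID, Secret Access Key, default region, and output format):
aws configure
You can then verify the configuration with a simple call such as aws sts get-caller-identity.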
Create an S3 Bucket
Create a bucket to store your backups. You can do this via the AWS Management Console or the CLI.
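For example, creating a bucket from the CLI (the bucket name and region below are placeholders):
aws s3 mb s3://my-etcd-backups --region us-east-1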
Upload Backups to S3
After creating a backup with etcdctl, upload it to the bucket:
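For example, assuming a hypothetical bucket name (the exact command used in this guide appears in the snapshot section below):
aws s3 cp /var/backups/etcd/etcd-backup-20250115123045.db s3://my-etcd-backups/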
Automate the Process
Use a cron job or a script to automate regular backups and uploads to S3, as shown in the sketch below.
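Here is a minimal sketch of such a script, assuming /etc/etcd.env provides the ETCDCTL_ENDPOINTS and certificate variables used throughout this guide; the bucket name is a placeholder:
#!/usr/bin/env bash
# Sketch: create an etcd snapshot and upload it to S3.
set -euo pipefail

# Load the etcd connection variables (endpoints, certificates).
source /etc/etcd.env

BACKUP_DIR=/var/backups/etcd
PREFIX=etcd-backup
TIMESTAMP=$(date +%Y%m%d%H%M%S)
S3_BUCKET=my-etcd-backups   # placeholder bucket name

mkdir -p "$BACKUP_DIR"

# Take the snapshot over TLS.
ETCDCTL_API=3 etcdctl --endpoints="$ETCDCTL_ENDPOINTS" \
--cacert="$ETCD_TRUSTED_CA_FILE" \
--cert="$ETCD_CERT_FILE" \
--key="$ETCD_KEY_FILE" \
snapshot save "$BACKUP_DIR/$PREFIX-$TIMESTAMP.db"

# Upload the snapshot to S3.
aws s3 cp "$BACKUP_DIR/$PREFIX-$TIMESTAMP.db" "s3://$S3_BUCKET/$PREFIX-$TIMESTAMP.db"
A crontab entry that runs the script (at a hypothetical path) every day at 02:00 could then look like:
0 2 * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1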

How to Create an etcd Backup
Source the Environment File
Load the environment variables for etcd:
source /etc/etcd.env
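The exact contents of this file depend on your installation, but based on the variables used below, it is expected to define values along these lines (paths and URLs are illustrative; a restore additionally relies on variables such as ETCD_NAME and ETCD_INITIAL_CLUSTER):
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/node-node1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/node-node1-key.pem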
Extract Endpoints for etcd Backup
Extract the client endpoints of the etcd cluster:
ETCD_ENDPOINTS_FOR_BACKUP=$(ETCDCTL_API=3 etcdctl member list --endpoints "$ETCDCTL_ENDPOINTS" --cacert "$ETCD_TRUSTED_CA_FILE" --cert "$ETCD_CERT_FILE" --key "$ETCD_KEY_FILE" | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',')
echo "Member list is $ETCD_ENDPOINTS_FOR_BACKUP"
Check the Status of etcd
Verify the status of the etcd endpoints:
ETCDCTL_API=3 etcdctl endpoint status --endpoints "$ETCD_ENDPOINTS_FOR_BACKUP" --cacert "$ETCD_TRUSTED_CA_FILE" --cert "$ETCD_CERT_FILE" --key "$ETCD_KEY_FILE"
Create a Snapshot
To create a snapshot of the etcd database, use the following command:
ETCDCTL_API=3 etcdctl --endpoints="$ETCDCTL_ENDPOINTS" \
--cacert="$ETCD_TRUSTED_CA_FILE" \
--cert="$ETCD_CERT_FILE" \
--key="$ETCD_KEY_FILE" \
snapshot save "$BACKUP_DIR/$PREFIX-$TIMESTAMP.db"
Explanation of the Command:
- --endpoints="$ETCDCTL_ENDPOINTS": Specifies the endpoints of the etcd cluster
- --cacert="$ETCD_TRUSTED_CA_FILE": Path to the trusted CA certificate for secure communication
- --cert="$ETCD_CERT_FILE": Path to the client certificate
- --key="$ETCD_KEY_FILE": Path to the client's private key
- snapshot save: Saves the current state of the etcd database as a snapshot
- $BACKUP_DIR/$PREFIX-$TIMESTAMP.db: Specifies the location and naming format for the snapshot file
If your environment variables are set as follows:
BACKUP_DIR=/var/backups/etcd
PREFIX=etcd-backup
TIMESTAMP=$(date +%Y%m%d%H%M%S)
The snapshot will be saved as:
/var/backups/etcd/etcd-backup-20250115123045.db
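Before uploading, you can optionally inspect the snapshot's basic metadata (hash, revision, total keys, size) with etcdctl:
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd/etcd-backup-20250115123045.db --write-out=table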
Upload the Snapshot to S3
After creating the snapshot, upload it to the S3 bucket:
aws s3 cp "$BACKUP_DIR/$PREFIX-$TIMESTAMP.db" "s3://$S3_BUCKET/$ETCD_PREFIX_ENV/$ETCD_PREFIX_ENV_FOR_SNAPSHOTS/$PREFIX-$TIMESTAMP.db"
Restore Procedure: Copy the Snapshot from S3
To restore from S3, download the snapshot to the local server:
aws s3 cp --profile etcd-backup-restore-s3 "s3://$ETCD_S3_BUCKET/$ETCD_PREFIX_ENV/$ETCD_PREFIX_ENV_FOR_SNAPSHOTS/$ETCD_SNAPSHOT" etcd-snapshot.db
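The etcd-backup-restore-s3 profile used here is assumed to have been configured beforehand, for example with:
aws configure --profile etcd-backup-restore-s3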
Source the Environment File
Load the environment variables for etcd:
source /etc/etcd.env
Additional Considerations for Restoration
When restoring etcd, the following steps should also be considered:
Stop the etcd Service
Before restoring the snapshot, it’s crucial to stop the etcd service to avoid conflicts during the restore process.
systemctl stop etcd
Rename the Existing Data Directory
It’s a good practice to rename the existing etcd data directory before restoring to avoid any potential data corruption.
mv /var/lib/etcd /var/lib/etcd.copy_$(date +'%Y-%m-%d_%H-%M-%S')
Restore etcd from the Downloaded Snapshot
Once the old data is safely renamed, restore the downloaded snapshot (etcd-snapshot.db) into the etcd data directory:
ETCDCTL_API=3 etcdctl snapshot restore etcd-snapshot.db \
--skip-hash-check=true \
--data-dir="/var/lib/etcd" \
--name="$ETCD_NAME" \
--initial-cluster="$ETCD_INITIAL_CLUSTER" \
--initial-advertise-peer-urls="$ETCD_INITIAL_ADVERTISE_PEER_URLS" \
--initial-cluster-token="$ETCD_INITIAL_CLUSTER_TOKEN"
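A note on --skip-hash-check: snapshots produced by etcdctl snapshot save embed an integrity hash, so the check can normally stay enabled for them; skipping it is primarily needed when restoring from a db file copied directly out of a data directory.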
Start the etcd Service
After the snapshot is restored, restart the etcd service to apply the changes:
systemctl start etcd
Check the Status of the etcd Cluster
After restoration, you can optionally verify the status of the etcd cluster:
ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
--endpoints "$ETCDCTL_ENDPOINTS" \
--cacert="$ETCD_TRUSTED_CA_FILE" \
--cert="$ETCD_CERT_FILE" \
--key="$ETCD_KEY_FILE"
The output should show exactly one leader for the cluster.
Conclusion
By following these steps, you can successfully back up and restore your etcd service, ensuring the safety and availability of your critical data. Regular backups and a reliable restore procedure are key to maintaining the stability of your distributed systems.
Best Practices for Backup and Recovery
For further automation, consider using Kubernetes Jobs to schedule and manage your etcd backups. This approach allows you to automate the backup process within your Kubernetes environment, ensuring that backups are performed regularly without manual intervention.
For the restore process, you can leverage Ansible roles to streamline and automate the recovery procedure. Using Ansible, you can define a set of tasks for restoring etcd from a snapshot, making the process more efficient and repeatable across different environments.
By automating both backup and restore procedures, you reduce the risk of human error and ensure a more reliable and consistent approach to managing your etcd service.
Test Backup and Restore Regularly: It's essential to test your backup and restore procedures regularly to ensure they work as expected during an actual disaster recovery scenario. By performing periodic tests, you can identify potential issues before they become critical.