Source files
You can find all the source files for this backup-restore solution in our official repository; you can download, modify, and use them for non-commercial purposes.
Find them here.
Introduction to Amazon Cognito
Amazon Cognito lets you easily add user sign-up, sign-in, and access control to your mobile and web applications. It scales to millions of users and supports sign-in with social identity providers such as Facebook, Apple, Google, and Amazon, as well as enterprise identity providers via OpenID Connect and SAML 2.0.
However, if you need to back up an Amazon Cognito user pool, you will notice that Amazon provides no native solution. Luckily, Amazon Cognito has a flexible API that we are going to use to back up our users.
It’s crucial to have all user profiles and linked data backed up: backups play a vital role in preventing accidents (e.g. unintentionally losing data) and are also essential in other cases, such as migrating user data to a new user pool. For these reasons, today I’ll share my custom-built solution with you.
Custom-built backup restore solution for Amazon Cognito users
Managing data is a tough job, and backups are what come to the rescue when something goes wrong. But how do you ensure backups when the service you rely on does not provide a backup solution? The answer is pretty simple: build one yourself! And that is exactly what we are going to do in this tutorial.
This solution is custom-built but designed to cover a wide variety of use cases. Let’s get into our step-by-step guide on how to develop a backup-restore solution for Amazon Cognito users.
This tutorial will cover the core part of the implementation rather than the entire architecture setup. For more information please refer to the AWS documentation.
I will not show how to associate the AWS Lambda functions with a trigger event; in our case, we will trigger them manually.
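We trigger the functions from the AWS console, but as a sketch, a manual invocation from the AWS CLI would look like this (the function name is a placeholder):
aws lambda invoke --function-name cognito-export-lambda response.json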
To accomplish our goal, we will use three Python-based Lambda functions: one to export the Cognito users into a CSV file, upload it, and store it in an S3 bucket; one to clean the second (backup) Cognito user pool; and one to fetch the CSV file and import it into the backup Cognito instance:
The script uses the Python AWS SDK, boto3, to perform ListUsers API calls. With boto3, you call the list_users(**kwargs) method of the CognitoIdentityProvider client. For more methods and details, visit the official documentation page.
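As a minimal sketch of what that looks like, here is a single ListUsers call via boto3 (the user pool ID is a placeholder):
import boto3

client = boto3.client('cognito-idp')
# One page of results; Cognito returns at most 60 users per call
response = client.list_users(UserPoolId='eu-west-1_EXAMPLE')
for user in response['Users']:
    print(user['Username'])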
Deploy Export Lambda
Create a role. With this role, we will give the Lambda function the permissions it needs:
- AmazonS3FullAccess – to upload and store the CSV file into S3 Bucket
- AmazonCognitoReadOnly – to list and read user data
!!! Remember to use the least privilege possible when deploying to production. For the purposes of this tutorial we are using the built-in policies provided by AWS, which grant full access to their resources. It is always good practice to create a custom policy with only the permissions the Lambda actually needs.
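As a rough sketch (not a complete policy), a scoped-down custom policy for the export Lambda could look like this, with the bucket name as a placeholder:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cognito-idp:ListUsers",
                "cognito-idp:ListGroups",
                "cognito-idp:GetCSVHeader"
            ],
            "Resource": "arn:aws:cognito-idp:*:*:userpool/*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR-BACKUP-BUCKET/*"
        }
    ]
}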
Create S3 Bucket
When creating the bucket, don’t forget to block public access, as our CSV files will contain private data.
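If you prefer the CLI, public access can be blocked with something like this (the bucket name is a placeholder):
aws s3api put-public-access-block --bucket YOUR-BACKUP-BUCKET --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true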
Create Lambda function
Create a Lambda function from scratch using Python 3.9 as the runtime:
Add 3 environment variables:
- BACKUP_BUCKET – this will be your destination bucket
- COGNITO_ID – the ID of the source Cognito User Pool
- REGION – Cognito region
From Configuration – General Configuration, edit the memory and the timeout, as the default values may not be enough if you have a large number of users.
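The same configuration can also be applied from the CLI; a sketch with placeholder values:
aws lambda update-function-configuration --function-name cognito-export-lambda --memory-size 512 --timeout 300 --environment "Variables={BACKUP_BUCKET=my-backup-bucket,COGNITO_ID=eu-west-1_EXAMPLE,REGION=eu-west-1}"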
Attach the created role:
The last thing to change is the Lambda handler. It maps to the file and the main method that will be executed, e.g. index.lambda_function when the code below lives in index.py.
Now paste the code below, then save and deploy.
import boto3
from datetime import datetime
import time
import traceback
import os
class Logs:
@staticmethod
def warning(logBody):
print("[WARNING] {}".format(logBody))
@staticmethod
def critical(logBody):
print("[CRITICAL] {}".format(logBody))
@staticmethod
def info(logBody):
print("[INFO] {}".format(logBody))
class Cognito:
USERPOOLID = ""
REGION = ""
ATTRIBUTES = ""
    # INIT user pool ID, region, and column names to be exported.
    # The region is currently unused; the export Lambda must be in the same region as the Cognito service.
def __init__(self, userPoolId, region, attributes):
self.USERPOOLID = userPoolId
self.REGION = region
self.ATTRIBUTES = attributes
def getAttributes (self):
try:
boto = boto3.client('cognito-idp')
headers = boto.get_csv_header(
UserPoolId=self.USERPOOLID
)
self.ATTRIBUTES = headers["CSVHeader"]
return headers["CSVHeader"]
except Exception as e:
Logs.critical("There is an error listing users attributes")
Logs.critical(traceback.format_exc())
exit()
# List all cognito users with only the predefined columns in ATTRIBUTES variable
    # As there is a limit of 60 users per call, multiple queries are executed, separated into so-called pages.
def listUsers (self):
try:
boto = boto3.client('cognito-idp')
users = []
next_page = None
kwargs = {
'UserPoolId': self.USERPOOLID,
}
users_remain = True
while(users_remain):
if next_page:
kwargs['PaginationToken'] = next_page
response = boto.list_users(**kwargs)
users.extend(response['Users'])
next_page = response.get('PaginationToken', None)
users_remain = next_page is not None
# COOL DOWN BEFORE NEXT QUERY
time.sleep(0.15)
return users
except Exception as e:
Logs.critical("There is an error listing cognito users")
Logs.critical(traceback.format_exc())
exit()
# List all cognito groups with only the predefined columns in ATTRIBUTES variable
    # As there is a limit of 60 groups per call, multiple queries are executed, separated into so-called pages.
def listGroups (self):
try:
boto = boto3.client('cognito-idp')
groups = []
next_page = None
kwargs = {
'UserPoolId': self.USERPOOLID,
}
groups_remain = True
while(groups_remain):
if next_page:
kwargs['NextToken'] = next_page
response = boto.list_groups(**kwargs)
groups.extend(response['Groups'])
next_page = response.get('NextToken', None)
groups_remain = next_page is not None
# COOL DOWN BEFORE NEXT QUERY
time.sleep(0.15)
return groups
except Exception as e:
Logs.critical("There is an error listing cognito groups")
Logs.critical(traceback.format_exc())
exit()
class CSV:
FILENAME = ""
FOLDER = "/tmp/"
ATTRIBUTES = ""
CSV_LINES = []
# INIT titles and filename
def __init__(self, attributes, prefix):
self.ATTRIBUTES = attributes
self.FILENAME = "cognito_backup_" + prefix + "_" + datetime.now().strftime("%Y%m%d-%H%M") + ".csv"
self.CSV_LINES = []
# Generate CSV content. Every column in a row is split with ","
# First are added the titles and then all users are looped.
def generateUserContent (self, records):
try:
#ADD TITLES
csv_new_line = self.addTitles()
#ADD USERS
for user in records:
""" Fetch Required Attributes Provided """
csv_line = csv_new_line.copy()
for requ_attr in self.ATTRIBUTES:
csv_line[requ_attr] = ''
if requ_attr in user.keys():
csv_line[requ_attr] = str(user[requ_attr])
continue
for usr_attr in user['Attributes']:
if usr_attr['Name'] == requ_attr:
csv_line[requ_attr] = str(usr_attr['Value'])
csv_line["cognito:mfa_enabled"] = "false"
csv_line["cognito:username"] = csv_line["email"]
                self.CSV_LINES.append(",".join(csv_line.values()) + '\n')
return self.CSV_LINES
except Exception as e:
Logs.critical("Error generating csv content")
Logs.critical(traceback.format_exc())
exit()
# Add titles to first row and return it as a template.
def addTitles (self):
csv_new_line = {self.ATTRIBUTES[i]: '' for i in range(len(self.ATTRIBUTES))}
        self.CSV_LINES.append(",".join(csv_new_line) + '\n')
return csv_new_line
# Generate CSV content. Every column in a row is split with ","
# First are added the titles and then all groups are looped.
def generateGroupContent (self, records):
try:
#ADD TITLES
csv_new_line = self.addTitles()
#ADD GROUPS
for group in records:
csv_line = {}
for groupParam in self.ATTRIBUTES:
csv_line[str(groupParam)] = str(group[str(groupParam)])
                self.CSV_LINES.append(",".join(csv_line.values()) + '\n')
return self.CSV_LINES
except Exception as e:
Logs.critical("Error generating csv content")
Logs.critical(traceback.format_exc())
exit()
# Save generated content to a file.
def saveToFile(self):
try:
csvFile = open(self.FOLDER + "/" + self.FILENAME, 'a')
csvFile.writelines(self.CSV_LINES)
csvFile.close()
except Exception as e:
Logs.critical("Error saving csv file")
Logs.critical(traceback.format_exc())
exit()
class S3:
BUCKET = ""
REGION = ""
# INIT bucket name and region. Currently region is not used.
def __init__(self, bucket, region):
self.BUCKET = bucket
self.REGION = region
    # Upload file to S3 bucket
def uploadFile(self, src, dest):
try:
boto3.resource('s3').meta.client.upload_file(src, self.BUCKET, dest)
except Exception as e:
Logs.critical("Error uploading the backup file")
Logs.critical(traceback.format_exc())
exit()
def lambda_function(event, context):
### MAIN ###
# VARIABLES
REGION = os.environ['REGION']
COGNITO_ID = os.environ['COGNITO_ID']
BACKUP_BUCKET = os.environ['BACKUP_BUCKET']
GATTRIBUTES = [
'GroupName',
'Description',
'Precedence'
]
# INIT CLASSES
cognito = Cognito(COGNITO_ID, REGION, [])
cognitoS3 = S3(BACKUP_BUCKET, REGION)
    # GET USER ATTRIBUTES AND INIT CSV CLASS
ATTRIBUTES = cognito.getAttributes()
csvUsers = CSV(ATTRIBUTES, "users")
# LIST ALL USERS
user_records = cognito.listUsers()
# SAVE USERS TO FILE
csvUsers.generateUserContent(user_records)
csvUsers.saveToFile()
# DISPLAY INFO
Logs.info("Total Exported User Records: "+str(len(csvUsers.CSV_LINES)))
# UPLOAD FILE
cognitoS3.uploadFile (csvUsers.FOLDER + "/" + csvUsers.FILENAME, csvUsers.FILENAME)
# INIT GROUPS CSV CLASS
csvGroups = CSV(GATTRIBUTES, "groups")
# LIST ALL GROUPS
group_records = cognito.listGroups()
# SAVE GROUPS TO FILE
csvGroups.generateGroupContent(group_records)
csvGroups.saveToFile()
# DISPLAY INFO
Logs.info("Total Exported Group Records: "+str(len(csvGroups.CSV_LINES)))
# UPLOAD FILE
cognitoS3.uploadFile (csvGroups.FOLDER + "/" + csvGroups.FILENAME, csvGroups.FILENAME)
From the Test tab, just press the “Test” button and you will see the output of a successful run.
So what does this Lambda do?
There are a couple of custom classes for working with Cognito, S3, and CSV, but the main procedure is inside the lambda_function method.
- First, we initialize all classes.
- Inside GATTRIBUTES we must define the group columns that are going to be exported.
- GroupName
- Description
- Precedence
These are the default columns and will be enough in most cases.
- Get user attributes. These are the column names used in Cognito. In this installation, we have 3 custom attributes.
!!! When using custom attributes, we have to create them inside the new Cognito pool first, or the import will fail (see the sketch after this list).
- List all users.
- Generate and save the users inside the CSV file.
- Upload the newly created users file to our backup bucket.
- List all groups.
- Generate and save the groups inside the CSV file.
- Upload the newly created groups file to our backup bucket.
- Export some statistics.
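As a sketch of the custom-attributes note above, the attributes can be created in the destination pool with boto3 before running the import (the pool ID and attribute name are placeholders; Cognito exposes such attributes with a custom: prefix in the CSV header):
import boto3

client = boto3.client('cognito-idp')
# Custom attributes must exist in the destination pool before the import job runs
client.add_custom_attributes(
    UserPoolId='eu-west-1_EXAMPLE',
    CustomAttributes=[
        {'Name': 'tenant_id', 'AttributeDataType': 'String', 'Mutable': True}
    ]
)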
If we visit the backup bucket, we will see the newly created CSV files:
Clean Cognito Lambda
In case our second Cognito user pool is not empty, or a previous import was not fully successful, we will need to clean it.
Again, create a Python 3.9 Lambda function from scratch. Change the memory and timeout, and add these environment variables:
- COGNITO_ID – the ID of the destination Cognito user pool
- REGION – the destination Cognito region
Also, create a new role with the following permissions and call it CognitoImportRole:
- IAMFullAccess
- AmazonS3FullAccess
- CloudWatchFullAccess
- AmazonCognitoPowerUser
import boto3
import time
import traceback
import os
class Logs:
@staticmethod
def warning(logBody):
print("[WARNING] {}".format(logBody))
@staticmethod
def critical(logBody):
print("[CRITICAL] {}".format(logBody))
@staticmethod
def info(logBody):
print("[INFO] {}".format(logBody))
class Cognito:
USERPOOLID = ""
REGION = ""
ATTRIBUTES = ""
    # INIT user pool ID, region, and column names to be exported.
    # The region is currently unused; the Lambda must be in the same region as the Cognito service.
def __init__(self, userPoolId, region, attributes):
self.USERPOOLID = userPoolId
self.REGION = region
self.ATTRIBUTES = attributes
def getAttributes (self):
try:
boto = boto3.client('cognito-idp')
headers = boto.get_csv_header(
UserPoolId=self.USERPOOLID
)
self.ATTRIBUTES = headers["CSVHeader"]
return headers["CSVHeader"]
except Exception as e:
Logs.critical("There is an error listing users attributes")
Logs.critical(traceback.format_exc())
exit()
# List all cognito users with only the predefined columns in ATTRIBUTES variable
    # As there is a limit of 60 users per call, multiple queries are executed, separated into so-called pages.
def listUsers (self):
try:
boto = boto3.client('cognito-idp')
users = []
next_page = None
kwargs = {
'UserPoolId': self.USERPOOLID,
}
users_remain = True
while(users_remain):
if next_page:
kwargs['PaginationToken'] = next_page
response = boto.list_users(**kwargs)
users.extend(response['Users'])
next_page = response.get('PaginationToken', None)
users_remain = next_page is not None
# COOL DOWN BEFORE NEXT QUERY
time.sleep(0.15)
return users
except Exception as e:
Logs.critical("There is an error listing cognito users")
Logs.critical(traceback.format_exc())
exit()
# List all cognito groups with only the predefined columns in ATTRIBUTES variable
    # As there is a limit of 60 groups per call, multiple queries are executed, separated into so-called pages.
def listGroups (self):
try:
boto = boto3.client('cognito-idp')
groups = []
next_page = None
kwargs = {
'UserPoolId': self.USERPOOLID,
}
groups_remain = True
while(groups_remain):
if next_page:
kwargs['NextToken'] = next_page
response = boto.list_groups(**kwargs)
groups.extend(response['Groups'])
next_page = response.get('NextToken', None)
groups_remain = next_page is not None
# COOL DOWN BEFORE NEXT QUERY
time.sleep(0.15)
return groups
except Exception as e:
Logs.critical("There is an error listing cognito groups")
Logs.critical(traceback.format_exc())
exit()
def deleteGroups (self, groups):
try:
boto = boto3.client('cognito-idp')
for group in groups:
response = boto.delete_group(
GroupName=group["GroupName"],
UserPoolId=self.USERPOOLID
)
except Exception as e:
Logs.critical("There is an error listing cognito groups")
Logs.critical(traceback.format_exc())
exit()
def deleteUsers (self, users):
try:
boto = boto3.client('cognito-idp')
for user in users:
response = boto.admin_delete_user(
UserPoolId=self.USERPOOLID,
Username=user["Username"]
)
except Exception as e:
Logs.critical("There is an error listing cognito groups")
Logs.critical(traceback.format_exc())
exit()
def lambda_function(event, context):
### MAIN ###
# VARIABLES
REGION = os.environ['REGION']
COGNITO_ID = os.environ['COGNITO_ID']
# INIT CLASSES
cognito = Cognito(COGNITO_ID, REGION, [])
# LIST ALL USERS
user_records = cognito.listUsers()
cognito.deleteUsers(user_records)
# LIST ALL GROUPS
group_records = cognito.listGroups()
cognito.deleteGroups(group_records)
So what does this Lambda do? The procedure is pretty simple: it lists all users and groups and deletes them.
Restore Cognito User Pool
The following steps are only needed when we have to restore the user data stored on S3, for example when a user was accidentally deleted or when data needs to be migrated to another Cognito user pool. For this we create another AWS Lambda that is triggered manually from the AWS console whenever the respective data needs to be restored. The destination can be a new user pool or an old one. In this generic solution, all the users saved in the S3 bucket are considered, but Cognito will only import the users that do not yet exist in the user pool and will fail for the existing ones.
The Restore solution is presented below.
Again, create a Python 3.9 Lambda function from scratch. Change the memory and timeout, attach the CognitoImportRole, and add these environment variables:
- BACKUP_BUCKET – backup bucket
- BACKUP_FILE_GROUPS – full filename cognito_backup_groups_XXXXXXXX-XXXX.csv
- BACKUP_FILE_USERS – full filename cognito_backup_users_XXXXXXXX-XXXX.csv
- COGNITO_ID – destination cognito id
- REGION – destination cognito region
The source code here is a little bit different. As the AWS Lambda Python runtime does not include the “requests” library, we must bundle it ourselves.
- Save the code below to index.py
- In the same folder run
pip3 install requests -t .
This will download all packages needed to run “requests”.
import boto3
import traceback
import csv
import requests
import os
class Logs:
@staticmethod
def warning(logBody):
print("[WARNING] {}".format(logBody))
@staticmethod
def critical(logBody):
print("[CRITICAL] {}".format(logBody))
@staticmethod
def info(logBody):
print("[INFO] {}".format(logBody))
class S3:
BUCKET = ""
REGION = ""
def __init__(self, bucket, region):
self.BUCKET = bucket
self.REGION = region
def downloadFile (self, src, dest):
try:
boto3.resource('s3').meta.client.download_file(self.BUCKET, src, dest)
except Exception as e:
Logs.critical("Error downloading file")
Logs.critical(traceback.format_exc())
exit()
class CSV:
FILENAME = ""
FOLDER = "/tmp/"
def __init__(self, filename):
self.FILENAME = filename
    def readBackup(self):
        # Parse the CSV backup into a list of dicts, one per row
        records = []
        with open(self.FILENAME, 'r') as file:
            csv_file = csv.DictReader(file)
            for row in csv_file:
                records.append(dict(row))
        return records
class Cognito:
USERPOOLID = ""
REGION = ""
ATTRIBUTES = ""
def __init__(self, userPoolId, region, attributes):
self.USERPOOLID = userPoolId
self.REGION = region
self.ATTRIBUTES = attributes
def importGroups (self, groups):
try:
boto = boto3.client('cognito-idp')
for group in groups:
print(group)
if not self.checkIfGroupExists(group["GroupName"]):
kwargs = {
'UserPoolId': self.USERPOOLID
}
for attribute in self.ATTRIBUTES:
if (group[str(attribute)].isnumeric()):
kwargs[str(attribute)] = int(group[attribute])
else:
kwargs[str(attribute)] = str(group[attribute])
response = boto.create_group(**kwargs)
else:
Logs.info("Group {} already exists".format(group["GroupName"]))
except Exception as e:
Logs.critical("Error importing groups")
Logs.critical(traceback.format_exc())
exit()
def checkIfGroupExists(self, groupName):
try:
boto = boto3.client('cognito-idp')
response = boto.get_group(
GroupName=groupName,
UserPoolId=self.USERPOOLID
)
return True
except Exception as e:
return False
    def importUsers(self, filename):
        try:
            boto = boto3.client('cognito-idp')
            # CREATE A USER IMPORT JOB. Replace the role ARN below with the ARN of your own CognitoImportRole.
            response = boto.create_user_import_job(
                JobName='Import-Test-Job',
                UserPoolId=self.USERPOOLID,
                CloudWatchLogsRoleArn='arn:aws:iam::615124646879:role/CognitoImportRole'
            )
            # UPLOAD THE CSV FILE TO THE JOB'S PRESIGNED URL
            content_disposition = 'attachment;filename=' + filename
            presigned_url = response['UserImportJob']['PreSignedUrl']
            headers_dict = {
                'x-amz-server-side-encryption': 'aws:kms',
                'Content-Disposition': content_disposition
            }
with open(filename, 'rb') as csvFile:
file_upload_response = requests.put(
presigned_url,
data=csvFile,
headers=headers_dict
)
response2 = boto.start_user_import_job(
UserPoolId=self.USERPOOLID,
JobId=response["UserImportJob"]["JobId"]
)
print(response2)
except Exception as e:
Logs.critical("Error importing users")
Logs.critical(traceback.format_exc())
exit()
def lambda_function(event, context):
REGION = os.environ['REGION']
COGNITO_ID = os.environ['COGNITO_ID']
BACKUP_FILE_USERS = os.environ['BACKUP_FILE_USERS']
BACKUP_FILE_GROUPS = os.environ['BACKUP_FILE_GROUPS']
BACKUP_BUCKET = os.environ['BACKUP_BUCKET']
    cognitoS3 = S3(BACKUP_BUCKET, REGION)
    # DOWNLOAD GROUPS
    cognitoS3.downloadFile(BACKUP_FILE_GROUPS, "/tmp/"+BACKUP_FILE_GROUPS)
# IMPORT GROUPS
csvGroups = CSV("/tmp/"+BACKUP_FILE_GROUPS)
groups = csvGroups.readBackup()
GATTRIBUTES = [
'GroupName',
'Description',
'Precedence'
]
cognito = Cognito(COGNITO_ID, REGION, GATTRIBUTES)
cognito.importGroups(groups)
    # DOWNLOAD USERS
    cognitoS3.downloadFile(BACKUP_FILE_USERS, "/tmp/"+BACKUP_FILE_USERS)
    # IMPORT USERS VIA A COGNITO USER IMPORT JOB
    ATTRIBUTES = [
        'email',
        'given_name',
        'family_name'
    ]
    cognitoUsers = Cognito(COGNITO_ID, REGION, ATTRIBUTES)
    cognitoUsers.importUsers("/tmp/"+BACKUP_FILE_USERS)
Now zip the entire directory and deploy the code from the zip file.
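For example (the function name is a placeholder):
zip -r function.zip .
aws lambda update-function-code --function-name cognito-restore-lambda --zip-file fileb://function.zip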
What does the Lambda do?
- Downloads the groups file.
- Imports the groups.
- Downloads the users file.
- Creates a Cognito import job, attaches the file, and starts it.
As the import job is asynchronous, the Lambda stops here and is considered successful.
We can track import jobs from Cognito User Pool – Users and Groups – Import Users
Here we see the number of successfully imported users; if a user could not be imported, the reason is recorded in the CloudWatch logs via the CognitoImportRole we configured.
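The job status can also be polled with boto3, assuming you kept the JobId returned by create_user_import_job (placeholder values below):
import boto3

client = boto3.client('cognito-idp')
job = client.describe_user_import_job(
    UserPoolId='eu-west-1_EXAMPLE',
    JobId='import-EXAMPLE'
)['UserImportJob']
# Status moves through Pending/InProgress to Succeeded or Failed
print(job['Status'], job['ImportedUsers'], job['FailedUsers'])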
Observation of Backup and Restore Solution with Python3 Lambda
Unfortunately, there are some limitations when exporting/importing users from a user pool. Some are fairly simple, others are not. Below we list the ones with the greatest impact on the solution:
- Password backups: There are no backups for passwords and users would need to reset them.
- Cognito sub attribute renewal: the sub attribute is regenerated for every imported user. If the software relies on it, the old value can be copied to a custom attribute, or custom attributes can be used in place of the sub attribute in the software solution.
- MFA: all MFA will be disabled in the newly created pool and must be re-enabled.
- Third-party identity providers such as Facebook will not work.
!!! The problem is that every Cognito user pool has its own unique salt key. It is automatically generated, and we have no ability to copy or change it.
If you have any questions, feel free to contact us.