Cloning RDS Instances for Testing
I have one or more staging/test environments at any given point in time. I want each of them to have its own database: a nightly clone of the production database.
Previously, my MySQL database was hosted at Rackspace and was backed up using Holland. Holland uses regular mysqldump under the hood. To sync my staging/test databases nightly, I would just run a cron job on each that would download the latest dump, drop its own database and restore from the dump.
In AWS, database backups and cloning are very well automated and tooled. But they are done at a binary/disk level. Generally, this is a good thing, but the cloning process is a tiny bit more involved. To clone your database in AWS, you’ll want to follow these steps:
- Install the AWS command line client
- Create a bash script that does the following…
  - Fetch the snapshot id to use for the clone
  - Delete the existing instance
  - Create the new instance
  - Modify the new instance
  - Wait for the instance to reach various states
Install the AWS command line client
The AWS CLI tools are well documented.
After you install, configure it with `aws configure`. Make sure you set your output format to `text`, because the example scripts I will provide assume that format and use tools like `awk` to parse the output.
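If you prefer to script the configuration, the output format can also be set non-interactively with the standard `aws configure set` subcommand:

```bash
# Set the default output format to text so the scripts below can
# parse the CLI's output with awk and friends.
aws configure set default.output text
```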
Fetch the Snapshot Id
We are going to provide the CLI with the instance name and grab the latest snapshot for it. My production instance is called `production`. Replace with the name of your instance.
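A sketch of that lookup, assuming text output format: it lists the snapshots as (create-time, id) pairs, sorts them chronologically, and keeps the newest id (the `SNAPSHOT_ID` variable name is my own):

```bash
# List snapshots of the production instance as (create-time, id) pairs,
# sort chronologically, and keep the id of the newest one.
SNAPSHOT_ID=$(aws rds describe-db-snapshots \
    --db-instance-identifier production \
    --query 'DBSnapshots[*].[SnapshotCreateTime,DBSnapshotIdentifier]' \
    --output text | sort | tail -n 1 | awk '{print $2}')
echo "Cloning from snapshot: $SNAPSHOT_ID"
```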
Delete the existing instance
Your database endpoint is based on the instance name. By giving the new instance the same name as the previous one, your endpoint URI will not change, so you won’t have to reconfigure your consuming app(s). But RDS won’t allow you to have two instances with the same instance identifier, so we must first delete the existing instance.
Alternatively, you could rename the existing instance, but it won't help much: there will be extra steps and you'll still have downtime. The only way to avoid downtime is to use a new instance name, leave your old instance alone, and reconfigure your app to use the new instance once it is ready.
In my case, I don’t care about downtime. It is just a testing environment.
Again, my staging/test instance is called `staging`. Replace with a name of your choosing.
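The delete call itself is short; `--skip-final-snapshot` tells RDS not to take a parting snapshot of the instance we're about to clobber:

```bash
# Delete the existing staging instance so its identifier is freed up.
aws rds delete-db-instance \
    --db-instance-identifier staging \
    --skip-final-snapshot
```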
Create the new instance
Creating the new instance is straightforward at this point. Some notable settings (a sketch of the full call follows this list):
- No Multi-AZ, so we're not paying for availability we don't need
- No minor version upgrades. We're going to clobber it every night anyway
- Mine is not in the default VPC, so I must tell it what subnet group it should be in
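A sketch of the restore call with those settings. The `db.t2.micro` class matches the example in the storage table below, the subnet group name `my-staging-subnet-group` is a placeholder for yours, and the storage type is discussed in the next section:

```bash
# Restore the staging instance from the production snapshot fetched earlier.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier staging \
    --db-snapshot-identifier "$SNAPSHOT_ID" \
    --db-instance-class db.t2.micro \
    --no-multi-az \
    --no-auto-minor-version-upgrade \
    --db-subnet-group-name my-staging-subnet-group \
    --storage-type gp2
```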
Choosing your storage settings
Production databases often use Provisioned IOPS. This is a bit expensive for a test environment, but converting between that and other types of storage is slow. This table should help you weigh your options. It assumes your production database is a 100 GB volume with 1000 provisioned IOPS and you’re restoring into a db.t2.micro instance.
Storage Type | Code | Monthly Cost | Migration Time |
---|---|---|---|
Provisioned IOPS | io1 | $77.50 | N/A |
General Purpose (SSD) | gp2 | $10 | 67 mins |
Magnetic | standard | $5 | 87 mins |
Modify the new instance
Some settings are inherited from the snapshot and can't be changed when you call the `restore-db-instance-from-db-snapshot` command. You must wait until the instance is created and available and then perform a modification operation. In my case, I needed to disable backups.
Always remember the `--apply-immediately` flag or else you'll be waiting until the next maintenance window for your changes to take effect.
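Disabling backups amounts to setting the retention period to zero; a sketch:

```bash
# Turn off automated backups on the clone and apply the change now
# rather than waiting for the next maintenance window.
aws rds modify-db-instance \
    --db-instance-identifier staging \
    --backup-retention-period 0 \
    --apply-immediately
```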
Wait for the instance to reach various states
When you delete the instance (above), you can’t create the new instance until the delete operation is complete or else the instance identifier will still clash. Here’s a function that will do that for you:
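A minimal sketch: `describe-db-instances` fails once the instance no longer exists, so we poll until it does (the 30-second interval is an arbitrary choice of mine):

```bash
# Block until the given RDS instance has been fully deleted.
wait-until-deleted() {
    local instance_id="$1"
    while aws rds describe-db-instances \
            --db-instance-identifier "$instance_id" > /dev/null 2>&1; do
        sleep 30
    done
}
```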
It accepts a single parameter, the instance identifier, and will block until that instance is fully deleted.
Example: `wait-until-deleted staging`
You also need to wait for the newly created instance to be in the available state before you can perform the modification operation on it. In my case, I also wanted to wait until the modification was complete before allowing my script to exit. Here’s a function that will do that for you:
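A sketch along the same lines, polling the DBInstanceStatus field on each pass:

```bash
# Block until the given RDS instance reports the desired status.
wait-for-status() {
    local instance_id="$1"
    local desired="$2"
    local status=""
    while [ "$status" != "$desired" ]; do
        sleep 30
        status=$(aws rds describe-db-instances \
            --db-instance-identifier "$instance_id" \
            --query 'DBInstances[0].DBInstanceStatus' \
            --output text)
    done
}
```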
It accepts the instance identifier and the desired status as parameters and blocks until that status is achieved.
Example: `wait-for-status staging available`
All Together
Here’s my final script that gets run nightly.
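A sketch of how the pieces above fit together (instance names, class, and subnet group are the example values used throughout this post):

```bash
#!/usr/bin/env bash
# Nightly job: clone the production RDS instance into staging.
set -e

# Block until the given RDS instance has been fully deleted.
wait-until-deleted() {
    local instance_id="$1"
    while aws rds describe-db-instances \
            --db-instance-identifier "$instance_id" > /dev/null 2>&1; do
        sleep 30
    done
}

# Block until the given RDS instance reports the desired status.
wait-for-status() {
    local instance_id="$1"
    local desired="$2"
    local status=""
    while [ "$status" != "$desired" ]; do
        sleep 30
        status=$(aws rds describe-db-instances \
            --db-instance-identifier "$instance_id" \
            --query 'DBInstances[0].DBInstanceStatus' \
            --output text)
    done
}

# 1. Fetch the id of the latest production snapshot.
SNAPSHOT_ID=$(aws rds describe-db-snapshots \
    --db-instance-identifier production \
    --query 'DBSnapshots[*].[SnapshotCreateTime,DBSnapshotIdentifier]' \
    --output text | sort | tail -n 1 | awk '{print $2}')

# 2. Delete the existing staging instance and wait until it is gone.
aws rds delete-db-instance \
    --db-instance-identifier staging \
    --skip-final-snapshot
wait-until-deleted staging

# 3. Restore the new staging instance from the snapshot and wait
#    for it to become available.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier staging \
    --db-snapshot-identifier "$SNAPSHOT_ID" \
    --db-instance-class db.t2.micro \
    --no-multi-az \
    --no-auto-minor-version-upgrade \
    --db-subnet-group-name my-staging-subnet-group \
    --storage-type gp2
wait-for-status staging available

# 4. Disable backups, then wait for the modification to finish.
aws rds modify-db-instance \
    --db-instance-identifier staging \
    --backup-retention-period 0 \
    --apply-immediately
wait-for-status staging available
```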
Pain Points
Because RDS backs up at the snapshot level, its backups are more consistent and reliable. The interface, both console and API, is very robust and easy to use. And, if you use a Multi-AZ production database, those snapshots are performed on your fallback instance in a different AZ, so the snapshots don’t cause any performance, latency, or locking issues on your DB.
But in the case of creating a clone from those snapshots, there are some downsides:
- Scrubbing your data is a pain. If you want to modify the dump, you must first restore a snapshot as described above, and then you can use `mysqldump` as you normally would, excluding what you need and scrubbing the plain-text output as needed.
- Most production RDS instances should use Provisioned IOPS. Converting from that to SSD or magnetic is slow, but if you don't convert, you have to pay a hefty fee. See the price table above.
Be thoughtful about going around snapshots by using `mysqldump` directly against your production instance. This will cause downtime. Snapshots of production Multi-AZ instances are taken from the hot standby copy, not the live instance, which means no downtime. If you want to use `mysqldump` without downtime, you'll need to use a read replica for this purpose, or clone using the process above and then run `mysqldump` against that clone.