Automatically back up MySQL databases to Amazon S3

If you happen to run your own server, you need to think about backups.

For MySQL, backing up a database is as simple as:

mysqldump -u user -ppassword my_database > /var/local/backups/mysql/my_database_dump.sql

Now you have a .sql file containing your database, stored in the /var/local/backups/mysql/ folder.

Is your database safe now? Well, no! What if your server dies? What if you lose all the data on it?

Better move the database’s dump somewhere else!

Amazon S3 is a cloud storage service which ships with a very practical CLI tool.

In order to use it you need an Amazon account, and you need to install and configure the AWS CLI tool.
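A minimal sketch of the setup, assuming Python's pip is available (Amazon documents several other installation methods); aws configure will interactively prompt you for your access key, secret key and default region:

#install the AWS CLI (one of several installation methods)
pip install awscli
#set up credentials and default region
aws configure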

Now that you have all the tools in place, let's see how to automatically back up MySQL databases to Amazon S3:

  1. create a bucket on Amazon S3 (say my-backups)
  2. create a new folder inside that bucket (say my-backups/mysql)
  3. back up a database via mysqldump and store the dump in a folder on your server (say /var/local/backups/mysql/)
  4. synchronize that folder to Amazon S3's my-backups/mysql

Steps 1 and 2 are done just once, while 3 and 4 should be performed repeatedly (i.e. scheduled). If you prefer the command line to the web console for the one-time setup, see the sketch below.
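A minimal sketch of steps 1 and 2 with the AWS CLI (note that S3 folders are really just key prefixes, so the second command is optional: aws s3 sync will create the prefix on its own):

#step 1: create the bucket
aws s3 mb s3://my-backups
#step 2: create the mysql/ "folder" (an empty key prefix)
aws s3api put-object --bucket my-backups --key mysql/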

Here is a small bash script which does the job:


#!/bin/bash
#get the day of week as a number (i.e. 1 = Monday, 7 = Sunday)
DAY_OF_WEEK=$(date +%u)
#backup 'my_database' to the folder /var/local/backups/mysql/
#$DAY_OF_WEEK is appended to the file name in order to keep a weekly history of the backups
mysqldump -u user -ppassword my_database > /var/local/backups/mysql/my_database_$DAY_OF_WEEK.sql
#use aws (Amazon's tool) to sync your backups to Amazon S3
#/usr/local/bin/aws – absolute path to Amazon's CLI tool
#s3 – the service we want to use
#sync – the command to run
#/var/local/backups/mysql/ – local folder
#s3://my-backups/mysql/ – Amazon s3 folder
/usr/local/bin/aws s3 sync /var/local/backups/mysql/ s3://my-backups/mysql/

You can save this script into a .sh file, for example backup_mysql.sh, and then schedule it to run every day in order to get automatic backups. To do so, you need to add an entry to the crontab:

$ crontab -e

then add a line like the following at the bottom:

00 00 * * * /bin/sh /path/to/backup_mysql.sh

This will run the script every day at midnight.
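To verify that the dumps actually reached S3, you can list the remote folder (using the bucket and folder names from above):

aws s3 ls s3://my-backups/mysql/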

If you want to back up more than one database and keep the backups separated, you can add a parameter to the script:


#!/bin/bash
#get the day of week as a number (i.e. 1 = Monday, 7 = Sunday)
DAY_OF_WEEK=$(date +%u)
#back up the database passed as argument ($1) to the folder /var/local/backups/mysql/
#$DAY_OF_WEEK is appended to the file name in order to keep a weekly history of the backups
mysqldump -u user -ppassword "$1" > "/var/local/backups/mysql/$1/dump_$DAY_OF_WEEK.sql"
#use aws (Amazon's tool) to sync your backups to Amazon S3
#/usr/local/bin/aws – the tool
#s3 – the service we want to use
#sync – the command to run
#/var/local/backups/mysql/ – local folder
#s3://my-backups/mysql/ – Amazon s3 folder
/usr/local/bin/aws s3 sync "/var/local/backups/mysql/$1/" "s3://my-backups/mysql/$1/"

(pay attention to the folder structure: the local folder /var/local/backups/mysql/<database_name>/ must already exist and be consistent with the S3 path)

Now you can pass the database name as a parameter:

$ sh /path/to/backup_mysql.sh my_database
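With one crontab entry per database, each database gets backed up on its own schedule (my_other_database below is just a placeholder for a second database name):

00 00 * * * /bin/sh /path/to/backup_mysql.sh my_database
30 00 * * * /bin/sh /path/to/backup_mysql.sh my_other_database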

Testing query results with dbUnit

When developing a data warehouse you need to run tests on it to ensure that the datasets are consistent.

Call them data quality tests or just tests; they are needed if you want to trust your data.

Here is a simple tutorial on how to run tests against your dataset with dbUnit, by testing query results.

The idea is that you take a snapshot of a well-tested result set, and then verify that the query which produces that dataset always returns the expected result.

First of all you need a query to test, let’s say:

SELECT * FROM vw_test_me;

then we need to take a snapshot of it. To do so:

Connection jdbcConnection = DriverManager.getConnection(<your db parameters>);
IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
// partial database export
QueryDataSet partialDataSet = new QueryDataSet(connection);
partialDataSet.addTable("test_snapshot", "SELECT * FROM vw_test_me");
FlatXmlDataSet.write(partialDataSet, new FileOutputStream("snapshot.xml"));

Now we have an XML file called snapshot.xml which contains the result set of the query above.
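For reference, the flat XML format stores one element per row, named after the table, with one attribute per column; a hypothetical snapshot for a view with columns id and name would look like this:

<?xml version='1.0' encoding='UTF-8'?>
<dataset>
  <test_snapshot id="1" name="foo"/>
  <test_snapshot id="2" name="bar"/>
</dataset>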

This is the starting point of our tests: every time the query runs, the result has to match the result set contained in snapshot.xml, otherwise the test will fail.

dbUnit is a JUnit extension so the test should be written accordingly:

import java.io.File;

import junit.framework.TestCase;
import org.dbunit.Assertion;
import org.dbunit.IDatabaseTester;
import org.dbunit.JdbcDatabaseTester;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.ITable;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.junit.After;
import org.junit.Test;

public class TestQuery extends TestCase
{
    protected IDatabaseTester databaseTester;

    public TestQuery() throws Exception
    {
        databaseTester = new JdbcDatabaseTester(<your db parameters>);
    }
    @Test
    public void testMyQuery() throws Exception
    {
        ITable actualTable = databaseTester.getConnection().createQueryTable("test_query", "SELECT * FROM vw_test_me");
        IDataSet expectedDataSet = new FlatXmlDataSetBuilder().build(new File("snapshot.xml"));
        ITable expectedTable = expectedDataSet.getTable("test_snapshot");
        Assertion.assertEquals(expectedTable, actualTable);
    }
    @After
    @Override
    public void tearDown() throws Exception
    {
        databaseTester.getConnection().close();
    }
}

The testMyQuery method first runs the query under test and stores its result in the variable actualTable; then it retrieves the snapshotted result set from snapshot.xml; finally, it checks that the two datasets match.
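Assuming the project is built with Maven (an assumption; any JUnit runner will do), you can run the test from the command line with:

mvn test -Dtest=TestQuery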