Automatically Backup MySQL databases to Amazon S3

If you happen to run your own server you need to think about backups.

mysqlFor MySQL to backup a database is as simple as:

mysqldump -u user -ppassword my_database > /var/local/backups/mysql/my_database_dump.sql

Now you have a .sql file, which contains your database, stored into the /var/local/backups/mysql/ folder.

Is your database safe now? Well, no! What if your server dies? What if you loose all the data on it?

Better move the database’s dump somewhere else!

Amazon_S3Amazon S3 is a cloud storage service which ships with a very practical CLI tool.

In order to use it you need an amazon’s account and you need to install and configure the CLI tool.

Now that you have all the tools in place let’s see how we are going to automatically backup mysql databases to amazon s3:

  1. create a bucket on Amazon s3 (say my-backups)
  2. create a new folder inside that bucket (say my-backups/mysql)
  3. back up a database via mysqldump and store the dump into a folder in your server (say /var/local/backups/mysql/)
  4. synchronize that folder to Amazon s3’s my-backups/mysql

Steps 1 and 2 are done just the first time while 3 and 4 should be performed repeatedly (i.e. schedule them).

Here is a small bash script which does the job:

You can save this script into a .sh file, for example, and then schedule every day in order to get an automatic backup. To do so, you need to add an entry to the crontab:

$ crontab -e

then add a line like the following at the bottom

00 00 * * * /bin/sh /path/to/

this will run the script every day at midnight.

If you want to backup more than one database and keep backups separated you could add a parameter to the script:

(pay attention ton the folders structure, it has to be consistent)

now you can pass the database name as a paramenter:

$ sh /path/to/ my_database

Testing query results with dbUnit

dbunit-logoWhen developing a data-warehouse you need to run tests on int to ensure that the datasets are consistent.

Call it data quality tests or just tests, they are needed if you want to trust your data.

Here is a simple tutorial on how to run test agains your dataset with dbUnit, by testing query results.

The idea is that you take a snapshot of a well tested resultset and the you test that the query which provides such dataset always return the expected result.

First of all you need a query to test, let’s say:

SELECT * FROM vw_test_me;

then we need to take a snapshot of it, to do so

Connection jdbcConnection = DriverManager.getConnection(<your db parameters>);
IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
// partial database export
QueryDataSet partialDataSet = new QueryDataSet(connection);
partialDataSet.addTable("test_snapshot", "SELECT * FROM vw_test_me");
FlatXmlDataSet.write(partialDataSet, new FileOutputStream(String.format("snapshot.xml","test_snapshot")));

Now we have an xml file called snapshot.xml which contains the resultset of the query above.

This is the starting point of our tests, everytime the query runs the result have to match with the resultset contained into snapshot.xml otherwise the test will fail.

dbUnit is a JUnit extension so the test should be written accordingly:

import junit.framework.TestCase;
import org.dbunit.IDatabaseTester;
import org.dbunit.JdbcDatabaseTester;
import org.junit.After;
import org.junit.Test;

public class TestQuery extends TestCase
    protected IDatabaseTester databaseTester;

    public TestQuery()
        databaseTester = new JdbcDatabaseTester(<your db parameters>);
    public void testMyQuery()
        ITable actualTable = databaseTester.getConnection().createQueryTable("test_query", "SELECT * FROM vw_test_me");
        IDataSet expectedDataSet = new FlatXmlDataSetBuilder().build(new File("snapshot.xml"));
        ITable expectedTable = expectedDataSet.getTable("test_snapshot");
        Assertion.assertEquals(expectedTable, actualTable);
    public void tearDown() throws Exception

the testMyQuery method first runs the query we are testing and store its result into variable ‘actualTable’, then it retrieves the snapshoted resultset from snapshot.xml, eventually it checks if the two datasets matches.