Blogs | Srijan

Improve your database backup policy by using automated scripts

Written by Team Srijan | May 17, 2011 7:00:00 AM

If you manage hosting infrastructure for Drupal websites as we do, you would already be having a process for taking back-ups of the databases of various websites at regular intervals - daily, weekly or monthly. The size of databases grows over a period of time, thus making it neccessary to know which databases are active and need to be backed up. Not caring for these may encourage backing up of un-wanted data leading to tremendous wastage of expensive bandwidth and storage, which does delay in development life-cycles.

 

As an exmaple in our case at Srijan - all backups were being pushed to late-at-night timings to ensure the performance of our hosted websites are not affected - thus leading to delays in Staging/QA server deployments by the development team. The key issues were:

  • dead databases were getting backed-up
  • log related tables were getting backed-up (which in some cases formed 80%-90% of database size)

This is where my work came in to improve the backup process at Srijan; the scripts and approach defined here would be useful for you if you face similar challenges at your organization.

Which Drupal tables to exclude?

The list of tables to be avoided in any backup includes: watchdog, accesslog, search_index, search_total, search_dataset, search_node_links, sessions and all cache tables. These tables SHOULD NOT form part of these database backups (even as the structure of these tables SHOULD be backed up).

The script given below specifies such tables list for which data is not dumped.

STRUCTURE_ONLY="/^(prefix1_|prefix2_)?(watchdog|search_index|search_total|search_dataset|
search_node_links|sessions|cache(_.+)?)$/"

For sites running Apache SOLR, you DO NOT need to take back up of the search_index and its secondary tableseither.

Avoid taking a dump of "information_schema" database. This database does not need a backup. It is not dump using MySqldump by default, but if mentioned, it does take a back up. So avoid this.

This takes a dump of the complete database, with only the structure the specified tables. It does takes a backup of data for all tables except the ones specified here.

Depending upon the modules installed in your Drupal installation, such a to-be-avoided list of tables may grow. These can then be added to the above script separated by a single pipe operator (|).

Skipping entire databases

Similarly, databases can be skipped too. If you want to skip certain databases during this process mention their name too. In the script below, the database "db_1" would be skipped from the backup.

DBS_SKIP="/^(prefix1_|prefix2_)?(information_schema|db_1)$/"

In case, if the list of skipped databases is too many, add those databases name which needs to be backed up, in the skip list and change the following line of code:

DBS_SKIP="/^(prefix1_|prefix2_)?(db_1|db_2|db_3)$/"
SKIP_DB=`echo "$DB" | gawk "$DBS_SKIP"`
# original line tells to skip, but the current change says only these to consider
# if [ $SKIP_DB ]
if [ ! $SKIP_DB ]
  then continue
fi

Success story

You may like to us this enhanced automated script to take the database backup, just as we do at Srijan. The script takes dump of each database, makes bz2 file and removes the SQL file, thus also reducing the backup size on the server.

The use of these scripts has helped us at Srijan reduce our 10GB daily backup data to only 180MB (bz2) - a reduction of over 55 times.