Backing up a Drupal site.

I host a number of Drupal sites, as well as WordPress and custom-built ones.

When you host a site, one of the first questions you’re asked is: do you have the ability to back up and restore my site if something breaks?

For obvious reasons, that’s an important question. But it’s a balancing act: it’s important to back up regularly, but you don’t want to overdo it and use up all your bandwidth copying those backups off the server.

So, for backups, you need to separate them into four parts (a rough sketch of how these tiers might be scheduled in cron follows the list).

  • Nightly full server backups.
    If the server goes down, I want to be able to bring it back within 5 minutes.
  • Monthly full site backups.
    These are compressed archives that contain everything from the site, including content and databases.
  • Weekly differential site backups.
    These are stored on a server that mirrors the configuration of the primary; it is used for testing new server configs before they go live on the production server.
  • Daily site backups.
    This is a backup of important site files that can become damaged as a result of errors during an upgrade or configuration change. It does not contain a database backup, but it is very useful for very quick restores.
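
To make that concrete, here is a minimal sketch of how the first three tiers might be wired into cron. The script names and times are placeholders rather than my actual setup; the daily tier is handled separately through /etc/cron.daily, as described below.

# Hypothetical crontab entries; script names and times are placeholders.
15 1 * * * /usr/local/sbin/full-server-backup.sh               # nightly full server backup
30 2 1 * * /home/RestrictedAccount/monthly-site-backup.sh      # monthly full site backup
45 2 * * 0 /home/RestrictedAccount/weekly-diff-backup.sh       # weekly differential site backup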

With that in mind, I have created the final part of this puzzle: a daily backup script that archives the important directories in a Drupal installation so they’re ready to be copied by the remote server. I have these scripts saved to a location in the home folder of a very restricted account that is used solely for this task, and a symbolic link in /etc/cron.daily points back to each of them, roughly as sketched next. The script itself follows the sketch.
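
Each site's cron.daily link might be created with something like this; the script location and link name are placeholders rather than the real paths:

# Hypothetical: hook the daily backup script into cron's daily run.
ln -s /home/RestrictedAccount/scripts/UserName-daily-backup.sh /etc/cron.daily/UserName-daily-backup

The link name deliberately has no extension, because on many distributions run-parts skips files whose names contain a dot.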

#!/bin/bash
# Daily Drupal site backup: archive the important directories and log the result.
thisdate=$(date +%Y%m%d)
backupstatus=false
tar -zcvf /home/UserName/backups/UserName.tar.gz \
  /home/UserName/public_html/sites/all /home/UserName/public_html/sites/default/settings.php \
  /home/UserName/public_html/sites/default/files/playlists /home/UserName/public_html/sites/default/files/js \
  /home/UserName/public_html/sites/default/files/css /home/UserName/public_html/cron.php \
  /home/UserName/public_html/includes /home/UserName/public_html/index.php \
  /home/UserName/public_html/install.php /home/UserName/public_html/misc \
  /home/UserName/public_html/modules /home/UserName/public_html/profiles \
  /home/UserName/public_html/scripts /home/UserName/public_html/themes \
  /home/UserName/public_html/update.php /home/UserName/public_html/xmlrpc.php && backupstatus=true
if [ "$backupstatus" = false ]; then
  echo "Error $thisdate Backup failed." >> /home/UserName/backups/UserName.log
else
  echo "$thisdate Backup completed without errors." >> /home/UserName/backups/UserName.log
fi
backupstatus=
thisdate=
# Let the restricted copy account read the archive.
chown RestrictedAccount /home/UserName/backups/UserName.tar.gz

So, what am I doing there?

  • First, I declare a variable to hold the date.
  • Second, I declare a variable that holds the value false. If the archive command doesn’t work, this will never be set to true.
  • Next, I archive very specific folders. Notice that I’m not archiving /home/UserName/public_html/sites/default/files, because that contains audio, pictures and videos, and I don’t want or need to include them in every day’s backup file; it would be far too large.
  • Notice that there’s a change to the backupstatus variable at the end of the archive command. Because it follows &&, it is only run if the archive command is successful.
  • Next, I use an if statement. If the backup status is false, I write an error line to the log file. Notice that I put "Error" at the start of the line; that makes it a bit easier to scan the log for lines that don’t start with a date (see the short grep sketch after this list).
  • Of course, if the variable comes back true, the log file is updated to reflect that the archival job was successful.
  • Finally, I do some clean-up. I set both variables to blank values and make sure that the account with only very few access privileges can get the file.
  • I don’t doubt that there may be a better way of doing this, but this way works very well.
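
Because failures are flagged with that "Error" prefix, a quick check from the shell might look like this (the log path matches the script above):

# List any failed runs recorded in the daily backup log.
grep '^Error' /home/UserName/backups/UserName.log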

On the other machine, a cron job is set to run very early in the morning to copy down these archives, and it logs every archive it copies on the remote server. That way, if what I call the copy job fails, I can see it and take any required action.
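
I won’t reproduce the real copy job here, but a minimal sketch of the idea, using scp with placeholder host names and paths, might look like this:

#!/bin/bash
# Hypothetical copy job on the backup machine; the host, account and paths are placeholders.
thisdate=$(date +%Y%m%d)
if scp backupcopy@primary.example.com:/home/UserName/backups/UserName.tar.gz \
  "/srv/site-backups/UserName-$thisdate.tar.gz"; then
  echo "$thisdate Copied UserName archive." >> /srv/site-backups/copyjob.log
else
  echo "Error $thisdate Copy of UserName archive failed." >> /srv/site-backups/copyjob.log
fi

It uses the same Error prefix as the daily script, so both logs can be scanned the same way.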

I may be doing too many backups at the moment. With any process like this, it will take a few weeks of analysis to determine whether I can reduce the frequency of backups, depending on the number of updates made to each site. Because I don’t host a huge number of sites, I can even tailor the backup schedule per site, so that sites that are updated frequently are backed up more often.

Other topics relating to this are linked below: