Backing It Up

An important part of server administration (and computer use in general) is making regular backups. While setting up and testing a new server or web application, the easiest and often cheapest starting point is to back up the project to the development machine (in this case my laptop) and then regularly back up that data onto external hard drives, although this is not an appropriate backup strategy for production. This can easily be automated with shell scripting. There are several different shells in UNIX-like systems, most of which are (somewhat) compatible with each other. The default interactive shell and scripting language for many Linux distributions is bash, which is also included with Windows Subsystem for Linux. It was the default shell in macOS until Catalina (10.15) switched to zsh, and it remains available there through Terminal.app.

I have created a local directory structure to store the backups for this server, located at /root/_vps on my laptop. This makes it easy to copy everything to external storage and keep things organized.

root@<laptop># ls -lF $HOME/_vps
total 12
drwxr-xr-x 3 root root 4096 Aug  8 17:21 backups/
-rw-r--r-- 1 root root  127 Aug  8 11:17 <server>-backup.exclude
-rwxr-xr-x 1 root root  323 Aug  8 11:38 <server>-backup.sh*

I keep a recent directory, which mirrors the contents of the server, and I also make regular snapshots of its state. To automate this, I wrote a script that updates a local copy of the server’s root / and then creates an archive of this copy daily. Shell scripts are often used to glue together other programs under UNIX-like systems, so the backup is actually done using a combination of three tools: rsync, which allows for “fast incremental file transfer”; tar, an archiving program introduced in January 1979 as part of Seventh Edition Unix; and the XZ Utils, which compress the resulting archive to save disk space. Using tar is important because it can preserve file permissions and ownership information, which makes recovering from the backup much easier. rsync also has this capability, but in both cases the script must run as the superuser (root), since changing file ownership typically requires elevated privileges.

<server>-backup.sh

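#!/bin/bash
# bash rather than sh: "source" and $HOSTNAME are not guaranteed by POSIX sh.

# Load the ssh-agent environment saved by keychain so cron can use the cached key.
source "$HOME/.keychain/$HOSTNAME-sh"
eval "$(keychain --noask --eval id_ed25519)"

# Mirror the server's root filesystem into the local copy, preserving
# permissions, ACLs, and extended attributes (-aAX).
/usr/bin/rsync \
	-aAXvP \
	--delete \
	--delete-excluded \
	--exclude-from="$HOME/_vps/<server>-backup.exclude" \
	"root@<server>:/" "$HOME/_vps/backups/<server>/"

# Archive the local copy only if rsync exited successfully.
if [ $? -eq 0 ]; then
	/bin/tar -C "$HOME/_vps/backups/<server>" \
		-cvpf - . | xz -9 -T 0 -z - \
		> "$HOME/_vps/backups/<server>-$(date +%Y-%m-%d_%H-%M-%S).tar.xz"
fi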

The backup script above has two distinct parts. The first makes a copy of everything on the remote server except for the files and directories explicitly excluded by the <server>-backup.exclude file. The second runs whenever the local copy of the server is successfully updated: it archives the local directory $HOME/_vps/backups/<server> into a file named after that directory with a timestamp and file extensions appended, such as <server>-2006-01-02_15-04-05.tar.xz. This command uses both a Unix pipeline | and redirection > of standard output to filter the output of tar through the xz compressor and finally create the compressed archive of the backup on the local (laptop) filesystem.
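To make the pipeline and redirection concrete, here is a minimal sketch that runs the same tar | xz > file pattern on throwaway data and then checks the result. The temporary directory, the demo- file name, and the use of pipefail are assumptions added for illustration; they are not part of the backup script itself.

```shell
#!/bin/bash
# Sketch: the tar | xz > file pipeline on throwaway data (all paths are hypothetical).
set -o pipefail   # report failure if tar fails, not only if xz fails

work=$(mktemp -d)
mkdir -p "$work/copy/etc"
echo "example" > "$work/copy/etc/hostname"

archive="$work/demo-$(date +%Y-%m-%d_%H-%M-%S).tar.xz"

# tar writes the archive to standard output (-f -), xz compresses that stream,
# and > redirects the compressed result into the timestamped file.
tar -C "$work/copy" -cpf - . | xz -9 -T 0 > "$archive"
status=$?

# xz -t tests the integrity of the compressed file;
# tar -tJf lists the archive members without extracting them.
xz -t "$archive" && listing=$(tar -tJf "$archive")

echo "pipeline exit status: $status"
echo "$listing"
rm -rf "$work"
```

Without pipefail, the exit status of a pipeline is that of its last command, so a failing tar would go unnoticed as long as xz succeeded. Testing an archive right after creating it is a cheap way to catch a truncated file, for example when the disk fills up.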

<server>-backup.exclude

/dev/*
/home/*/.cache
/lost+found
/media/*
/mnt/*
/proc/*
/run/*
/sys/*
/tmp/*

It is important to exclude the contents of /dev, /proc, /run, and /sys (in this case using * as a wildcard character) because these are special directories populated at boot time by the running system. The directories themselves are required for the system to function correctly, but their contents are recreated on every boot and do not belong in a backup. The trailing /* excludes the contents while keeping the empty directory in the copy. Excluding the other directories on the list is less crucial, but /media and /mnt are typically used for mounting removable media and external hard drives, which can create an infinite loop if backing up locally to an external drive. The remaining choices are optional, but if space is scarce locally, storing user .cache data and /tmp (temporary files often cleared on reboot or by a command run by cron) is probably not worthwhile.
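The effect of a pattern such as /tmp/* can be checked locally before pointing rsync at a real server. The sketch below uses hypothetical temporary directories and a one-line exclude file. It relies on the fact that a leading / anchors a pattern to the root of the transfer, so only the top-level tmp is affected.

```shell
#!/bin/bash
# Sketch: how an anchored pattern ending in /* behaves (local, hypothetical directories).
src=$(mktemp -d); dst=$(mktemp -d); rules=$(mktemp)
mkdir -p "$src/tmp" "$src/etc" "$src/var/tmp"
touch "$src/tmp/scratch" "$src/etc/fstab" "$src/var/tmp/keep"

# The leading / anchors the pattern to the root of the transfer, so /tmp/*
# matches the contents of the top-level tmp only, not those of var/tmp.
printf '%s\n' '/tmp/*' > "$rules"

rsync -a --delete --delete-excluded --exclude-from="$rules" "$src/" "$dst/"

# List what arrived: tmp/ exists but is empty, var/tmp/keep is still copied.
copied=$(cd "$dst" && find . | sort)
echo "$copied"
rm -rf "$src" "$dst" "$rules"
```

Adding --dry-run (-n) together with -v to the real command gives a similar preview against the server without transferring anything.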

Once the backup script and exclude file are in place, the backup script must be made executable with chmod +x <server>-backup.sh so that it can run as a program. It can then be called from the crontab on the development (laptop) system, keeping in mind that the script should be run as root to preserve file permissions and ownership information. It is worth noting that rsync relies on ssh to access the server, and because the ssh key is password-protected, an ssh-agent must be running with the key loaded. This is set up using keychain on my development laptop, which allows cron to use the key cached in memory without user input. The source and eval lines at the beginning of the script ensure that the proper environment is set when the script is run by cron, allowing access to the server for the purpose of the backup. This can be further secured by setting up a separate ssh key that is allowed to run only certain commands on the remote; since I am the only administrator of the machine, I have avoided doing that for now. Once the script has run successfully, the $HOME/_vps/backups/<server> directory will be populated with the filesystem of the remote, and there will be a timestamped archive in the backups directory for each time that the script completed successfully:

root@<laptop># ls -ltF $HOME/_vps/backups/*
-rw-r--r--  1 root root 1637914316 Aug 21 12:32 /root/_vps/backups/<server>-2018-08-21_12-11-59.tar.xz
-rw-r--r--  1 root root 1636525760 Aug 20 21:26 /root/_vps/backups/<server>-2018-08-20_21-14-26.tar.xz
-rw-r--r--  1 root root  504860952 Aug 15 12:49 /root/_vps/backups/<server>-2018-08-15_12-35-54.tar.xz

/root/_vps/backups/<server>:
total 76
drwxrwxrwt  2 root root 4096 Aug 21 12:09 tmp/
drwxr-xr-x  2 root root 4096 Aug 21 09:06 run/
drwxr-xr-x 86 root root 4096 Aug 20 17:37 etc/
drwx------  8 root root 4096 Aug 20 17:37 root/
dr-xr-xr-x  2 root root 4096 Aug 18 18:19 sys/
drwxr-xr-x  2 root root 4096 Aug 17 15:00 dev/
dr-xr-xr-x  2 root root 4096 Aug 17 14:59 proc/
drwxr-xr-x  2 root root 4096 Aug 16 22:28 bin/
drwxr-xr-x  2 root root 4096 Aug 16 22:28 sbin/
drwxr-xr-x 12 root root 4096 Aug 16 20:55 var/
drwxr-xr-x  4 root root 4096 Aug 16 20:32 boot/
lrwxrwxrwx  1 root root   33 Aug 15 09:24 initrd.img -> boot/initrd.img-4.15.0-32-generic
lrwxrwxrwx  1 root root   30 Aug 15 09:24 vmlinuz -> boot/vmlinuz-4.15.0-32-generic
drwxr-xr-x 19 root root 4096 Aug  8 12:16 lib/
lrwxrwxrwx  1 root root   33 Aug  8 12:12 initrd.img.old -> boot/initrd.img-4.15.0-30-generic
lrwxrwxrwx  1 root root   30 Aug  8 12:12 vmlinuz.old -> boot/vmlinuz-4.15.0-30-generic
drwxr-xr-x  2 root root 4096 Aug  8 12:11 lib64/
drwxr-xr-x 10 root root 4096 Aug  8 12:11 usr/
drwxr-xr-x  2 root root 4096 Aug  8 12:11 opt/
drwxr-xr-x  2 root root 4096 Aug  8 12:11 srv/
drwxr-xr-x  2 root root 4096 Aug  8 12:11 mnt/
drwxr-xr-x  2 root root 4096 Aug  8 12:09 media/
drwxr-xr-x  2 root root 4096 Apr 24 04:34 home/
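
The cron scheduling described before the listing can be sketched as a single crontab entry. This is only an illustration: the 02:30 run time and the log file path are assumptions, not taken from the setup above. Added with crontab -e as root on the laptop, a line like this would run the script once a day:

```
# m  h  dom mon dow  command
30   2  *   *   *    /root/_vps/<server>-backup.sh >> /root/_vps/backup.log 2>&1
```

Redirecting the output to a log file keeps cron from trying to mail the verbose rsync and tar output after every run, and leaves a record to check when a snapshot is missing.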

Restoring the backup can be done either by copying the desired archive to the remote server and extracting it there, or by reversing the rsync portion of the script with some minor changes. Of course, this may be overkill if only some of the remote files are corrupted or missing, in which case it makes more sense to selectively restore the desired files and directories.
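
A selective restore can be rehearsed on throwaway data first. The sketch below builds a small archive the same way the backup script does and then extracts a single directory from it. The file names and the restore directory are hypothetical. For a real restore, the extraction would need to run as root for ownership to be preserved.

```shell
#!/bin/bash
# Sketch: restore one directory from a .tar.xz archive (throwaway data, hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/copy/etc/nginx" "$work/copy/var/www" "$work/restore"
echo "server {}" > "$work/copy/etc/nginx/nginx.conf"
echo "<h1>hi</h1>" > "$work/copy/var/www/index.html"
tar -C "$work/copy" -cpf - . | xz > "$work/demo.tar.xz"

# Member names begin with ./ because the archive was created from ".",
# so the path to extract must be written the same way.
# -x extracts, -J decompresses with xz, -p preserves permissions.
tar -C "$work/restore" -xJpf "$work/demo.tar.xz" ./etc/nginx

restored=$(cd "$work/restore" && find . -type f)
echo "$restored"   # prints ./etc/nginx/nginx.conf
rm -rf "$work"
```

Running tar -tJf on the archive first shows the exact member names, which is useful when it is unclear how a path was stored.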