Backing up with Duplicity, Effortlessly

Duplicity has a wonderful design feature: it’s really really simple.

Duplicity has an annoying design flaw: it’s really really simple.

In case you didn’t notice, Duplicity’s simplicity is both helpful and inefficient. It’s extremely easy to start using Duplicity because its usage is so very simple. Want to back something up? Just run “duplicity /some/directory ftp://user@host.com/some/other/directory” for FTP backup, “duplicity /some/directory file://some/other/directory” for local backup, or the equivalent for any of the 11+ protocols it supports. All it does is write backup files to the location of your choosing. Later, it can list and restore files from that backup location, either the latest versions or as of a date of your choosing.

This simplicity is really great for simple use cases, like backing up a home directory. It’s when you get into databases, exclusion and inclusion rules, and other such fine print that you have to plan a little.

I’ve got three different “things” I have to back up: my web development stage, my remote shell (mainly for irssi), and my home directory. Each one presents its own challenges, which I break down below.

Backing up a Web Development Stage

There are two parts to my stage: a database and a public_html directory. (The server configuration is handled by OSX; otherwise, I’d be backing up configuration files as well.) Backing up directories is what Duplicity is designed to do, so that’s no issue. Backing up the raw files in /var/mysql/ is generally considered a bad idea, so instead I generate a MySQL dump file (“mysqldump -uroot -p”, plus either the database names or --all-databases, with the output redirected into the dump file) and stash it in a temporary directory.
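If you want the dump step scripted rather than typed, here is a minimal Python sketch. The helper names build_dump_command and dump_database are hypothetical, not part of any tool. It assumes mysqldump is on the PATH and, when no database names are given, dumps everything with --all-databases, because mysqldump needs either database names or that flag. A bare -p makes mysqldump prompt for the password, as in the command above.

```python
import subprocess


def build_dump_command(user="root", databases=None, prompt_password=True):
    """Build a mysqldump argument list (hypothetical helper)."""
    cmd = ["mysqldump", "-u" + user]
    if prompt_password:
        # A bare -p makes mysqldump ask for the password interactively.
        cmd.append("-p")
    if databases:
        cmd.extend(databases)
    else:
        # mysqldump requires database names or --all-databases.
        cmd.append("--all-databases")
    return cmd


def dump_database(dump_path, run=subprocess.call, **kwargs):
    """Write the SQL dump to dump_path and raise if mysqldump fails.

    `run` defaults to subprocess.call; it can be replaced for testing.
    Extra keyword arguments are passed on to build_dump_command.
    """
    with open(dump_path, "w") as out:
        return_code = run(build_dump_command(**kwargs), stdout=out)
    if return_code != 0:
        raise RuntimeError("mysqldump exited with code %d" % return_code)
```

Calling dump_database("/path/to/temp_directory/mysql.dump") would leave a fresh dump where the temp_directory include rule can pick it up. For unattended runs the password prompt gets in the way; one common option is to keep the credentials in a MySQL option file such as ~/.my.cnf and pass prompt_password=False.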

Note that the temporary directory is an entirely different place than the public_html directory, so I need to somehow tell Duplicity to back up two different locations at the same time. It’s actually not too difficult to figure out:

duplicity / --include /path/to/public_html --include /path/to/temp_directory --exclude '**' file://some/other/directory

Duplicity decides whether to include or exclude each file by finding the first rule on the command line that matches it. In this example, /path/to/temp_directory/mysql.dump matches the second --include rule, because including a directory includes everything beneath it. Perfect. My public_html directory is matched the same way by the first rule. Anything that matches neither inclusion rule falls through to the ‘**’ exclusion rule, which matches everything, so the only two things backed up are public_html and temp_directory.
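To make the first-match behavior concrete, here is a small Python model of it. It is a simplification, not duplicity’s actual matcher: the only glob it understands is a bare '**', any other pattern matches that exact path and everything beneath it, and it ignores the fact that duplicity’s include rules also match the parent directories of an included path (such as /path/to), which is what lets duplicity descend into them. As in duplicity, a path that matches no rule at all is included. The function names are hypothetical.

```python
def rule_matches(pattern, path):
    """Simplified matching: '**' matches everything; any other pattern
    matches the path itself and everything beneath it."""
    if pattern == "**":
        return True
    pattern = pattern.rstrip("/")
    return path == pattern or path.startswith(pattern + "/")


def is_selected(path, rules):
    """Return True if `path` would be backed up.

    `rules` is an ordered list of ("include" or "exclude", pattern) pairs,
    in the same order as --include/--exclude on the command line.
    The first matching rule wins; unmatched paths are included.
    """
    for kind, pattern in rules:
        if rule_matches(pattern, path):
            return kind == "include"
    return True


rules = [
    ("include", "/path/to/public_html"),
    ("include", "/path/to/temp_directory"),
    ("exclude", "**"),
]

for p in ["/path/to/temp_directory/mysql.dump",
          "/path/to/public_html/index.html",
          "/etc/passwd"]:
    print("%s -> %s" % (p, is_selected(p, rules)))
# /path/to/temp_directory/mysql.dump -> True
# /path/to/public_html/index.html -> True
# /etc/passwd -> False
```

Moving ("exclude", "**") to the front of the list makes every call return False, which is why the catch-all exclusion has to come last.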

The only bummer here is that I have to regenerate mysql.dump in the temporary directory before every backup. For a long time I did this by hand, but it’s an extra step I’d rather have done automatically. Keep that in mind, because I’ll be coming back to this in a few sections.

Backing up a Remote Shell

Problem: Duplicity can only back up from a local path. Solution: download the remote location to a local directory, and back that up. So backing up becomes a two-step process, just as it was for the web development stage:

rsync -zav user@host:/some/directory /path/to/temp_directory
duplicity /path/to/temp_directory file://some/other/directory

Again, I’d much rather send off a simple one-liner, and not have to worry about rsyncing and whatever else.
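One way to turn the two steps into a single call is a small wrapper that runs them in order and stops at the first failure, so duplicity never runs after a failed rsync. This is a sketch with a hypothetical function name (mirror_then_backup); it assumes rsync and duplicity are on the PATH and reuses the flags from the commands above.

```python
import subprocess


def mirror_then_backup(remote_src, staging_dir, target_url, run=subprocess.call):
    """Pull remote_src into staging_dir with rsync, then back staging_dir
    up to target_url with duplicity.

    Raises RuntimeError at the first step that exits non-zero.
    `run` defaults to subprocess.call; it can be replaced for testing.
    """
    steps = [
        ["rsync", "-zav", remote_src, staging_dir],
        ["duplicity", staging_dir, target_url],
    ]
    for cmd in steps:
        return_code = run(cmd)
        if return_code != 0:
            raise RuntimeError("%s exited with code %d" % (cmd[0], return_code))
```

For example: mirror_then_backup("user@host:/some/directory", "/path/to/temp_directory", "file://some/other/directory"). Because the rsync source has no trailing slash, rsync copies the directory itself into the staging directory (ending up in /path/to/temp_directory/directory); a trailing slash on the source copies only its contents.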

Backing up a Home Folder

This one couldn’t get any simpler, so I’ll spend just a few lines on it:

duplicity /home/directory file://some/other/directory

And we’re done. This is the easiest yet. I personally also add a bunch of --exclude arguments to keep things like the Trash, caches, and large ISOs out.
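A long list of --exclude arguments is easier to maintain if it is generated. Here is a sketch with a hypothetical helper, exclude_args; the excluded names (.Trash, .cache) and the '**.iso' glob are only examples. In duplicity’s glob syntax '**' matches any run of characters including '/', so '**.iso' should catch ISO images anywhere in the tree, but check the FILE SELECTION section of the duplicity man page before relying on a pattern.

```python
def exclude_args(home, subpaths, globs=()):
    """Turn paths relative to `home`, plus optional duplicity glob
    patterns, into repeated --exclude arguments."""
    args = []
    for sub in subpaths:
        args.extend(["--exclude", home.rstrip("/") + "/" + sub])
    for pattern in globs:
        args.extend(["--exclude", pattern])
    return args


home = "/home/directory"
cmd = (["duplicity"]
       + exclude_args(home, [".Trash", ".cache"], globs=["**.iso"])
       + [home, "file://some/other/directory"])
print(" ".join(cmd))
```

Passing cmd straight to subprocess.call(cmd) bypasses the shell, so the '**.iso' glob reaches duplicity unexpanded; the printed string would need quoting before pasting it into a shell.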

Doing all the Above, Three Times a Week

Ugh. What a pain. I count at least five commands, if not more. How can I make this easier?

Easy! Write a bash script! That was the first thing I did. I wrote a script called “duplicity-home-folder.sh”, another called “duplicity-var-www.sh”, and so on. Each one had a few simple copied-and-pasted checks, along with a usually monstrous duplicity command, complete with verbosity and inclusion/exclusion arguments.

This worked on and off for a while. I was keeping up a reasonable habit of backing up, and it paid off quite a few times. When I moved to a new operating system, I simply backed up, installed, and “recovered”. The same applied to my move to OSX, though it was complicated by the fact that OSX is not Linux. The same will apply to my upgrade to Snow Leopard.

Now that I’m in OSX, though, I have a completely different set of data to back up. I began tweaking the simple bash scripts, but soon found I was actually rewriting them instead. Having to copy and paste code between them was a pain, so I tried to merge them. Then I realized I really ought to store the backup settings separately from the script, so I could adjust and add settings without touching the code. That’s when I decided to use Python instead.

Here’s what my current configs/home-jacob directory, where I keep all my backup configuration, looks like (output of “find configs/home-jacob”):

configs/home-jacob
configs/home-jacob/args
configs/home-jacob/binary
configs/home-jacob/inclusion-rules.d
configs/home-jacob/inclusion-rules.d/00-always
configs/home-jacob/inclusion-rules.d/00-always/exclude
configs/home-jacob/inclusion-rules.d/05-duplicity
configs/home-jacob/inclusion-rules.d/05-duplicity/exclude
configs/home-jacob/inclusion-rules.d/06-gentoo-embedded-handbook
configs/home-jacob/inclusion-rules.d/06-gentoo-embedded-handbook/exclude
configs/home-jacob/inclusion-rules.d/06-gentoo-embedded-handbook/include
configs/home-jacob/inclusion-rules.d/07-gentoo-docs
configs/home-jacob/inclusion-rules.d/07-gentoo-docs/exclude
configs/home-jacob/inclusion-rules.d/10-documents
configs/home-jacob/inclusion-rules.d/10-documents/exclude
configs/home-jacob/inclusion-rules.d/10-documents/include
configs/home-jacob/inclusion-rules.d/15-chat
configs/home-jacob/inclusion-rules.d/15-chat/include
configs/home-jacob/inclusion-rules.d/40-odds-ends
configs/home-jacob/inclusion-rules.d/40-odds-ends/include
configs/home-jacob/inclusion-rules.d/50-gentoo
configs/home-jacob/inclusion-rules.d/50-gentoo/exclude
configs/home-jacob/inclusion-rules.d/50-gentoo/include
configs/home-jacob/inclusion-rules.d/60-security
configs/home-jacob/inclusion-rules.d/60-security/include
configs/home-jacob/inclusion-rules.d/70-settings
configs/home-jacob/inclusion-rules.d/70-settings/include
configs/home-jacob/inclusion-rules.d/75-gaming
configs/home-jacob/inclusion-rules.d/75-gaming/include
configs/home-jacob/inclusion-rules.d/99-everything
configs/home-jacob/inclusion-rules.d/99-everything/exclude
configs/home-jacob/location
configs/home-jacob/location/local
configs/home-jacob/location/remote
configs/home-jacob/post-run.d
configs/home-jacob/post-run.d/90-generate-checksums
configs/home-jacob/pre-run.d

As you can see, there’s quite a bit of flexibility now. All the include/exclude rules are broken out into separate, organized files, and the script can run hook scripts from pre-run.d and post-run.d before and after the backup.
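To show how the numbered inclusion-rules.d categories turn into an ordered list of arguments, here is a standalone sketch that reads such a tree straight from disk. It follows the same conventions as the duplicity-quickstart.py script further down (categories in name order, 'exclude' sorting before 'include', '#' comments and blank lines skipped, %%local%% replaced with the local path), but load_rules is a hypothetical name, and any rule contents you feed it are your own.

```python
import os


def load_rules(rules_dir, local_path):
    """Collect --include/--exclude arguments from a tree shaped like
    inclusion-rules.d/<NN-category>/{exclude,include}.

    Categories are visited in name order, so the numeric prefixes decide
    precedence; inside a category, 'exclude' sorts before 'include'.
    Files other than 'include' and 'exclude' are ignored.
    """
    args = []
    for category in sorted(os.listdir(rules_dir)):
        category_dir = os.path.join(rules_dir, category)
        if not os.path.isdir(category_dir):
            continue
        for rule_type in sorted(os.listdir(category_dir)):
            if rule_type not in ("include", "exclude"):
                continue
            with open(os.path.join(category_dir, rule_type)) as f:
                for line in f:
                    line = line.strip()
                    # Skip blank lines and comments.
                    if line and not line.startswith("#"):
                        args.extend(["--" + rule_type,
                                     line.replace("%%local%%", local_path)])
    return args
```

Because category names are compared as strings, the zero-padded prefixes are what keep the order predictable: a directory named 5-foo would sort after 40-odds-ends. Combined with duplicity’s first-match rule, this means 00-always can override anything later, and whatever 99-everything excludes is only consulted after every earlier category.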

I’ll leave it to your imagination how this cut the three backup scenarios above down to three simple one-liners.

Actually, I’ll help your imagination out a little by posting my “dirconfig.py” and “duplicity-quickstart.py” scripts. The first loads the configuration directory above, and the second runs duplicity and the hook scripts based on that configuration. These are my first attempts at Python, so be nice. 😉

dirconfig.py

import os

class DirConfiguration(object):
	"""Load a configuration directory into nested dicts.

	Each subdirectory becomes a dict keyed by its name, and each regular
	file becomes a list of its stripped lines, keyed by the file name.
	"""

	def __init__(self, path, recursive=True, debug=False):
		if not os.path.exists(path):
			raise ValueError('Path does not exist: ' + path)
		elif not os.path.isdir(path):
			raise ValueError('Path is not a directory: ' + path)

		self.path = path
		self.recursive = recursive
		self.debug = debug
		self.config = self.__read_path(path, recursive=recursive)

	def __read_path(self, rootPath, config=None, recursive=True):
		# A mutable default argument ({}) is created once and shared by every
		# call, so a second DirConfiguration would inherit the first one's keys.
		if config is None:
			config = {}

		for key in os.listdir(rootPath):
			currPath = os.path.join(rootPath, key)
			if self.debug: print('this is for path ' + currPath)

			if os.path.isdir(currPath) and recursive:
				value = self.__read_path(currPath, {}, recursive)
				if self.debug: print('for key %s storing child value: %r' % (key, value))
				config[key] = value

			elif os.path.isfile(currPath):
				if key == 'children':
					if self.debug: print('Warning: illegal file name in "%s": %s' % (currPath, key))
					continue

				# "with" closes the file even if reading raises an exception.
				with open(currPath, 'r') as currFile:
					lines = [currLine.strip() for currLine in currFile]

				# An empty file produces no key at all.
				if lines:
					if self.debug: print('for key %s storing: %r' % (key, lines))
					config.setdefault(key, []).extend(lines)

			else:
				if self.debug: print('Notice: odd path exists in configuration directory: ' + currPath)

		if self.debug: print('returning this for path "%s": %r' % (rootPath, config))
		return config

duplicity-quickstart.py

#!/usr/bin/env python

import sys
import subprocess
from optparse import OptionParser
from dirconfig import DirConfiguration

# These actions take one extra value right after the action name, e.g.
# "duplicity remove-older-than 6M ..." or "duplicity remove-all-but-n-full 2 ...".
value_actions = ['remove-older-than', 'remove-all-but-n-full']

parser = OptionParser(usage='usage: %prog [options] config-dir', version='1.0')
parser.add_option('-v', '--verbose', dest='verbosity', help='verbosity level of duplicity (overrides args file)', type='int')
parser.add_option('-d', '--debug', dest='debug', help='turn on the debugger for the configuration parser', action='store_true', default=False)
parser.add_option('--force', dest='force', help='force duplicity to complete an action (most likely cleanup)', action='store_true', default=False)
parser.add_option('-a', '--action', dest='action', help='action duplicity ought to take', default='incremental', type='choice', choices=['full', 'incremental', 'restore', 'verify', 'collection-status', 'list-current-files', 'cleanup', 'remove-older-than', 'remove-all-but-n-full'])
parser.add_option('-n', '--action-value', dest='action_value', help='time (for remove-older-than) or count (for remove-all-but-n-full)', type='string')
parser.add_option('-f', '--file-to-restore', dest='restore_path', help='if restore is the action, this can determine which file specifically is restored', type='string')
parser.add_option('-r', '--restore-to', dest='restore_to', help='if restore is the action, this can determine where the restored files are stored', type='string')
parser.add_option('--allow-source-mismatch', dest='allow_mismatch', help='if the backup source changed, but you still want to use the same backup destination, and duplicity is complaining, use this', action='store_true', default=False)

(options, args) = parser.parse_args()

# parser.error() prints the usage line and exits with status 2.
if len(args) != 1:
	parser.error('exactly one argument is required: the path to the backup configuration directory')

if options.action in value_actions and not options.action_value:
	parser.error(options.action + ' needs a value; pass it with --action-value')

configuration = DirConfiguration(args[0], True, options.debug).config

up_actions = ['full', 'incremental']
down_actions = ['restore', 'verify']
remote_actions = ['collection-status', 'list-current-files', 'cleanup', 'remove-older-than', 'remove-all-but-n-full']

duplicity_opts = []

if 'binary' not in configuration:
	raise ValueError('Error: backup directory has no binary file.')

duplicity_opts.extend(configuration['binary'])
duplicity_opts.append(options.action)
if options.action in value_actions:
	duplicity_opts.append(options.action_value)

if 'args' in configuration:
	duplicity_opts.extend(configuration['args'])

# "is not None" so that an explicit -v 0 is still passed through.
if options.verbosity is not None:
	duplicity_opts.extend(['-v', str(options.verbosity)])

if options.force:
	duplicity_opts.append('--force')

if options.allow_mismatch:
	duplicity_opts.append('--allow-source-mismatch')

if 'location' not in configuration:
	raise ValueError('Error: configuration directory has no location directory.')

elif 'local' not in configuration['location']:
	raise ValueError("Error: configuration's location directory has no local file.")

elif 'remote' not in configuration['location']:
	raise ValueError("Error: configuration's location directory has no remote file.")

# Only the first line of each location file is used.
local_path = configuration['location']['local'][0]
remote_path = configuration['location']['remote'][0]

if options.action == 'restore':
	if options.restore_to:
		local_path = options.restore_to
	if options.restore_path:
		duplicity_opts.extend(['--file-to-restore', options.restore_path])

# Selection rules only matter when duplicity reads the local tree (backup or verify).
# The parentheses matter: without them, "verify" would reach the loop even when
# the configuration has no inclusion-rules.d directory, raising a KeyError.
if 'inclusion-rules.d' in configuration and (options.action in up_actions or options.action == 'verify'):
	# Categories are applied in name order; duplicity uses the first matching rule.
	for rule_cat, rules in sorted(configuration['inclusion-rules.d'].items()):
		# 'exclude' sorts before 'include', so excludes come first within a category.
		for rule_type, values in sorted(rules.items()):
			if rule_type not in ('include', 'exclude'):
				sys.stderr.write('Warning: in category %s, unknown rule type: %s\n' % (rule_cat, rule_type))
				continue

			for value in values:
				# Skip blank lines and comments; %%local%% expands to the local path.
				if value and not value.startswith('#'):
					duplicity_opts.extend(['--' + rule_type, value.replace('%%local%%', local_path)])

# Positional arguments follow duplicity's syntax: backups go from the local path
# to the remote URL, restore/verify go the other way, and the maintenance
# actions only take the remote URL.
if options.action in up_actions:
	duplicity_opts.append(local_path)
	duplicity_opts.append(remote_path)
elif options.action in down_actions:
	duplicity_opts.append(remote_path)
	duplicity_opts.append(local_path)
elif options.action in remote_actions:
	duplicity_opts.append(remote_path)
else:
	raise ValueError('Error: incorrect action. This is a bug: it should have been caught before now!')

def run_hooks(hook_dir):
	"""Run every script in <config-dir>/<hook_dir> in name order, using sh.

	Each script receives the local path, remote path and action as arguments.
	"""
	if hook_dir not in configuration:
		return
	for script in sorted(configuration[hook_dir]):
		script_path = args[0] + '/' + hook_dir + '/' + script
		print('Calling ' + hook_dir + ' script: ' + script_path)
		return_code = subprocess.call(['sh', script_path, local_path, remote_path, options.action])
		if return_code != 0:
			raise RuntimeError(script_path + ' returned non-zero code ' + str(return_code))

run_hooks('pre-run.d')

print('Calling: ' + ' '.join(duplicity_opts))
return_code = subprocess.call(duplicity_opts)
if return_code != 0:
	raise RuntimeError('duplicity exited with a non-zero code: ' + str(return_code))

run_hooks('post-run.d')

Comments

  • Dion Moult  On October 23, 2009 at 8:06 am

    Just out of curiosity (no, actually I want to dominate the world with this information) are you backing up to a remote location – and if so which provider are you using?

    Yep – you guessed right, I’m looking for a good provider.

    • javaJake  On November 10, 2009 at 1:17 am

      Whoa, how did I miss this? Sorry for taking so long to reply!

      I don’t use any provider anymore. My web-host said I could not use their space for backups. (Their ToS conveniently changed on me.) I now use a system of disk burning, which I’ll likely blog about soon.
