Friday, January 04, 2013

A Better S3 Backup Solution

A while ago I set up duplicity to back up to Amazon's S3, and all was good. Every night I'd get an e-mail from cron saying the backup had completed, and I'd note that there were no errors.

Everything was going great until I actually had a file I needed to restore from backup.

First, I ran into this error:

OSError: [Errno 24] Too many open files

This turned out to be easy to fix.
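For reference, a common fix for this error (assumed here, since the original fix isn't spelled out) is raising the shell's limit on open file descriptors before running duplicity:

```shell
# Raise the soft limit on open file descriptors for this shell session.
# 4096 is an arbitrary value; anything comfortably above what duplicity
# needs works, as long as it stays at or under the hard limit (ulimit -Hn).
ulimit -n 4096
echo "open-file limit is now $(ulimit -n)"
```

Putting the `ulimit` line at the top of the backup script keeps the fix from being forgotten the next time the machine is rebuilt.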

Then I kicked off a restore and waited. And waited. And waited. After 20+ hours of trying to restore a single file, I knew my backup strategy needed to change. I think there are a few issues that were causing this massive slowdown: (a) I'd only ever been making incremental backups. That makes for a "long backup chain", which is apparently slow to recover from. (b) I was using GPG encryption. Encryption is a nifty feature, but I'm storing files on Amazon's S3 and I'm willing to trust that they are secure over there. And finally, (c) duplicity had been holding onto every backup since the beginning of time. That's cool, but a bit excessive. I should probably be deleting backups once they get nice and old, say, 6 months.
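If you want to see how long a backup chain has actually grown, duplicity's collection-status action lists the full backup and every incremental stacked on top of it (the bucket URL below is illustrative, not from the original setup):

```shell
# Show the chain of full + incremental backup sets behind a target URL.
# A long run of incrementals since the last full backup is exactly what
# makes single-file restores slow.
duplicity collection-status s3+http://backup.example.com/home
```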

With these lessons learned, here's my updated backup script. It's very much inspired by this example:



dirs="/home \
      /var"

for d in $dirs; do
  prefix=$(basename $d)
  echo "Started: $d"
  # Take a fresh full backup once a month (keeping chains short), skip encryption
  duplicity --full-if-older-than 1M --no-encryption $d s3+http://backup.`hostname`/$prefix
  # Prune everything older than 6 months
  duplicity remove-older-than 6M --force s3+http://backup.`hostname`/$prefix
  echo "Ended: $d"
  echo ""
done
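Since the backups already report in via nightly cron mail, a crontab entry along these lines would drive the script (the script path and 2 a.m. schedule are assumptions, not from the original setup):

```
# m h dom mon dow  command
0 2 * * *          /usr/local/bin/s3-backup.sh
```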
As for my 20+ hour restore time to get a single file, there are a few things I'm doing to tackle that: (1) I've kicked off the restore command with the --file-to-restore option set to the individual file I need. (2) I've run duplicity with -v4 so I can see the progress it's making. So far, it's telling me that it has processed volume 89 of 15,961. Slow as molasses, but at least I know duplicity is working away.
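The single-file restore described above looks roughly like this (the file path, bucket URL, and destination are illustrative):

```shell
# Restore one file from the backup instead of the whole archive, with
# verbosity bumped to 4 so per-volume progress is printed. The path given
# to --file-to-restore is relative to the backed-up directory.
duplicity restore -v4 \
    --file-to-restore user/important.txt \
    s3+http://backup.example.com/home \
    /tmp/important.txt
```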
