Monday, January 03, 2011

Setting up an S3 backup solution on a CentOS VPS

I wanted to expand on the built-in backup facilities Rimuhosting provides, and I figured S3 would be the way to go.

Attempt #1: s3fs

My first attempt was to use s3fs to make my S3 bucket available, and then use rsync to copy to it. I read about the idea here and was sold.

The setup instructions in the original post didn't quite apply to the CentOS server I was using. Through a combination of Google and trial and error, I came up with the following yum install and build commands:

sudo yum install fuse fuse-devel python-devel
sudo yum install curl-devel
sudo yum install libxml-devel
sudo yum install libxml2-devel
cd ~/util/src
tar xzf s3fs-r191-source.tar.gz
cd s3fs
sudo make install

I then created a mount point and mounted the bucket:

sudo mkdir /mnt/s3
sudo /usr/bin/s3fs yourbucket -o accessKeyId=yourS3key -o secretAccessKey=yourS3secretkey /mnt/s3

And to my absolute amazement, it worked - I was able to trivially access my S3 bucket as though it were just another file system.

I kicked off an rsync command to test out the backup - and that's where I ran into the catch. It was so very s l o w. After a couple of minutes, only an itty bitty fraction of the 3 Gigs of data that needed to be backed up was in the S3 share.
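A test of this kind can be sketched roughly as follows. The source and destination paths are assumptions for illustration, not the exact ones used here; setting DRY_RUN=1 adds rsync's --dry-run flag so nothing is actually copied.

```shell
#!/bin/sh
# Sketch: build an rsync command that copies into the s3fs mount point.
# SRC and DEST are illustrative assumptions, not paths from this post.
SRC="${SRC:-/var/www/}"
DEST="${DEST:-/mnt/s3/www/}"

# -a preserves permissions and timestamps, -v lists each file as it is sent.
# With DRY_RUN=1, --dry-run makes rsync only report what it would copy.
rsync_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "rsync -av --dry-run $SRC $DEST"
  else
    echo "rsync -av $SRC $DEST"
  fi
}

# Print the command for review. To measure how long the copy takes,
# run the printed command under `time`.
rsync_cmd
```

Watching the output scroll by (or checking the mount with du) gives a quick feel for the throughput before committing to a full run.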

Attempt #2: duplicity

Reading through the comments on the above approach, I saw that another user had suggested duplicity as a workaround for the sluggish S3FS behavior.

All it took was a simple:

 sudo yum install duplicity

and duplicity was ready for use.

I found this HOWTO which gives specific tips about how to use duplicity over S3.

My final backup script wasn't far from what was shown on that page. Here's what I arrived at:



dirs="/var/svn \
     /var/www \
     /home"

# Back up each directory to its own prefix, named after the directory
for d in $dirs; do
  prefix=$(basename "$d")
  echo duplicity "$d" s3+$prefix
  duplicity "$d" s3+$prefix
  echo ""
done

The script above is a little chatty for testing purposes.
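A script like this also needs credentials in its environment before duplicity runs. Here is a minimal sketch, assuming duplicity's boto-based S3 backend (which reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) and the PASSPHRASE variable duplicity uses for GPG encryption. All values are placeholders, and check_env is a hypothetical helper, not part of duplicity.

```shell
#!/bin/sh
# Sketch: environment duplicity expects for an encrypted S3 backup.
# All values below are placeholders (assumptions), not real credentials.
export AWS_ACCESS_KEY_ID="yourS3key"
export AWS_SECRET_ACCESS_KEY="yourS3secretkey"
export PASSPHRASE="yourGPGpassphrase"

# Hypothetical helper: fail early if any required variable is empty,
# rather than letting duplicity fail partway through a backup.
check_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE; do
    eval "value=\${$name}"
    if [ -z "$value" ]; then
      echo "missing $name" >&2
      return 1
    fi
  done
  echo "environment ok"
}

check_env
```

Since the script then holds secrets, it probably makes sense to keep it readable only by the user that runs the backup (for example, chmod 700).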

I was able to back up about 4 Gigs of data in about 45 minutes (or what felt like 30 minutes - I lost track of time).

I like the duplicity approach for a variety of reasons:

  • The standard file formats (tar+GPG) make sense from a backup perspective
  • The incremental backup functionality means that backups should be a relatively quick affair going forward
  • Restoring files is quite easy. I did a quick test and had no problems with it
  • The app has a unix'ish feel to it - it does one thing and does it well
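To illustrate the restore point above, here is a sketch of pulling a single file back out of a backup. duplicity's restore action takes the backup URL and a local target, and --file-to-restore selects one path relative to the backed-up directory. The bucket URL, file path, and target below are hypothetical placeholders, not the ones from my test.

```shell
#!/bin/sh
# Sketch: restore a single file from a duplicity backup.
# BACKUP_URL, the file path, and the target are hypothetical placeholders.
BACKUP_URL="${BACKUP_URL:-s3+http://yourbucket/www}"

# Print the duplicity command that restores one file (given as a path
# relative to the backed-up directory) into a local target path.
restore_cmd() {
  file="$1"
  target="$2"
  echo "duplicity restore --file-to-restore $file $BACKUP_URL $target"
}

# Prints the command for review; pipe the output to sh to actually run it.
restore_cmd "html/index.html" "/tmp/restore/index.html"
```

Restoring into a scratch location such as /tmp first, and only then copying the file into place, avoids overwriting anything by accident.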

I've added duplicity to cron and we'll see tomorrow how my 3am backup goes.
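A crontab entry for a 3am run can be sketched like this. The script and log paths are assumptions for illustration, and cron_line is just a hypothetical helper that prints the entry.

```shell
#!/bin/sh
# Sketch: build a crontab entry that runs the backup script daily at 3am.
# SCRIPT and LOG are hypothetical paths.
SCRIPT="${SCRIPT:-/root/bin/s3-backup.sh}"
LOG="${LOG:-/var/log/s3-backup.log}"

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 3 * * *" means minute 0 of hour 3, every day.
cron_line() {
  echo "0 3 * * * $SCRIPT >> $LOG 2>&1"
}

# Print the entry. To install it, append it to the current crontab:
#   (crontab -l 2>/dev/null; cron_line) | crontab -
cron_line
```

Redirecting output to a log file keeps the chatty echo lines from the backup script around for the morning-after check.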

All in all, this seems like a winning solution. I'll keep S3FS around too, as it's a wonderfully clever way to access S3 buckets.


  1. I've been considering this myself, thanks for the write-up for CentOS. Has this been stable? Do you still advocate this solution?

  2. Hey Paul - it's been working great for me. Highly recommended.

  3. Nice article on backing up CentOS ben! Thanks!