12 June 2014

We use S3 to backup various kind of files on MB. We use the very convenient backup gem for that (we still use 3.9.0).

But at some point it appeared that backing up our audio recording was hammering disk IO on our server, because the syncer is calculating md5 footprint for each file each time a backup happens. When you get thousands of big files that is pretty expensive process (in our case 20k files and 50G total).

So I added a small trick there:

in Backup/models/backup_audio.rb

module Backup::Syncer::Cloud
  class Base < Syncer::Base
    def process_orphans
      if @orphans.is_a?(Queue)
        @orphans = @orphans.size.times.map { @orphans.shift }
      end
      "Older Files: #{ @orphans.count }"
    end
  end
end

Backup::Model.new(:backup_audio, 'Audio files Backup to S3') do

  before do
    system("/Backup/latest_audio.sh")
  end

  after do
    FileUtils.rm_rf("/tmp/streams")
  end

  ##
  # Amazon Simple Storage Service [Syncer]
  #
  sync_with Cloud::S3 do |s3|
    s3.access_key_id     = "xxx"
    s3.secret_access_key = "xxx"
    s3.bucket            = "bucket_backup"
    s3.region            = "us-east-1"
    s3.path              = "/mb_audio_backup"
    s3.mirror            = false
    s3.thread_count      = 50

    s3.directories do |directory|
      directory.add "/tmp/streams"
    end
  end
end

and in Backup/latest_audio.sh

#!/bin/sh
# isolate files changed in the last 3 days

TMPDIR=/tmp/streams

mkdir $TMPDIR
for i in `find /storage/audio/ -type f -cmin -4320`; do
  ln -s $i $TMPDIR
done

It creates a fake backup dir with links to the files that actually changed in the last 3 days and patches the syncer to avoid flooding the logs with orphan files. When sometimes S3 upload fails on one file (and it happens from time to time for ‘amazonian’ reason) it will be caught on the next daily backup.

The result was pretty obvious on our disk usage with our daily backups:

disk usage

in the logs:

[2014/06/10 07:00:25][info] Summary:
[2014/06/10 07:00:25][info]   Transferred Files: 5
[2014/06/10 07:00:25][info]   Older Files: 22371
[2014/06/10 07:00:25][info]   Unchanged Files: 16
[2014/06/10 07:00:25][info] Syncer::Cloud::S3 Finished!
[2014/06/10 07:00:25][info] Backup for 'Audio files Backup to S3 (backup_audio)' Completed Successfully in 00:00:22
[2014/06/10 07:00:25][info] After Hook Starting...
[2014/06/10 07:00:25][info] After Hook Finished.