Super simple backups

Super simple backup setup for Mac

In this post I explain a little bit about how I’ve set things up so that I’m comfortable knowing that my files are backed up to an Amazon S3 bucket every night. Most of this post should work exactly the same on any Linux or BSD system, but I’ve personally only used it on Mac.

With this approach my backups are:

- Automated: a cron job backs up my files every night without me having to think about it
- Incremental: after the initial run, only new and changed files are transferred
- Versioned: earlier versions of changed or deleted files are kept in the bucket
- Cheap: see below

And since you’re going to want to know: I spend well under $1 per month for the 20 GB of backup storage I’m currently using. The first month the storage bill was about $10 though, partly because of the initial transfer but mostly because of a good deal of experimentation. Still bonkers cheap.

Half the work is dotfiles

Before you go ahead and copy this solution you should be aware that this backup approach only deals with content such as documents, source code, images, temporary files on the Desktop etc. It does not deal with installed software or settings.

Besides the Amazon S3 backup described in this post I’m also using a combination of dotfiles and Bitwarden to back up software, settings and secrets. I’ll probably write something about that part of my setup later. All you need to know at this point is that the solution below solely focuses on good old files.

Also worth knowing is that this backup script deals only with files and folders under your $HOME folder. I keep my computer pretty tidy and make sure everything worth backing up sits somewhere in my $HOME. If you run things differently and want to back up other folders as well, you’ll probably find it easy enough to modify the script below.

Requirements

Not a lot of requirements at all, here we go:

- An Amazon AWS account
- An IAM user with an access key id and secret access key
- Homebrew, to install the AWS command line tool

I highly recommend creating a new IAM user just for this project. Even though it’s not covered in this tutorial, a dedicated user makes it possible to add restrictive access controls later on. There’s nothing worse than discovering too late that a lot of applications are sharing the same credentials.
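If you’d rather create this user from the terminal than from the IAM console, here’s a rough sketch of what it could look like. It assumes you already have an admin profile configured, the user name, policy name and policy file are just placeholders, and the bucket name is the one used later in this post:

# Create a dedicated IAM user for backups
aws iam create-user --user-name laptop-backups

# Create an access key pair for it; note the id and secret, you’ll need them for 'aws configure' below
aws iam create-access-key --user-name laptop-backups

# Attach an inline policy that only grants access to the backup bucket
cat > backup-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::erik-laptop"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::erik-laptop/*"
    }
  ]
}
EOF
aws iam put-user-policy --user-name laptop-backups \
    --policy-name backup-bucket-only \
    --policy-document file://backup-policy.json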

Creating the bucket

First we need a bucket. Go to the Amazon S3 console and click Create bucket. Most of the settings are straightforward.

Select a suitable name, for instance the name of your laptop. Then select a region; you probably want the closest one available to minimize network latency.

Then there are a few key settings that you want to get correct from the start (a command line alternative is sketched after this list):

- Bucket Versioning: enable it. This is what lets you get back earlier versions of files that are later changed or deleted.
- Block all public access: keep it enabled. Your backups should never be readable by anyone but you.
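The console is the easiest place to do all of this, but if you’d rather script it, the same setup can be done with the AWS CLI that we install in the next section. This is just a sketch using the bucket name and region that appear elsewhere in this post:

# Create the bucket (bucket names are globally unique, so pick your own)
aws s3api create-bucket --bucket erik-laptop --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2

# Turn on versioning so old versions of changed or deleted files are kept
aws s3api put-bucket-versioning --bucket erik-laptop \
    --versioning-configuration Status=Enabled

# Block every form of public access to the bucket
aws s3api put-public-access-block --bucket erik-laptop \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true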

Installing the AWS command line tool

Next we install the AWS CLI using Homebrew:

$ brew install awscli

That’s easy. Next we need to set up the command line tool so it’s able to access our bucket. The AWS CLI allows you to set up several profiles with different defaults. I highly recommend setting up a separate profile for the purpose of backups; I call mine ‘backups’:

$ aws configure --profile backups
AWS Access Key ID [None]: YOUR_AWS_ID
AWS Secret Access Key [None]: SECRET_KEY
Default region name [None]: us-west-2
Default output format [None]: json

If you’re already using the AWS CLI for other things you can go a bit more advanced and edit the config files directly; Amazon has some great documentation about this here.
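For reference, the profile that ‘aws configure’ creates is just plain text in two files under ~/.aws, so editing them by hand boils down to something like this:

~/.aws/credentials:

[backups]
aws_access_key_id = YOUR_AWS_ID
aws_secret_access_key = SECRET_KEY

~/.aws/config:

[profile backups]
region = us-west-2
output = json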

It’s a good idea to test the settings with a simple command. Using the ‘ls’ command you should see the bucket you created earlier and, if you’ve been using AWS previously, probably a few more:

$ aws s3 ls --profile=backups
2021-03-31 09:24:57 erik-laptop
2021-02-15 00:16:17 some-other-bucket

The backup script

With a bucket and a configured command line tool in place it’s time to have a look at the actual script. I’ve created a folder named ~/src/backup and saved the script below as ‘~/src/backup/backup-aws.sh’:

#!/usr/bin/env bash
BUCKET="erik-laptop"
FOLDERS="src Desktop Downloads Documents"
PROFILE="backups"

for FOLDER in $FOLDERS; do
    aws s3 sync "$HOME/$FOLDER" "s3://$BUCKET/$FOLDER" \
        --profile="$PROFILE" \
        --no-follow-symlinks \
        --exclude="*/node_modules/*" \
        --exclude="*/.vagrant/*" \
        --exclude="*/vendor/*" \
        --exclude="*.vdi" \
        --exclude="*.DS_Store" \
        --exclude="*/project1/www/*" \
        --exclude="*/project2/www/*" \
        --include="*/project3/src/plugin_name/vendor/*"
done

Super simple right?

The first three lines (after the shebang) just set some config values:

- BUCKET: the name of the S3 bucket we created earlier
- FOLDERS: a space separated list of folders under $HOME to back up
- PROFILE: the AWS CLI profile to use, ‘backups’ in my case

Further down the script you’ll notice that there are a lot of lines starting with ‘--exclude’. Each of these is a pattern matching files or entire folders to leave out of the backup. This list should be adapted to your own needs, but here’s my reasoning for a few of them:

- node_modules, vendor and .vagrant folders can be recreated by the relevant package manager or tool, so there’s no point in paying to store them.
- .vdi files are VirtualBox disk images and can easily run to many gigabytes.
- .DS_Store files are just Finder metadata that macOS recreates on its own.

I’m also using an explicit ‘--include’ when needed. In one of my projects we have a folder named ‘vendor’ that actually can’t be recreated (as easily), so I’ve made sure ‘/project3/src/plugin_name/vendor’ is included in the backup as well.

Depending on the type of projects and software you work on or with, the files and folders worth including or excluding will differ a bit, so adjust the list to taste. Amazon has good documentation on how to write exclude and include patterns here.
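A trick that helps a lot when tuning these patterns is the ‘--dryrun’ flag, which makes ‘aws s3 sync’ print what it would transfer without actually uploading anything. For example, to check a single folder against one of the excludes:

$ aws s3 sync $HOME/src s3://erik-laptop/src \
    --profile=backups \
    --dryrun \
    --exclude="*/node_modules/*"

Once the output only lists files you actually care about, drop the flag and let the real sync run.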

Running the script

I suggest running this script manually at least once. Make sure you are in the folder where you saved the script and type:

# make it executable
$ chmod +x backup-aws.sh

# run it
$ ./backup-aws.sh

The first run will take an awful lot of time. My initial backup was about 6 GB (before adding more folders) and took a good 45 minutes on a fast Internet connection. Your mileage will vary.

When the initial backup is done you can try running it again to verify it’s working as intended. The second time around should be a lot faster, normally about 4-5 minutes on my laptop.
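If you’re curious how much you’re actually storing (and paying for), the CLI can sum it up for you. This command walks every object in the bucket, so it takes a little while, and it prints a total object count and total size at the end:

$ aws s3 ls s3://erik-laptop --recursive --summarize --human-readable --profile=backups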

Adding a cron job

Once we’re satisfied that the backup script does what it should, it’s time to add a cron job for it. To edit your current user’s crontab using the Nano editor, just type:

$ EDITOR=nano crontab -e

Add the following line to have the script run at 3:01 AM every day:

1 3 * * *       cd /Users/myusername/src/backup && ./backup-aws.sh > /dev/null 2>&1
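One tweak worth considering: instead of throwing the output away with /dev/null, append it to a log file so you can verify later that the nightly run actually happened:

1 3 * * *       cd /Users/myusername/src/backup && ./backup-aws.sh >> backup.log 2>&1

On recent versions of macOS you may also need to give cron (or your terminal) Full Disk Access under Security & Privacy before it’s allowed to read folders like Documents and Desktop.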

How does this work?

The magic sauce in this setup is the ‘aws s3 sync’ command. It recursively copies new and updated files from the source directory on your computer to the destination S3 bucket. On each run it figures out which files have been created or updated since the last time and transfers only those changes to S3. (Local deletions are only mirrored to the bucket if you add the --delete flag, which this script doesn’t use.)

I think it’s fair to compare ‘aws s3 sync’ to rsync, just designed specifically to work with S3 buckets.

Since the bucket we configured is versioned, previous versions of updated or deleted files remain in the bucket. So whatever ransomware attack you’re subjected to, you’ll always be able to retrieve an earlier version of your files.
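Restoring an older version is done with the lower-level ‘s3api’ commands rather than ‘aws s3’. As a sketch, with a made-up file name: first list the stored versions of the file, then download the one you want using its version id:

# List every stored version of a single file
$ aws s3api list-object-versions --bucket erik-laptop \
    --prefix Documents/important.txt --profile=backups

# Download a specific version (the version id comes from the listing above)
$ aws s3api get-object --bucket erik-laptop \
    --key Documents/important.txt \
    --version-id SOME_VERSION_ID \
    --profile=backups \
    restored-important.txt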

What am I missing?

The whole reason I wrote this post is for you to criticize it.

I’m the first to admit that I’m not terribly experienced when it comes to working with backups, but perhaps you are? I find this approach to backing up personal stuff so easy that there’s bound to be some flaw somewhere in this setup that I didn’t understand or consider.

If you spot the problem, don’t hesitate to comment below.
