In this post I explain a little bit about how I’ve set things up so that I’m comfortable knowing that my files are backed up to an Amazon S3 bucket every night. Most of this post should work exactly the same on any Linux or BSD system, but I’ve personally only used it on Mac.
With this approach my backups are
- Offsite – so that any amount of fire, earthquakes, floods or avalanches can’t take my data away.
- Safe – I trust that the Amazon engineers are magnitudes better at maintaining their data centers than I could ever maintain some local hardware.
- Secure – Again, I trust the Amazon engineers to have magnitudes better security practices than I ever will. Bad actors will find it more difficult to steal data from Amazon than from me.
- Versioned, so I can go back to any older version of a specific file at any time. This protects me from a lot of my own stupid mistakes as well as the odd case of a ransomware attack.
And since you’re going to want to know: I spend well under $1 / month for the 20 GB of backup storage I’m currently using. However, the first month the bill was about $10, partly for the initial transfer but mostly because of a good deal of experimentation. Still bonkers cheap.
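For a rough sanity check on that number: S3 Standard storage is currently in the neighborhood of $0.023 per GB-month in the US regions, so 20 GB works out to roughly 20 × 0.023 ≈ $0.46 per month before request and transfer charges. Pricing varies by region and changes over time, so treat that as a ballpark rather than a quote.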
Half the work is dotfiles
Before you go ahead and copy this solution you should be aware that this backup approach only deals with content such as documents, source code, images, temporary files on the Desktop etc. It does not deal with installed software or settings.
Besides the Amazon S3 backup described in this post I’m also using a combination of dotfiles and Bitwarden to back up software, settings and secrets. I’ll probably write something about that part of my setup later. All you need to know at this point is that the following solution solely focuses on good old files.
Also worth knowing is that this backup script deals only with files and folders under your $HOME folder. I keep my computer pretty tidy and make sure everything worth backing up sits somewhere in my $HOME. If you run things differently and want to back up other folders as well, you’ll probably find it easy enough to modify the script below.
Requirements
Not a lot of requirements at all, here we go:
- Brew – to install the aws command line utility
- An existing Amazon AWS account
- An IAM user set up with a valid access key and secret
I highly recommend creating a new IAM user just for this project. Even though it’s not covered in this tutorial it makes it possible to add restrictive access controls later on. Nothing worse than discovering too late that a lot of applications are sharing the same credentials.
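This post doesn’t cover locking the user down, but to give you an idea, a policy scoped to just the backup bucket could look roughly like this (the bucket name erik-laptop is the one used later in this post, adapt it to your own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::erik-laptop"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::erik-laptop/*"
    }
  ]
}

You’d attach it to the IAM user in the console (or with ‘aws iam put-user-policy’) once the rest of the setup works.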
Creating the bucket
First we need a bucket. Go to the Amazon S3 console and click Create a bucket. Most of the settings are straightforward.
Select a suitable name, for instance the name of your laptop. Then select a region; you probably want the closest one available to minimize network latency.
Then there are a few key settings that you want to get correct from the start (if you prefer the terminal, there’s a CLI sketch right after this list):
- Make sure ‘Block all public access’ is enabled so that your files aren’t available from the Internet.
- Enable Bucket versioning
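If you’d rather skip the console, the same bucket can also be created with the AWS CLI that we install in the next section. A sketch, assuming the bucket name, region and profile used later in this post:

$ aws s3api create-bucket --bucket erik-laptop --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2 --profile backups
$ aws s3api put-bucket-versioning --bucket erik-laptop \
    --versioning-configuration Status=Enabled --profile backups
$ aws s3api put-public-access-block --bucket erik-laptop \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true \
    --profile backups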
Installing aws command line
Next we install the AWS CLI tool using Brew:
$ brew install awscli
That’s easy. Next we need to set up the command line tool so it’s able to access our bucket. The AWS CLI allows you to set up several profiles with different defaults. I highly recommend setting up a separate profile for the purpose of backups; I call mine ‘backups’:
$ aws configure --profile backups
AWS Access Key ID [None]: YOUR_AWS_ID
AWS Secret Access Key [None]: SECRET_KEY
Default region name [None]: us-west-2
Default output format [None]: json
If you’re already using the AWS CLI for other things you can go a bit more advanced and edit the config files directly; Amazon has some great documentation about this here.
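For reference, the configure command above just writes two plain-text files under ~/.aws, so with the ‘backups’ profile they end up looking something like this:

~/.aws/credentials:
[backups]
aws_access_key_id = YOUR_AWS_ID
aws_secret_access_key = SECRET_KEY

~/.aws/config:
[profile backups]
region = us-west-2
output = json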
It’s a good idea to test the settings with a simple command. Using the ‘ls’ command you should see the bucket you created earlier and if you’ve been using AWS previously, probably a few more:
$ aws s3 ls --profile=backups
2021-03-31 09:24:57 erik-laptop
2021-02-15 00:16:17 some-other-bucket
The backup script
With a bucket and a configured command line tool in place it’s time to have a look at the actual script. I’ve created a folder named ~/src/backup and saved the script below as ‘~/src/backup/backup-aws.sh’:
#!/usr/bin/env bash

BUCKET="erik-laptop"
FOLDERS="src Desktop Downloads Documents"
PROFILE="backups"

for FOLDER in $FOLDERS; do
    aws s3 sync "$HOME/$FOLDER" "s3://$BUCKET/$FOLDER" \
        --profile="$PROFILE" \
        --no-follow-symlinks \
        --exclude="*/node_modules/*" \
        --exclude="*/.vagrant/*" \
        --exclude="*/vendor/*" \
        --exclude="*.vdi" \
        --exclude="*.DS_Store" \
        --exclude="*/project1/www/*" \
        --exclude="*/project2/www/*" \
        --include="*/project3/src/plugin_name/vendor/*"
done
Super simple, right?
The three variables at the top just set some config values:
- BUCKET – the target AWS S3 bucket
- FOLDERS – any folder names under your $HOME that you want to include in the backup
- PROFILE – the profile we defined when setting up the CLI tool
Further down the script you’ll notice that there are a lot of lines starting with ‘--exclude’. Each of these lines holds a pattern of files or entire folders to exclude from the backup. This list should be adapted to your own needs; here’s my reasoning for a few of these:
- node_modules – When I need to restore, these folders will (or at least should) be recreated by npm, so there’s no need to keep them in my backup
- vendor – Same as above, but these folders will be recreated by Composer
- .vagrant – This is a temporary folder created when using Vagrant. All my vagrant machines can be created and provisioned from scratch, no need to keep this state
- .vdi – Disk images from VirtualBox. Same as with the .vagrant state folder, these are recreated on demand when I need them
I’m also using an explicit ‘--include’ when needed. In one of my projects we have a folder named ‘vendor’ that actually can’t be recreated (as easily). So I’ve chosen to make sure ‘/project3/src/plugin_name/vendor‘ is included in backups as well.
Depending on the type of projects and software you work with, the list of files and folders to include or exclude may differ a bit, so adjust it as needed. Amazon has good documentation on how to write exclude and include patterns here.
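A handy way to check that a pattern does what you expect is the --dryrun flag, which makes ‘aws s3 sync’ print what it would transfer without uploading anything. Something along these lines, reusing the bucket and profile from the script above:

$ aws s3 sync $HOME/src s3://erik-laptop/src --profile=backups \
    --exclude="*/node_modules/*" --dryrun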
Running the script
I suggest running this script manually at least once. Make sure you are in the folder where you saved the script and type:
# make it executable
$ chmod +x backup-aws.sh
# run it
$ ./backup-aws.sh
The first run will take an awful lot of time. My initial backup was about 6 GB (before adding more folders) and took a good 45 minutes on a fast Internet connection. Your mileage will vary.
When the initial backup is done you can try running it again to verify it’s working as intended. The second time around should be a lot faster, normally about 4-5 minutes on my laptop.
Adding a cron job
Once we’re satisfied that the backup script does what it should, it’s time to add a cron job for it. To edit your current user’s crontab using the Nano editor just type:
$ EDITOR=nano crontab -e
Add the following line to have the script run at 3:01 AM every day:
1 3 * * * cd /Users/myusername/src/backup && ./backup-aws.sh > /dev/null 2>&1
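If you’d rather keep a record of each run than throw the output away, a small variation appends it to a log file instead (the log path here is just an example):

1 3 * * * cd /Users/myusername/src/backup && ./backup-aws.sh >> backup.log 2>&1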
How does this work?
The magic sauce in this setup is the ‘aws s3 sync’ command. It recursively copies new and updated files from the source directory on your computer to the destination, which in this case is an S3 bucket. It figures out which files have been created or updated by comparing your local files with what’s already in the bucket, and transfers only the differences (local deletions are only mirrored to the bucket if you add the --delete flag, which this script doesn’t use).
I think it’s fair to describe ‘aws s3 sync’ as rsync-like, but specifically designed to work with S3 buckets.
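It also means a restore is just the same sync pointed in the other direction. A sketch, assuming I wanted my Documents folder back from the bucket used in this post:

$ aws s3 sync s3://erik-laptop/Documents $HOME/Documents --profile=backups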
Since the bucket we configured is versioned, previous versions of an updated or deleted file will still remain in the bucket. So whatever ransomware attack you are subjected to, you will always be able to retrieve an earlier version of a file.
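Digging out an older version is a bit more manual: you list the versions of a key and then fetch the one you want by its version id. Roughly like this, where the file name is made up and VERSION_ID comes from the listing:

$ aws s3api list-object-versions --bucket erik-laptop \
    --prefix Documents/notes.txt --profile backups
$ aws s3api get-object --bucket erik-laptop --key Documents/notes.txt \
    --version-id VERSION_ID notes-restored.txt --profile backups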
What am I missing?
The whole reason I wrote this post is for you to criticize it.
I’m the first to admit that I’m not terribly experienced when it comes to working with backups, but perhaps you are? I find this approach to backing up personal stuff so easy that there’s bound to be some flaw somewhere in this setup that I didn’t understand or consider.
If you spot the problem, don’t hesitate to comment below.