22.00.0101: My data storage & backup strategy

https://jdcm.al/22.00.0101

By email, @dylanjr asked about my data storage and backup strategy. In answering, I thought it’d be useful enough to formalise here.

The ‘problem’

Here’s what I’m working with. The scope of this is mine & Lucy’s personal stuff, and the Johnny.Decimal business.

We each have a laptop. Mine only has 500GB of storage so I’m severely limited. Lucy’s has 1TB but this is still smaller than the … checks … ah not quite, 787GB that is the entire D85 Johnny.Decimal business folder.

That folder contains all of the raw footage from the workshop. That’s 641GB just there. I probably don’t need to keep that now, but whatever. I certainly don’t need to keep it all on a laptop.

So we also have a Mac mini. The ‘server’, even though it just runs plain macOS. The point is that it’s always powered on.

Mac minis are great for this. I ran a 2010 version until 2023. It was sold as a server, back when Apple did that. 13 years isn’t bad from a ~$1,000 computer.

I replaced it with a refurb M1 mini, lowest spec. Also ~$1,000. Should also last well over a decade.

These machines don’t need to do much. No data is stored on them – we’ll get to that. Processor wise, they do next to nothing. They serve files. So don’t spend money on anything fancy.

NAS vs DAS

Okay, so where’s all the data? The difference between NAS & DAS is both subtle and vitally important. I’ll assume you’re not a storage expert, so I’ll spell it out.

‘NAS’ stands for ‘network-attached storage’. This means that the device itself attaches to your network. It is a tiny server: it has its own smarts. Synology and QNAP are the ones you’ve heard of.

The advantage of these devices is that you don’t need another server. Like a Mac mini. They do it all themselves: you configure them on your network, then you can connect to their storage from any computer. That can be really convenient, and I still have an old Synology in my setup.

The downside is that if you want to configure them to do anything special, you’re dealing with a device that is usually underpowered, whose software is not macOS or Windows. It’s some specialised Synology or QNAP thing.

So if you want to run, say, Syncthing – we’ll get to all of this later – then you’re depending on that software having a Synology version. Not everything does, and when it does, it’s often a cut-down variant. This can be limiting.

For these reasons, I moved away from a NAS when I bought the new mini.

DAS

DAS stands for ‘direct-attached storage’. It’s storage that is directly attached to a computer. It’s basically an external hard drive. And you can just use an external hard drive from your office supplies store.

But if this is your central data store, you probably want something a bit more advanced. Hard drives get hot, so something with a fan is nice. And you can get units that take multiple disks, which can provide redundancy in case one of the drives fails.

(It depends how you set these disks up, and this is as technical as I’ll get here. Look up RAID levels if you want to know more.)

So I have a LaCie 2big 16TB DAS that is plugged directly into the Mac mini via USB-C.

The LaCie doesn’t do anything by itself. It requires a server. But now that server is a fully-featured computer, and I can install whatever I want on it.

So where’s all the data?

Let’s recap. Johnny has a 500GB MacBook Air. Lucy has a 1TB MacBook Pro. And there’s an always-on Mac mini whose internal storage is insignificant because it has 16TB of HDD plugged in the back. I’ll call the mini ‘the server’ from now on.

Technically, we could store all of our files on the server and access them over the network. But this would be slow, especially over wifi. Ideally, you want the things you’re using all day to be on the machine you’re using.

This is where synchronisation software comes in. Dropbox is the one you know: you install it, point it to a folder, and it synchronises all of those files. If you want them on another computer, you just install it there and wait for them to copy over. As a bonus, now they’re also in the cloud, and you can log in to a website and access them from anywhere.

This is a great technology but it comes with limits. What if me and Lucy both edit the same file at the same time? This causes a ‘conflict’, and there’s not much you can do about that. Dropbox can’t merge our Excel sheets, it’s just too hard. So that’s just something to be aware of.[1]

Syncthing

The secret sauce is an amazing piece of free software: Syncthing.[2]

It’s like Dropbox, but completely configurable. You get to say what synchronises from which computer to which other computers. And you get granular control down to the folder or file level.

You also get to control what happens to the file on this computer after a new version is received from that computer. This is really handy. On the server, I’ve got it configured to keep versions of each file, which it deletes as they get old. It’ll keep … well, let’s just quote Syncthing:

The following intervals are used: for the first hour a version is kept every 30 seconds, for the first day a version is kept every hour, for the first 30 days a version is kept every day, until the maximum age a version is kept every week.

So as Lucy is working on the small business system, every time she saves the document, it’s synchronised to both my laptop, and the server. The server is then applying the ‘retention policy’ as described above. This is one form of backup: if Lucy accidentally deletes all the text in the document, we can just grab a previous version from Syncthing. Super handy.
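To make that retention policy concrete, here is a rough sketch of how a staggered policy thins out old versions. It is not Syncthing's actual code: the function names are made up for illustration, and the one-year maximum age is an assumed setting (in Syncthing you choose it yourself).

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)


def spacing_for(age, max_age):
    """Minimum gap between kept versions of this age, or None if too old.

    The tiers mirror the quoted description; max_age is whatever you configure.
    """
    if age > max_age:
        return None
    if age <= HOUR:
        return timedelta(seconds=30)
    if age <= DAY:
        return HOUR
    if age <= 30 * DAY:
        return DAY
    return timedelta(weeks=1)


def thin(versions, now, max_age=timedelta(days=365)):
    """Return the version timestamps a staggered policy would keep.

    max_age defaults to one year here purely as an assumption.
    """
    kept = []
    for stamp in sorted(versions):  # oldest first
        gap = spacing_for(now - stamp, max_age)
        if gap is None:
            continue  # older than the maximum age: dropped
        if not kept or stamp - kept[-1] >= gap:
            kept.append(stamp)
    return kept


# Twelve saves, ten seconds apart, all within the last two minutes.
now = datetime(2024, 1, 31, 12, 0, 0)
saves = [now - timedelta(seconds=s) for s in range(10, 130, 10)]
recent = thin(saves, now)
```

With those twelve rapid saves, `recent` holds four versions, one per 30 seconds. The same saves a week later would collapse to a single daily version.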

Our Syncthing configuration

In a nutshell: the server has everything, and we each have most stuff, minus the massive folder of workshop video files. There’s a bit more to it, but that’s all you really need to know.

Start picturing ‘blobs’ of data

What’s important is what this means for our data. When you’re planning something like this, you need to have this picture in your mind of:

  1. What your data is, and
  2. Where your data is.

Johnny.Decimal makes this easy for me to think about. Each blob of data – the minimum unit of ‘my data’ that I think about – is a Johnny.Decimal system. I have:

  1. D85 Johnny.Decimal (the business)
  2. D01 johnnydecimal.com (the website)
  3. P76 Johnny's personal life
  4. L77 Learn with Lucy (the Excel course)
  5. Z99 Archive (some old long-term archives, including some data that isn’t mine)

I know that all of this data is on the server. That’s really important when it comes to backups, later. It’s so important, it’s a non-negotiable: all data must always be on the server. Then I know that if both laptops fall in the ocean, nothing is actually lost.

I also know that Syncthing is synchronising the important stuff that we use every day to both laptops. And those laptops synchronise to each other.

This is important because if one of the laptops falls in the ocean, it’d be nice to be able to access our important daily stuff quickly. We can do that from the other laptop. And when we get a replacement machine, the two laptops can talk directly to each other. The server, physically far away, is a last resort.
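If it helps to see the picture written down, here is a minimal sketch of that mental model as data. The server row follows the list above; which systems sit on which laptop is hypothetical, as are the machine names. It also works at the level of whole systems, so it ignores partial copies like D85 minus the workshop footage.

```python
# Which Johnny.Decimal systems live on which machine.
# Server contents follow the post; the laptop rows are made up for illustration.
PLACEMENT = {
    "server": {"D85", "D01", "P76", "L77", "Z99"},
    "johnny-laptop": {"D85", "D01", "P76", "L77"},
    "lucy-laptop": {"D85", "D01", "L77"},
}


def missing_from_server(placement):
    """Systems that exist on some machine but not on the server."""
    everything = set().union(*placement.values())
    return everything - placement["server"]


def survivors(placement, lost):
    """Systems still held somewhere after losing the given machines."""
    remaining = [
        systems for machine, systems in placement.items() if machine not in lost
    ]
    return set().union(*remaining)


# The non-negotiable: nothing may exist only on a laptop.
assert not missing_from_server(PLACEMENT)

# Both laptops fall in the ocean: the server still has every system.
after_ocean = survivors(PLACEMENT, {"johnny-laptop", "lucy-laptop"})
```

Running the same check with the server lost shows what you would still have day to day on the laptops.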

So that’s the day-to-day synchronisation of data. Syncthing is indispensable. It’s complex, but worth getting to know. If you need any help, ask.

Backups

Backups? Didn’t we just talk about backups? All these copies of your data all over the place on three machines?

We did not. Synchronisation is not a backup.

Read that again. In bold. Synchronisation is NOT a backup.

Because synchronisation – wait for it – synchronises everything: including you messing up some file and not realising it. Including you deleting some folder and not realising it. So you MUST also have backups.

Nobody said this was simple. Alright, backups. When you think of a backup, think of the event that makes you glad you had one. These events get progressively worse. Let’s simplify and say you’re always at home, and not about to be globe-trotting like we are.

1: Your laptop falls in the bath

Bath, ocean. Laptop wet, laptop no good. In this scenario, you’re in your house, you have a new laptop, and you need to get working quickly. You want a local backup that you can restore from.[3]

(In my situation, I’d try the re-synchronising first; but let’s say you don’t have that option.)

Your operating system has software built-in: Time Machine for Mac, Windows Backup for the other one, and you Linux nerds can figure it out yourself.

You should probably just use this. Personally I also use Arq but we don’t need to go there. Different software, same result.

2: Your backup didn’t work

It is not your day. You got your backup drive, tried to restore to the new, dry laptop – and it failed.

Disk error. Can’t read. Backup error code FKU390093-B. Cosmic rays. Whatever: backups also fail.

Lucky you have a second backup on a different disk. This is why I use Arq: it makes it really easy to connect to another machine and to create a backup there. So I have one backup on this little external SSD, one on the server, and another on an old Synology. Multiple backups on multiple storage devices.

But we’re not finished.

3: The house is destroyed by a cyclone

So now everything’s gone. Laptops, servers, hard drives, the lot. Really really unlikely, but it happens.

This is what the cloud is for. Ironic, as it just wiped us out. Ha ha. I pay for a cloud backup service that I hope never to use. Literally, if I go my entire life and never ever have to restore from the service that costs me about a hundred bucks a year, I’d be happy.

But the day you do, you’ll be glad for it. So: use Backblaze. Just do. Go and sign up now.

3-2-1 backup strategy

This is the industry-standard way to do things:

  • Three copies of your data.
  • On two different media.
  • One copy off-site, i.e. cloud.
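As a quick way to test your own setup against that rule, here is a small sketch. The `Copy` class, the medium labels, and the example inventory are all assumptions for illustration; swap in whatever you actually have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str   # free-form label, e.g. "ssd", "hdd", "cloud"
    offsite: bool


def meets_3_2_1(copies):
    """True if there are 3+ copies, on 2+ media, with 1+ off-site."""
    media = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(media) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical inventory: the working copy plus two backups.
copies = [
    Copy("laptop", "ssd", offsite=False),
    Copy("external drive", "hdd", offsite=False),
    Copy("cloud backup", "cloud", offsite=True),
]

ok = meets_3_2_1(copies)              # all three conditions hold
local_only = meets_3_2_1(copies[:2])  # fails: two copies, none off-site
```

Note that the working copy on your laptop counts as one of the three.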

Backblaze, and NAS vs. DAS

We talked about NAS vs. DAS above for a reason. Backblaze is amazing: unlimited storage for ~$100/year. With a catch: it only includes DAS.

Backblaze will not back up your Synology for $100/year. It’s a miracle that they will back up your LaCie 16TB for that. So this is definitely a factor when deciding what to buy.

Oh yeah, that Synology

Because I already had a Synology – an old DS118 single-drive unit – I’m using it purely as a backup target. Both laptops and the server back up to it, using Arq.

This is probably overkill. If I didn’t already have this, I wouldn’t buy one for this role.

Review

Let’s review with a little diagram. I have no computer drawing skillz so here’s one I did on paper.

Now, yours won’t look anything like this. Don’t just copy me. But make sure that you have this mental model of your data. What blobs are there? Where are they? Which copies are complete vs. partial? Local vs. cloud? Synchronisation vs. backup?

And if you need any help, ask on the forum.

Tailscale

There’s a secret sauce here which I’ll mention briefly.

The server was in the cupboard in the kitchen, but since deciding to go on the move, it needed a new home. So it’s now at my mate Alex’s house in Melbourne. Thalex!

Ordinarily this would have broken all sorts of stuff and required complicated network reconfiguration. But I have Tailscale permanently turned on, on every device, so I had to do: exactly nothing.

I turned the server off, gave it to Alex, he took it and the LaCie and the Synology home with him, he turned them on, and everything just works like it did. Albeit a touch slower, as they’re now about 700km away. Only about 40ms of network latency though, which is impressive.

I couldn’t recommend it more. Again, too much detail for this post, but let me know if you need help.


  1. Most sync services will rename one of the conflicted files, giving it a timestamp and putting the word ‘conflict’ in the filename. Then it’s up to you to merge your conflicting versions. You’ll never actually lose data. ↩︎

  2. You should financially support ‘free’ software that you depend on. Because nothing’s really free. As soon as we can afford to, I’ll be sponsoring Syncthing. ↩︎

  3. Computer terms. ‘Local’ = on this network; in this building. ‘Remote’ = not. ↩︎

If anyone wants what I think is the lowest-effort (not lowest price) 3-2-1 system:

I have a computer with a 4TB hard drive in it. Single drive (cheap). Organized by JD systems of course.
I pay monthly for a 5TB Storage Share from Hetzner. This is expensive. But gives me a Nextcloud drive that can sync 1:1 with my local desktop.
Hetzner provides their own backup solution. As such, I treat Hetzner’s stored files as my “primary” and “secondary” copies, and my local file-system as “the offsite” copy.

All my phones, computers synchronize to the Hetzner service over Nextcloud apps.

Nextcloud has historically been extremely crap software when I administered it myself for my own purposes, but paying professionals to host it means that it runs smooth as butter. And that I get support when/if something goes wrong.

Isn’t World Backup Day coming up soon?

hmm, so I have a Hetzner server as a Syncthing peer at the moment … didn’t know about their storage offerings, looks interesting.

Neat idea. And $16/mo might sound a lot, but if I say my Mac + LaCie setup cost $2,500 … that’s 13 years. Certainly a viable option and one that doesn’t depend on you having a friend willing to host your gear!

If I’m being honest, no one is hosting backups without a nice computer setup and data worth preserving anyway. The price of the computer is a wash, because my local copy is sitting on a computer anyway.

It’s the price of electricity for an always-on computer + a hard drive replacement every 5 years + a backup solution to something like Backblaze that now gets rolled into the monthly storage share price. I’m probably overpaying for those things by about $5-6 a month compared to rolling my own solution at a friend’s or family member’s house.

I chose Hetzner for three reasons:

  1. They own the land under their servers so price hikes are limited, and they’re just a company that is in control of their processes.
  2. I host my data in Germany where extremely strong data protection and ownership laws exist.
  3. If I ever want to go fully in the cloud, renting a Hetzner server for my application serving needs would immediately benefit from having my data located in the same data center already.

oh yes, I know Hetzner well – my first task at my first job was to duplicate a map server stack on a new Hetzner metal server.

Back in the day, we’d feel almost reluctant to divulge their name to our peers at conferences[1] because Hetzner felt like this secret weapon that no one knew about yet.


  1. In Dutch there’s this nice word ‘concullega’ for competitor-colleague. ↩︎

Interesting things and thoughts, thanks for sharing.
My main issue with offsite backups is that I have quite a lot of data (around 16 TB and counting), which makes it a bit tricky. I’ve found hosted storage at that scale ludicrously expensive.
Until recently we had a bank nearby that offered lockers, which was a nice solution: keep a hard drive at the bank and “update it” once a month. It was expensive but still cheaper (and faster) than hosted storage. However, they closed down, and I’m thinking that just building a bit of hardware and giving it to someone I trust with a good network connection is maybe the best option. I could put the initial load on the machine up front and just sync the incremental backups over the wire.
Or give in to global surveillance and just ask the NSA for my data if the hard drive dies? 😉

Hetzner has come a long way. I remember times when I sadly had to recommend not touching their stuff even with a ten-foot pole. But they managed to solve their problems and have grown into a reliable and good hosting provider. I wish they had kept first-class BSD support though.

Sorry I knew you knew, except I wanted to respond to you with point #3; thought I’d write down #1 and #2 for passers-by.

I think beyond ~10TB, backing up all devices for a family/friends situation to a single Windows computer and just using Backblaze unlimited personal backup is the winning move. The price does change (has basically doubled in the last 5 years), but unlimited is unlimited with them.

The backup to that Windows computer or VM could be done with anything that does de-dup (I’m a fan of Duplicacy).

This was a very nice read. Thanks.

Hey, I’m the guy who wrote the email to Johnny. (Thanks again, Johnny!)

I love the idea of a Mac mini “server”. I appreciate the clarification regarding why lowest spec (and lower price) is fine for this purpose.

The article clears up my confusion and ignorance regarding NAS vs DAS. I haven’t heard of “Synology” or “QNAP” before, so it’s probably best that I stick to DAS for now.

RAID (I know about this one) might not count as a true backup, but it still seems like a good idea to me to have that redundancy.

Syncthing really is awesome. I’ve used it before to turn a spare computer into a personal “server”, similar to how you’ve described, except you’re clearly an expert at this. That said, software like Syncthing existing for free is a really great thing and worth supporting.

I agree that Johnny.Decimal makes achieving this kind of system a lot easier in terms of identifying which chunks of data are where, which was perhaps the biggest draw to the JD system for me (i.e., being able to organize across all digital and physical locations).

The point about backups and all data always being on the server is the key for me. Files on my computer were getting missed by my backup system (aside from occasional Time Machine backups, which to many — myself included — is not a backup at all). But I need to have a lot of files on my computer at one time in order to do my work effectively, and trying to keep all this synced and backed up manually was just too much work.

The diagram really clarifies how the whole system is put together.

I like the approach @clappingcactus takes to 3-2-1 backups. It does sound expensive, but Johnny’s math checks out ($16 x 12 months x 13 years = $2496, $2496 < $2500) — viable indeed (unless the rate/month changes — not sure what Hetzner’s business practices are like, but who can predict 13 years from now anyways). Very intriguing option. My only problem with implementing this approach is that, similar to Johnny, I’m working off a laptop with limited storage (also 500gb I think). So I need external drives, which complicates my system. A separate server with DAS sounds more compatible with my current process/system.

Really appreciate the comments. Thanks all!

As with most things storage-related, this is a trade-off. I chose RAID0, which just uses both disks as one big volume with no redundancy. Because I bought 16TB and I needed to use 16TB.

Mine’s only a 2-bay unit so my other option was RAID1, which would have mirrored my data to both disks. At the cost of storage space, of course: 8TB usable. I couldn’t do that.

In my case, I’m not using this storage as my primary volume.[1] So if it goes down, I can suffer the inconvenience.

Contrast with my bestie, a photographer. She has a 5-bay QNAP. One of those disks is redundant (RAID5), because if she loses a disk, having the entire array be down is unacceptable: it’s where all of her work is, and she wouldn’t be able to deliver client jobs.
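The trade-off is easy to see as arithmetic. Here is a small sketch covering only the three levels mentioned here, with a two-disk mirror assumed for RAID1. The function names are made up, and since I haven’t said what size her QNAP disks are, the 4TB figure in the last example is purely hypothetical.

```python
def usable_tb(level, disks, disk_tb):
    """Usable capacity in TB for equal-sized disks."""
    if level == "RAID0":
        return disks * disk_tb           # striped: all space, no redundancy
    if level == "RAID1":
        return disk_tb                   # mirrored: one disk's worth
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least three disks")
        return (disks - 1) * disk_tb     # one disk's worth goes to parity
    raise ValueError(f"unsupported level: {level}")


def disk_failures_survived(level):
    """Disk failures the array survives (RAID1 here means a 2-disk mirror)."""
    return {"RAID0": 0, "RAID1": 1, "RAID5": 1}[level]


mine_raid0 = usable_tb("RAID0", disks=2, disk_tb=8)     # 16
mine_raid1 = usable_tb("RAID1", disks=2, disk_tb=8)     # 8
five_bay_raid5 = usable_tb("RAID5", disks=5, disk_tb=4)  # 16, with made-up 4TB disks
```

So RAID0 gives me the full 16TB but survives zero disk failures, which is why everything on it also lives somewhere else.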


  1. Actually I am for my Plex server, but if it failed we’d just do something other than watch TV until I got it back up. ↩︎