Malcolm Heath
Malcolm is an expert systems administrator with over a decade of experience in systems and network design, information security, and building scalable, comprehensive systems. He has a proven track record of developing solutions to fit any size environment, ranging from small offices to Fortune 50 corporations, often on tight budgets and schedules.
It's 4am, do you know where your data is?
Malcolm Heath | February 5, 2010
This morning, I got to be a hero. This happens occasionally in the life of a systems administrator, and I think it would be lying to say that I don't enjoy it. So much of the work that folks like me do is behind the scenes, unglamorous, and doesn't make the headlines.
I'll not cast blame here - a confusing set of circumstances conspired to have one of our developers request that we roll back to a previous version of a database. Changes had been made, but shouldn't have been, and things were a mess. I suppose the closest analogy might be someone disassembling their car engine, and then realizing, at some point, that they didn't quite know how it all fit together again, but with a nagging suspicion that those extra parts were somehow important.
And, as it turns out, earlier in the week, I did a data recovery for another co-worker, who had important data stored on a device, and only on that device, when the device suddenly stopped working. I was able to get the data back, and there was much rejoicing, but again, backups would have made this a much less scary time for all of us.
So, obviously, backups are important. You probably already know this. But managing backups is a difficult task, and one that is often neglected. The rest of this post is a few guidelines and questions to ask yourself, and a review of some more recent tools that might be of use to your organization.
Point 1: You need backups, and you need multiple backups. This should be clear from the above. Drives fail, people make mistakes. Files are deleted. The reason you need more than one is that, believe it or not, failures often come in groups - your hard drive fails, and for whatever reason, the automatic backup didn't run correctly.
Point 2: A backup that isn't automated isn't really a backup. If you think that you'll remember to copy that important file to the fileserver each time you make a change, you're lying to yourself. Humans aren't good at remembering to do things the same way every single time. We have bigger things to worry about, and, after all, doing the same thing reliably is what computers are for. Let them do what they're good at.
Point 3: Traditional backup technologies don't work as well any more. With the increasing use of mobile devices, it's getting harder and harder to automate backups in the traditional manner, which has usually meant backing up files on a central file server in the middle of the night. When most people work on their laptops, and often do so from home or a cafe or a train, there's no way that the central backup processes can reach those machines.
Point 4: More data means slower backups. It was one thing to back up important documents over a network in the office when those files were each less than 200 KB. Now, however, it's likely that you regularly work with multi-megabyte files, or even larger if you use multimedia. My rather small collection of documents takes up 31 GB on disk, and many of my users have even larger collections. At 100 Mbit/s (i.e. wire speed) that would take more than 40 minutes to back up, and that's a theoretical best case that doesn't take into account network overhead, other usage, or disk and processing speeds.
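To make that arithmetic concrete, here is a minimal sketch of the calculation. The function name is my own invention for illustration, and it assumes decimal gigabytes (1 GB = 10^9 bytes) and a link that runs at full speed with no overhead at all, which a real network never quite manages.

```python
def transfer_minutes(size_gb: float, link_mbit_per_s: float) -> float:
    """Best-case minutes to move size_gb (decimal gigabytes) over a link.

    Assumes the link runs at its full rated speed with zero overhead,
    so real backups will take longer than this.
    """
    bits = size_gb * 1e9 * 8                  # gigabytes -> bits
    seconds = bits / (link_mbit_per_s * 1e6)  # megabits per second -> bits per second
    return seconds / 60


print(round(transfer_minutes(31, 100), 1))        # 41.3 minutes for 31 GB
print(round(transfer_minutes(100, 100) / 60, 1))  # 2.2 hours for 100 GB
```

Plugging in your own data size and link speed gives a floor for how long a backup over the network can possibly take.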
So, what is the right solution? It turns out there are many, depending on what you want to accomplish. Much of this becomes more obvious when you start to think about what the risks are, and what an acceptable level of loss is for you.
What I mean is this: if you are worried about being able to come back up quickly if your laptop is run over by a car, you need to have a full system backup, which in my case would be something on the order of 100 GB. That would take at least a few hours each day to back up. I could probably trim this a bit if I didn't back up the operating system and standard applications. But still, that's a lot of data. While backups are running on the machine, performance will suffer. So, perhaps we will decide that having that happen only once a week is an OK thing. That means that the worst-case scenario is that my laptop is destroyed, and I've lost a week's worth of work.
I could also decide to run a full backup every week, and then each day back up only the files that have changed since that full backup. This doesn't add too much more data, but it limits my loss to a single day.
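The daily step in that scheme boils down to "find everything modified since the last full backup" (backup tools often call this a differential backup). Here is a rough sketch of that selection, assuming file modification times can be trusted and that you have recorded when the last full backup ran. The function and parameter names are hypothetical, and a real backup tool does a good deal more than this.

```python
import os
from pathlib import Path


def changed_since(root: Path, last_full_ts: float) -> list[Path]:
    """Return files under root modified after last_full_ts.

    last_full_ts is a Unix timestamp (seconds since the epoch) recorded
    when the last full backup ran. Only modification times are checked,
    so deleted or renamed files are not detected.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > last_full_ts:
                changed.append(path)
    return sorted(changed)
```

The list that comes back is what the daily job would copy. Everything else is already covered by the weekly full backup.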
But what if, during that one day, I produced something really, really critical, something that was irreplaceable, or needed for a deadline? This could be disastrous. So, I would still need to make sure I had multiple copies of truly critical data, saved as soon as I finished work on them (actually, saved at various stages along the way is really what we need). That is harder, and often relies on the user remembering to save the work.
And on it goes. It only gets more complex when you're dealing with databases, or large fileservers. And of course, your backups need to be backed up as well, and checked to make sure they work, and ideally also have copies off site, in case your building burns down.
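As one small example of "checked to make sure they work", here is a sketch that compares a file against its backup copy by SHA-256 checksum. The function names are my own, and this only proves the copy matches the original byte for byte. It is not a substitute for actually practicing a restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original: Path, backup: Path) -> bool:
    """True if the backup file exists and its contents match the original."""
    return backup.is_file() and sha256_of(original) == sha256_of(backup)
```

Running a check like this on a schedule, and alerting when it returns False, is one way to catch a backup that silently stopped working.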
You could go out and spend hundreds of thousands of dollars on big tape libraries and networked backup software for all your clients and servers. This is a traditional approach, and has its benefits, but the costs are high. Tapes (and for that matter, all media) don't always work, and don't always last very long, and most places that have such systems also have dedicated staff to manage them, which is another cost.
However, there are some cheaper options. For the cost of a large-capacity external drive, and the use of either a built-in tool like Apple's Time Machine, or the various Windows backup tools, you can at least make sure you have a copy of your machine.
For backing up individual files, you could look at a network fileshare, or a tool like Dropbox, which is meant for file sharing, but can be used to keep an "offsite" copy of your data somewhere out there on the internet. This of course opens up a can of worms regarding data security and access. You need to make sure that your data is not going to get accidentally shared, if, for example, the data in question is your salary list, or sensitive personal records.
And then there's what I think will become increasingly common, with the caveats about data security just mentioned - there is a plethora of "online backup solutions" which will back up your data opportunistically, as you work through the day, as long as there is a network connection available. These services can be costly for a large amount of data, but most have cheaper tiers that are adequate for backing up a user's main documents, with the added advantage that you are paying someone else to worry about the integrity of the backups. You can also get to your backups from anywhere - even if your building burns down, all you need is a computer, a login, and a network connection, and you're back in business.
It's complex, and there are a lot of ramifications to any backup approach, and associated risks. It's worth thinking long and hard about, and reviewing regularly. Because, although you might not need them for months and months, or even years, eventually, you will need your backups, and you have to be able to trust that they'll be there.



