How-To: Configure and Schedule Automated Backups in Linux
Regardless of the operating system you are using, data loss is inevitable. Sooner or later, it will happen to you; the only question is how much data you will lose. Although RAID can act as an insurance policy against hardware failures, it was never designed to serve as a backup and performs poorly in that role. Human error is always the greatest concern, since important files can be accidentally overwritten or deleted in a careless moment. It is easy to fall behind on your backups or get complacent, and without recent backups you have no recovery strategy. This guide will help you automate backups on your Linux rig so you always have an up-to-date copy of your files.
Step 1: Get the Requisite Tools
Before you can back up your data, you need an acceptable storage location to copy it to. Optical media like CD-R/RW or DVD-R/RW discs were once a popular (but not necessarily the best) medium to back up to, since they held a lot of data for the time and were fairly cheap. Cheap optical media is suitable for short-term storage, but should not be relied upon for the long term because of the possibility of scratches, oxidation, or organic dye breakdown (CD rot). Optical media is now even less practical than it used to be, since most personal data greatly exceeds what most disc formats can hold. It would take many discs (or one or more discs in a still-expensive format like Blu-ray) to complete a single backup session. It used to be common practice to include multiple redundant copies of a file on a disc, or spread across several discs, to improve the chances of recovery in case of damage, and this would inflate the disc count even more. Ultimately, it just isn't worth using optical discs for backup anymore.
Today, the practical options for backup are an external hard drive (or several of them, if you want maximum protection) or an external server. It is best to combine these methods instead of relying on just one, since that increases redundancy. In any case, the hard drive(s) should be large enough to accommodate both your existing data and any foreseeable growth. As for servers, you should use a remote server if you have access to one. (If you buy web hosting and have plenty of space left on your account, that would be ideal for backups.) Regardless of the storage mechanisms you use, the actual file transfer operations should be done with a program called Rsync.
Step 2: Set Up Rsync
Rsync is a program that copies data from one location to another. Although another program, cp, exists for this purpose, Rsync is far more advanced and efficient. While cp can only copy entire files from one location to another on a local system, Rsync compares the source file to the destination file (if it exists) and copies only the parts of the source file that have changed. In this way, Rsync can synchronize data between two locations much like the Windows Briefcase tool does. This saves an immense amount of time and bandwidth on backup procedures.
In addition, Rsync can sync files on both local and remote systems, whereas cp can only work with local systems. (There is a remote version of cp called scp, but even it can only work with whole files.) Rsync will be as slow as cp the first time you use it, since the source files must be copied in full to the new backup location, but subsequent sessions will be much faster. You should know that the first Rsync session with a remote server may take anywhere from several hours to several days to complete, depending on the speed of your connection and the amount of data you need to transfer. Furthermore, when Rsync runs over SSH, remote file transfer sessions are encrypted to keep your data from being sniffed in transit.
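As a rough sketch of what a remote session can look like, the script below builds an Rsync destination of the form user@host:path and wraps the transfer in a small function. The account name, host name, and directories are hypothetical placeholders, and the example assumes that SSH access to the server already works. Nothing is transferred until one of the commented-out calls at the bottom is enabled.

```shell
#!/bin/sh
# Hypothetical values: replace them with your own account, host, and paths.
remote_user="backupuser"
remote_host="backup.example.com"
remote_dir="backups/laptop"
local_dir="$HOME/Documents"

# A remote location is written as user@host:path.
remote_target="$remote_user@$remote_host:$remote_dir/"

# -a      archive mode
# -z      compress data in transit, which helps on slow links
# -e ssh  carry the transfer over an SSH session
# "$@"    passes along any extra options given to the function
remote_backup() {
    rsync -az -e ssh "$@" "$local_dir/" "$remote_target"
}

echo "Would sync $local_dir/ to $remote_target"

# Preview first, then run for real once the preview looks right:
# remote_backup --dry-run -v
# remote_backup
```

Running the preview with --dry-run first is a cheap way to confirm that the paths point where you expect before committing to a long initial upload.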
Rsync is fairly straightforward. The basic syntax is as follows: “rsync -a [source dir] [destination dir]”. (The -a switch tells Rsync to work in “archive” mode, which copies directories recursively and preserves permissions, timestamps, and symbolic links, making it ideal for backups.) Although the basic command listed above will work once you specify the source and destination locations, there are many other options available to tweak Rsync. These can be discovered by reading the Rsync manual page (run “man rsync”).
Although the command-line implementation of Rsync allows for easier automation (more on that next), using Rsync in this way can be difficult for new users who are not used to the command line. In such cases there is a graphical frontend called Grsync that can vastly simplify the backup process. Grsync presents the various switches as easy-to-understand checkboxes that can be set to the desired combination.
Grsync does offer a degree of automation by allowing you to define a session and run it automatically (grsync -e [session_name]), but standard Rsync is still much more versatile since you can specify options directly on the command line instead of having to rely on predefined sessions.