How-To: Configure and Schedule Automated Backups in Linux


Regardless of the operating system you use, data loss is inevitable. Sooner or later it will happen to you; the only question is how much data you will lose. Although RAID can act as an insurance policy against hardware failure, it was never designed to serve as a backup and performs poorly in that role. Human error is the greater concern, since important files can be accidentally overwritten or deleted in a careless moment. It is easy to fall behind on your backups or get complacent, and without recent backups you have no recovery strategy. This guide will help you automate backups on your Linux rig so your backup copies always stay up to date.

Step 1: Get the Requisite Tools

Before you can back up your data, you need a suitable storage location to copy it to. Optical media like CD-R/RW or DVD-R/RW discs were once a popular (but not necessarily the best) backup medium, since they held a lot of data for the time and were fairly cheap. Cheap optical media is suitable for short-term storage but should not be relied upon in the long term because of the possibility of scratches, oxidation, or organic dye breakdown (CD rot). Optical media is now even less practical than it used to be, since most personal data collections greatly exceed what most disc formats can hold. It would take many discs (or one or more discs in a still-expensive format like Blu-ray) to complete a single backup session. It used to be common practice to store multiple redundant copies of a file on one disc or across several discs to improve the chances of recovery in case of damage, which inflated the disc count even more. Ultimately, optical discs just aren't worth using for backup anymore.

Today, the practical options for backup are an external hard drive (or several of them, if you want maximum protection) and an external server. It is best to combine these methods rather than rely on just one, since that increases redundancy. In any case, the hard drive(s) should be large enough to accommodate both your existing data and any foreseeable growth. As for servers, you should definitely use a remote server if you have access to one (if you buy web hosting and have plenty of space left on your account, that would be ideal for backups). Regardless of the storage mechanism you use, the actual file transfers should be done with a program called Rsync.

Step 2: Set Up Rsync

Rsync is a program that copies data from one location to another. Although another program, cp, exists for this purpose, Rsync is far more advanced and efficient. While cp can only copy entire files from one location to another on a local system, Rsync compares each source file to the destination file (if it exists) and copies only the parts of the source file that have changed. In this way, Rsync can synchronize data between two locations, much like the Windows Briefcase tool does. This saves an immense amount of time and bandwidth during backups.

In addition, Rsync can sync files on both local and remote systems, whereas cp can only work with local systems. (There is a remote version of cp called scp, but even it can only work with whole files.) Rsync will be as slow as cp the first time you use it, since every source file must be copied in full to the new backup location, but subsequent sessions will be much faster. Be aware that the first Rsync session with a remote server may take anywhere from several hours to several days, depending on the speed of your connection and the amount of data you need to transfer. Furthermore, when Rsync runs over SSH, the remote transfer is encrypted, which keeps your data from being sniffed in transit.
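As a rough sketch of a remote session over SSH, the shell function below pushes a local directory to a server. The function name push_remote, the account backupuser, the host backup.example.com, and the remote path are all hypothetical placeholders; -z (compress data in transit) and -e ssh (use SSH as the transport) are standard Rsync options that the text above does not mention.

```shell
#!/bin/sh
# Hypothetical settings; substitute your own account, host, and remote path.
REMOTE_USER=backupuser
REMOTE_HOST=backup.example.com
REMOTE_DIR=backups/home

# push_remote SRC: send the contents of SRC to the remote server over SSH.
push_remote() {
    # -a      archive mode
    # -z      compress data while it is in transit
    # -e ssh  run the transfer through SSH so it is encrypted
    # "${1%/}/" strips any trailing slash and adds exactly one back,
    # so the contents of SRC are copied either way.
    rsync -az -e ssh "${1%/}/" "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/"
}

# Example (assumed path):
# push_remote "$HOME/Documents"
```

The first run sends everything; later runs with the same command only need to send what has changed since.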

Rsync is fairly straightforward to use. The basic syntax is “rsync -a [source dir] [destination dir]”. (The -a switch tells Rsync to work in “archive” mode, which is ideal for backups because it recurses into directories and preserves permissions, timestamps, and symbolic links.) Although this basic command will work once you specify the source and destination locations, many other options are available for fine-tuning Rsync. You can find them in the Rsync manual page (run “man rsync”).
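To make the basic syntax concrete, here is a minimal sketch that wraps the archive-mode call in a small shell function. The function name backup_dir and the example paths are hypothetical; -v (verbose) and -n (dry run, which reports what would be copied without copying anything) are standard Rsync options added here for illustration.

```shell
#!/bin/sh
# backup_dir SRC DEST: hypothetical helper, not part of Rsync itself.
# Copies the contents of SRC into DEST in archive mode.
backup_dir() {
    src=$1
    dest=$2
    # A trailing slash on the source means "the contents of this directory".
    # Without it, rsync would create a subdirectory named after SRC inside DEST.
    # "${src%/}/" normalizes the argument so it always ends in one slash.
    rsync -a "${src%/}/" "$dest"
}

# Example (assumed paths): preview with a dry run first, then copy for real.
# rsync -avn "$HOME/Documents/" /media/backup/Documents
# backup_dir "$HOME/Documents" /media/backup/Documents
```

Running a dry run before the first real backup is a cheap way to confirm that the source and destination are the ones you intended.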

Although the command-line version of Rsync allows for easier automation (more on that next), using it can be difficult for new users who are not used to the command line. In such cases, a graphical frontend called Grsync can vastly simplify the backup process. Grsync presents the various switches as easy-to-understand checkboxes that can be set to the desired combination.

Grsync does offer a degree of automation by allowing you to define and automatically run a session (grsync -e [session_name]), but standard Rsync is still much more versatile, since you can specify commands directly instead of having to rely on predefined sessions.
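As one illustration of specifying commands directly, the sketch below syncs several directories in a single function and reports whether any of them failed. The function name backup_all, the destination /media/backup, and the example directories are hypothetical; this is one possible arrangement, not a prescribed one.

```shell
#!/bin/sh
# Hypothetical destination; adjust to wherever your backup drive is mounted.
BACKUP_ROOT=/media/backup

# backup_all DIR...: sync each directory into its own folder under BACKUP_ROOT.
# Returns 1 if any rsync call failed, 0 if they all succeeded.
backup_all() {
    failed=0
    for dir in "$@"; do
        name=$(basename "$dir")
        if ! rsync -a "${dir%/}/" "$BACKUP_ROOT/$name/"; then
            # Report the failure but keep going with the remaining directories.
            echo "backup of $dir failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (assumed directories):
# backup_all "$HOME/Documents" "$HOME/Pictures"
```

Because the function returns a non-zero status on failure, a calling script can check that status and react to a failed backup.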

COMMENTS
Verification

Very nice simple guide, thank you. However, instead of occasionally manually running the bash script you can just have it email you if it fails.

 rsync -a [source] [destination] || mail .....

 I can't remember the syntax for the mail command right now but the double-pipes will ensure that the second command only runs if the first one fails (the 'or' logic).

 Also, running linux on Optimus Prime? Very nice. :)

The Mail Command

  mail somebody@maximumpc.com < file.txt

This will read the contents of the file and e-mail them to the address provided.

I think that what you would use for your statement would just be

  rsync -a [source] [destination] || echo "Backup failed" | mail -s "Backup failed" youraddress@yourdomain.com
