How-To: Configure and Schedule Automated Backups in Linux
Step 3: Cron
Once you have configured Rsync to back up your files, you are only halfway to a viable backup plan. A decent backup solution must run on a regular schedule rather than whenever you happen to think of it, and Rsync by itself will not update your files unless you invoke it manually. While you could try to remember to run Rsync every day, there is a far easier way.
Linux and similar systems have a utility called Cron, which is essentially a scheduling tool for running other programs. Each user has a crontab file, which is a list of instructions for Cron to execute and the times each instruction should run. In this way, everyone (not just root) can use Cron. Cron works with the system clock; when the correct time for a planned event rolls around, Cron will automatically execute the command.
There are several ways to edit your crontab. The easiest way for new users to configure Cron is to use a frontend like gcrontab or kcron. More advanced users can edit the crontab manually in a text editor like Vim or Kate. To edit the crontab manually, open a terminal and type “crontab -e”, which opens your crontab in your default editor and installs the new version when you save and exit. The Cron daemon notices the updated crontab on its own, so you do not need to restart it. You should, however, check your system process list for a “cron” or “crond” process (root should own it) to make sure that the daemon is running at all. If it is not, start it with your distribution's service tool (for example, “sudo /etc/init.d/cron start” on systems that use init scripts).
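As a rough sketch of those checks, the following shell snippet looks for a running Cron daemon and then lists the current user's crontab. The function name cron_status is made up for this example, and the snippet assumes the pgrep utility is installed; on systems without it, searching the output of “ps” for “cron” gives the same information.

```shell
#!/bin/sh
# cron_status is a hypothetical helper name used only in this example.
# It reports whether a process named "cron" or "crond" is running.
cron_status() {
    # pgrep -x matches the process name exactly.
    if pgrep -x cron >/dev/null 2>&1 || pgrep -x crond >/dev/null 2>&1; then
        echo "cron daemon is running"
    else
        echo "cron daemon not found"
    fi
}

cron_status

# Show the crontab currently installed for this user, if there is one.
crontab -l 2>/dev/null || echo "no crontab installed for this user"
```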
Manual crontab editing looks daunting at first but is simple once you get the hang of it. Each row in the crontab is treated as a separate job. Each row has six columns that must be specified: minute (abbreviated “m”), hour (“h”), day of month (“dom”), month (“mon”), day of week (“dow”), and the command. Columns are separated by whitespace (one or more spaces or tabs) with no other marks, so it doesn't matter whether the rows line up with one another. You can specify times and dates as numbers (Cron uses a 24-hour clock, so noon is hour 12 and midnight is hour 0), as abbreviated days of the week (Sun, Thu, etc.), and as wildcards (*). Cron interprets a wildcard as “all”, so if the hour field of a job is set to “*”, Cron will run the job every hour. To run something repeatedly at a fixed interval, you can add a step value to the wildcard; for example, “*/2” in the hour field with “0” in the minute field runs the command once every two hours on the days you define.
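To make the column layout concrete, here is a small shell sketch that prints a few sample crontab rows without installing anything. The function name sample_entries and the script path /home/youruser/bin/backup.sh are placeholders invented for this example; substitute your own command.

```shell
#!/bin/sh
# Print sample crontab rows. Nothing is installed into your real crontab.
# sample_entries and the backup.sh path are placeholders for illustration.
sample_entries() {
    cat <<'EOF'
# m h dom mon dow command
# Every day at 02:30:
30 2 * * * /bin/bash /home/youruser/bin/backup.sh
# Every two hours, on the hour:
0 */2 * * * /bin/bash /home/youruser/bin/backup.sh
# Every Sunday at midnight:
0 0 * * Sun /bin/bash /home/youruser/bin/backup.sh
EOF
}

sample_entries
```

Lines that begin with “#” are comments, which Cron ignores, so rows like these can be pasted into “crontab -e” as they are once the path is adjusted.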
Cron is quite flexible. Ranges covering everything between two values are written with a hyphen (-), while multiple nonconsecutive values are separated by commas. For instance, if you wanted to run a command every day from the first of the month to the 10th, you would specify “1-10” in the “dom” field. Likewise, if you wanted a command to run every Monday, Wednesday, and Friday, you would put “Mon,Wed,Fri” in the “dow” field (or the numeric equivalent “1,3,5” if your version of Cron does not accept lists of names).
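The way ranges and lists are read can be illustrated with a short shell function that expands a field into the individual values it stands for. This is only a sketch of the idea, not how Cron itself is implemented; the name expand_field is hypothetical, and the function handles plain values, comma lists, and numeric ranges but not wildcards or step values.

```shell
#!/bin/sh
# expand_field is a hypothetical helper that shows which values a
# crontab field covers. It understands "a-b" ranges and "a,b,c" lists.
expand_field() {
    set -f                      # keep the shell from globbing the field
    old_ifs=$IFS
    IFS=,
    set -- $1                   # split the field on commas
    IFS=$old_ifs
    set +f
    out=""
    for part in "$@"; do
        case $part in
            *-*)
                # A range: count from the low value up to the high value.
                low=${part%-*}
                high=${part#*-}
                i=$low
                while [ "$i" -le "$high" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single value or name, kept as it is.
                out="$out $part"
                ;;
        esac
    done
    echo "${out# }"             # drop the leading space
}

expand_field "1-10"         # prints: 1 2 3 4 5 6 7 8 9 10
expand_field "Mon,Wed,Fri"  # prints: Mon Wed Fri
```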
Step 4: Put Them Together
Now that you know about Rsync and Cron, you can probably already see how they can be used together to automate backups. Since good backup practice calls for backing up to multiple destinations, you might expect to need multiple crontab entries, each with a different rsync command. Fortunately, there is a better way that needs only a single Cron job.
The various command shells on Linux (like Bash) have excellent support for scripting. Shell scripts are the Linux equivalent of Windows batch files and offer a way to quickly run multiple commands in a specific predefined order and with a preset configuration. If you have any experience on the command line, you should not find basic scripting very difficult (there are plenty of online guides to help you write scripts for Bash and other shells if you run into trouble). Creating a shell script to hold the necessary Rsync commands is straightforward; from that point, you can invoke the shell script in your crontab, and each Rsync command will run at the time you set in Cron just as though it were being invoked directly. It helps to give the full path to both the shell and the script in the crontab (“/bin/bash $scriptname”, where $scriptname is the script's absolute path) instead of the relative shortcut “./$scriptname”, because Cron does not run jobs from the directory you were working in and sets only a minimal PATH. If you have sensitive data, you should definitely consider encrypting it before you place it on a shared server (like a web host). Encryption can be done with GPG in the shell script prior to transmission.
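A script along these lines might look like the sketch below. Everything specific in it is an assumption made for illustration: the source directory, the two destinations, the host backup.example.com, and the helper names run, encrypt_file, and backup_all. The DRY_RUN switch is also an addition of this example; it defaults to printing the commands instead of running them so that you can check them first.

```shell
#!/bin/bash
# backup.sh: a sketch of one script that Cron can call to run several
# backup jobs. Every path, host, and key ID here is a placeholder.

SOURCE="${SOURCE:-$HOME/Documents/}"
# Space-separated list of rsync destinations (a local mount and a remote host).
DESTINATIONS="${DESTINATIONS:-/mnt/backup/documents/ backupuser@backup.example.com:documents/}"
# DRY_RUN defaults to 1, so the script only prints what it would do.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Encrypt file $1 for GPG recipient $2, writing $1.gpg next to it.
# Example: encrypt_file "$HOME/private.tar" you@example.com
encrypt_file() {
    run gpg --yes --encrypt --recipient "$2" --output "$1.gpg" "$1"
}

# Copy SOURCE to every destination; report failure if any copy failed.
backup_all() {
    status=0
    for dest in $DESTINATIONS; do
        # -a (archive) preserves permissions, timestamps, and symlinks.
        run rsync -a "$SOURCE" "$dest" || status=1
    done
    return "$status"
}

backup_all
```

Once the printed commands look right, a single crontab row such as “30 2 * * * DRY_RUN=0 /bin/bash /home/youruser/bin/backup.sh” would run the whole script at 02:30 every day (the path is again a placeholder).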
The only foreseeable problem with automated backup is that system configurations tend to change over time. Mount points and IP addresses can be reassigned without notice, and your script will not automatically update itself to include these changes. As long as your Rsync commands are out of date, your files will not be backed up properly and you will have no idea of the problem until it is too late. Therefore, it pays to manually run your backup commands often to check for problems and update your script as necessary.
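One way to catch a stale mount point before Rsync runs is to look for a marker file that you create once on the backup volume. The sketch below assumes that convention; the marker name “.backup-target”, the mount point /mnt/backup, and the function name check_destination are all hypothetical. If the volume is not mounted, the marker is missing and the check fails.

```shell
#!/bin/sh
# check_destination is a hypothetical helper. It succeeds only when the
# directory $1 contains the marker file and is writable, which suggests
# the intended backup volume really is mounted there.
check_destination() {
    [ -f "$1/.backup-target" ] && [ -w "$1" ]
}

# /mnt/backup is a placeholder mount point.
if check_destination /mnt/backup; then
    echo "destination looks usable"
else
    echo "destination missing, unmounted, or not writable" >&2
fi
```

For remote destinations, running your Rsync command by hand with its “--dry-run” option shows what would be transferred without changing anything.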
Although Rsync can sync between locations on the local system without user input, it usually requires a password before it can sync to a remote system (it uses SSH as its transport for this). Since Cron jobs run unattended, there is no way to type the password when it is needed, so SSH's default password authentication will not work for automated backups. You can get around this by setting up public/private key authentication for SSH; as long as the private key on your machine matches a public key authorized on the remote host, no password is required. Be forewarned that the unattended login can fail if the address of the remote host changes, because SSH verifies the host against the key it has on record for that address and refuses to connect or stops to ask for confirmation when it finds no match (this is a deliberate feature to help prevent man-in-the-middle attacks). Security is also compromised slightly, since a key with no passphrase lets anyone who obtains the key file log in. Ultimately, it comes down to a trade-off between security and convenience, so choose wisely based on your situation and needs.
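The key setup can be sketched as follows, assuming OpenSSH's ssh-keygen and ssh-copy-id tools are installed. The helper name make_backup_key, the key path, and the account backupuser@backup.example.com are placeholders for this example. The key is created with an empty passphrase so that Cron can use it unattended, which is exactly the security trade-off described above.

```shell
#!/bin/sh
# make_backup_key is a hypothetical helper name. It creates an Ed25519
# key pair at path $1 (private key) and $1.pub (public key) with an
# empty passphrase.
make_backup_key() {
    ssh-keygen -q -t ed25519 -N "" -C "cron backup key" -f "$1"
}

# One-time setup, run by hand (placeholders throughout):
#   make_backup_key "$HOME/.ssh/backup_key"
#   ssh-copy-id -i "$HOME/.ssh/backup_key.pub" backupuser@backup.example.com
#
# Afterwards the backup script can tell rsync which key to use:
#   rsync -a -e "ssh -i $HOME/.ssh/backup_key" "$HOME/Documents/" \
#       backupuser@backup.example.com:documents/
```

Logging in once by hand after the setup also records the remote host's key, so the first unattended run does not stop at the host verification prompt.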