Date: October 4, 2013
Author(s): Rob Williams
Keeping good backups of your data is important; don’t be the sucker who loses important files and has to deal with it afterwards! In this in-depth guide, you’ll learn about using rsync and lftp to transfer files, writing your own scripts and automating them, and backing up to external storage, a NAS, and a remote server without passwords.
I hate to admit it, but I’ve screwed up many times over the years when it comes to using computers. My first big “Oh shi!” moment came when I was about 8 years old, goofing around on my family’s trusty 286 with its monochrome monitor. DOS was a new world to me, and I quickly found out that format c: wasn’t the proper command for deleting the contents of a floppy disk. What a depressing day that was.
I’ve learned much since that experience, but even in recent memory I can recall careless mistakes which have led to lost files. But, no more. I’ve made it a goal to keep perfect backups of my data so that I don’t suffer such a fate again, and I highly recommend everyone reading this article do the same. It’s one of the worst feelings in the world when you realize you’ve lost an important file that can’t be brought back, so doing your best to prevent that seems smart.
This article aims to help you out by making use of a glorious piece of Unix software called ‘rsync’. You have to be willing to get your hands a little dirty, however, as this involves command-line usage. That’s part of the beauty here though – with simple commands and scripts, you’ll have complete control over your backup scheme. Best of all, your scripts can run in the background at regular intervals, and are hidden from view.
Seeing double is great when it comes to your data
It’s important to note that this article will not be focusing on backing up an entire system (as in, OS included), but rather straightforward data. Also, because a command-line interface is not for everyone, we’re considering dedicating a future article to taking a look at GUI options (please let us know if that’d be of interest!)
There are many different types of media that you can back up to, but we’re going to take a look at the three most popular and go through the entire setup process for each: removable storage, network-attached storage (NAS), and a remote server also running Linux.
On this first page, we’re going to delve into the basics of rsync, giving examples for you to edit and test out.
“Why not rsync?” might be the better question. rsync is a file synchronization application that will match the target with a source, with proper permissions retained. The beauty is that its usage is not simply limited to the PC you are on, but it can also access remote servers via RSH or SSH.
As an example, let’s use the scenario where you want to back up your top-secret documents folder to a flash drive. When given the “OK”, rsync will copy all of the files to your flash drive, making sure to retain the correct permissions while verifying that all of the files match. If you update a file and run rsync again, it will update that file on the storage you specify – simple as that. Though it’s not entirely user-friendly given its command-line nature, once you gain the basic knowledge of how it works, you will truly wonder how you went so long without using it.
Am I missing something that’s worth mentioning in this tutorial? By all means let me know and I can consider an inclusion. Most of what’s being tackled here is derived from a personal need, so I might not be including all worthy scenarios.
As mentioned above, rsync is a very powerful application and we can’t exhaust all of the possibilities with this article. We will touch on the basics however, with the hope that you will finish reading and find yourself more knowledgeable overall, and raring to go. Once you set up your cool backup scheme, it will effectively mimic commercial GUI’d software; while rsync isn’t pretty to look at, it’s seriously effective, and it’s free.
rsync can even be used on Windows through Cygwin
It’s worth noting that rsync can also be used in Windows, via the free terminal application Cygwin, which brings a Unix-like CLI environment to Microsoft’s OS. The rsync package could be installed during the initial setup of Cygwin, or after-the-fact via its package manager. Once the Windows folders are mounted inside of Cygwin, rsync can be used just as it can be under a real Unix/Linux OS.
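As a quick sketch (the drive letters, user name and folder names here are examples, not a prescription), backing up a Windows documents folder to a second drive from within a Cygwin terminal might look like this; Cygwin exposes Windows drives under /cygdrive, so C: becomes /cygdrive/c:

```
rsync -av /cygdrive/c/Users/username/Documents/ /cygdrive/e/Backups/Documents/
```

The switches behave exactly as they do on Linux; only the path spelling changes.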
Run-down of Common rsync Switches & Processes
When running an rsync command, the first thing that should come after rsync is the desired switch or switches. Common switches include:
-a – archive mode; preserves permissions, ownership, timestamps and symbolic links (and implies -r)
-v – verbose; lists each file as it transfers
-r – recurse into directories
-c – skip files based on checksum rather than size and modification time
-z – compress file data during the transfer
--delete – delete files on the target that no longer exist on the source
You can add multiple switches to the same rsync command; eg: rsync -avc.
Following the switches are the source and target folders; “source” is where the original files are located, and “target” is the folder where the files are going to be copied to:
rsync -av /source/ /target/
eg: rsync -av /home/username/ /mnt/Backup/
Copy Data from One Folder to Another
rsync -r /home/techgage/folder1/ /home/techgage/folder2/
This copies the entire contents of one folder (including other folders) to another, without much care to preserving permissions. To make an exact duplicate, the -a switch can be used instead (it encompasses -r and other important switches):
rsync -a /home/techgage/folder1/ /home/techgage/folder2/
To retain all file owners and groups correctly, it might be best to do all of this as the superuser (root).
To increase the verbosity of the output, add a -v switch; this will show you each file as it’s being transferred and give interesting bits of info such as overall speed and total size at the end.
rsync -av /home/techgage/folder1/ /home/techgage/folder2/
Removing Old Files from the Target to Keep a Perfect Sync
Many who back up a folder or set of folders want the target and source to be identical. However, rsync’s default behavior is simply to overwrite what’s there with newer data, ignoring the fact that there might be files on the target that no longer exist on the source. An example: you keep a regular backup of your home folder, but often have scrap files on the desktop. If the sync occurs while those scrap files are present, they get copied to the target and left there, unless you specifically tell rsync to delete files on the target that no longer exist on the source.
rsync -av --delete /home/techgage/folder1/ /home/techgage/folder2/
If you’re new to rsync, you might be thinking, “Is that all there is to it?” The answer is yes. Though all of these examples look (and are) simple, there are many things you could pull off without even leaving your desk. Imagine backing up a Windows location mounted on your PC to a local or server location:
rsync -av --delete /mnt/momPC/Users/ /mnt/NAS/Backups/momPC/Users/
The above example would assume that each location is properly mounted via fstab, a technique we tackle on page three of this article.
What about backing up your server or another Linux machine in your house? Time to bring SSH into the picture. This assumes that each computer has SSH installed and that the source machine has an SSH server running. If sshd is not running on a machine, starting it should be as easy as running /etc/init.d/sshd start as root (or with sudo); on systemd-based distros, systemctl start sshd does the same.
rsync -av --delete -e ssh username@sourceipaddress:/home/techgage/ /mnt/NAS/Backups/MainPC/techgage/
rsync -av --delete -e ssh root@targetipaddress:/remotefolder/ /localfolder/
SSH in itself offers a lot of potential. If you have multiple servers and need to duplicate a folder from one to the other… just SSH in and run your rsync script. The possibilities are not only endless, they’re exciting.
This is about as much of rsync as we will tackle, but don’t hesitate to check out man rsync for a full options list and other examples. That’s not as far as our examples go though, as we have many scripts on the last page of this article. For now, let’s move into making use of external storage for backup purposes.
On this page, we will be tackling the basics of mounting your external storage device. This could be a flash-based device or an external hard-drive, such as one in an enclosure. If you know the basics of disk mounting, then you can bypass this section entirely.
Note: Does the storage device in question mount and work as soon as it’s plugged into the PC? If so, you can skip down to the “Editing fstab for Persistent Mounts” section. If you have the file system support you need but your distro doesn’t automatically mount your storage device, head instead to the “Creating a Folder and Mounting External Storage to It” section.
Checking for File System Support: The easiest way to see if your distro currently supports the file system you need is to open up a terminal and type in cat /proc/filesystems (no root access required), as seen below:
techgage rwilliams # cat /proc/filesystems
Note that certain file systems (eg: exFAT and NTFS with write capabilities) require FUSE (Filesystem in Userspace) support to function (if ‘fuse’ is listed in your /proc/filesystems query as seen above, you’re golden). If FUSE or some other file system isn’t configured for your current kernel, you’ll need to tackle that by recompiling your kernel.
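To save yourself the scrolling, you can grep that same file for the entries you care about; a minimal sketch (the fallback message is just for convenience):

```shell
# Show any FUSE or NTFS lines in the kernel's file system list
grep -E 'fuse|ntfs' /proc/filesystems || echo "no fuse/ntfs support built in"
```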
Adding File System Support to Your Kernel: The process of configuring and compiling a kernel under Linux isn’t for the faint of heart, and it’s a subject that would require its own article to explain. If you do need to take this route, and aren’t familiar with updating a kernel, you’ll need to refer to your distro’s documentation.
FUSE can be found under the “File systems” section, typically near the bottom, and is called “FUSE (Filesystem in Userspace) support”. FAT, if it somehow isn’t configured (it’s a kernel default), can be found inside of the “DOS/FAT/NT Filesystems” sub-section.
Creating a Folder and Mounting External Storage to It
Most distros make things easy by automatically mounting your external storage once it’s plugged in, but there’s a potential downside to that: It might create a filepath that differs from time to time (especially if you have two drives plugged in that share a label; eg: “Backups”). Thus, the remainder of this page is dedicated to making sure you’re mounting the drive to the location you want, while also making sure that it’s mounted there every single time you plug it in, regardless of what other storage is connected (even those with a duplicate label).
First things first: figuring out the device name for your external drive. Each storage device plugged in receives a device node under /dev; IDE drives appear as /dev/hdX, while SATA (and USB) drives appear as /dev/sdX. If your primary hard drive or SSD is of the SATA variety, it will be /dev/sda; a second similar drive would be /dev/sdb, and so forth. If your system has only two SATA drives installed and you plug in external storage, it will take on the form of /dev/sdc. To help with the identification process, a GUI tool like GParted can be called upon, as it provides important information such as partitions, storage space used, and so on.
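If your distro ships util-linux’s lsblk tool (most modern ones do), it offers a quick terminal alternative to GParted for matching device names to sizes, labels and file systems:

```shell
# List block devices along with file system type, label and UUID
lsblk -f

# The kernel log also names the device node when a drive is plugged in
# (dmesg may need root on some systems, hence the redirect)
dmesg 2>/dev/null | tail -n 20
```

Plug the drive in, run these, and the newest /dev/sdX entry is almost certainly yours.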
With this knowledge in hand, mounting a drive should be a simple matter of (assuming it’s a FAT32 drive):
techgage@localhost ~ $ su (or sudo -s)
localhost techgage # mkdir /mnt/thumb
localhost techgage # mount -t vfat /dev/sdc1 /mnt/thumb
The ‘mkdir’ command creates a folder under /mnt, and while I called mine ‘thumb’, it can be named anything you want; it’s usually best to make it something relevant, however, such as ‘storage’ or ‘exthdd’.
If the external device (or internal, for that matter) happens to be NTFS, it can be mounted using ntfs-3g:
techgage@localhost ~ $ su (or sudo -s)
localhost techgage # mkdir /mnt/ntfs
localhost techgage # mount -t ntfs-3g /dev/sdc2 /mnt/ntfs
If mounting with ntfs-3g doesn’t work, you’ll need to verify that the required driver is installed. An easy check is to run which ntfs-3g in a terminal; if nothing is returned, search your distro’s repository for it (eg: under Gentoo, I had to install it by running emerge ntfs3g as root).
Editing fstab for Persistent Mounts
To make sure a removable storage device mounts to the exact same location each time it’s plugged in, the /etc/fstab file will need to be edited. This is the file that the OS refers to when a new storage device is detected, and without a manual entry, it could be mounted to an unexpected location, rendering a backup script ineffective.
The first step: Establish the storage device’s UUID (unique identifier). After plugging the device in, you can either use GParted’s “Information” option to find the UUID of a partition, or run ls -l /dev/disk/by-uuid/ to find it quicker (by looking for the known /dev/sdX value).
After running that latter command, I could see “6634EE1034EDE2D3 -> ../../sdd1”, which I knew for a fact was the correct drive. So, editing /etc/fstab, I added this line:
UUID=6634EE1034EDE2D3 /mnt/exthdd ntfs-3g defaults,locale=en_US.UTF-8 0 0
This makes sure that each time I plug the drive in, it mounts to the exact same location (in this case: /mnt/exthdd). No more wondering!
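For a FAT32 flash drive, the equivalent fstab line would look something like this (the UUID is a made-up example; FAT UUIDs take a short XXXX-XXXX form, and uid=1000 hands ownership of the files to the first regular user):

```
UUID=4A1C-9E2F /mnt/thumb vfat defaults,uid=1000,utf8 0 0
```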
As a network device, a NAS needs to be mounted on the local machine before it can be used like normal storage. Most of what’s discussed on this page could apply to server shares in general, but we focus on NAS since it’s a popular choice for those who want to keep backups off their main PC.
Because a NAS is in effect a computer, a vendor may offer different methods of interfacing with it. For example, if the NAS happens to support SSH, you could back up to it without mounting a share first. I’d imagine, however, that most who back up to their NAS also want to access that storage without having to navigate network protocols in a file manager, so mounting the share as local storage is probably what you’re looking for.
In order to mount a NAS (or network drive in general), CIFS support is required; CIFS is a later dialect of the SMB protocol that Samba implements. Chances are good that support is built into your distro, but if you’re taking the DIY kernel route, it can be found in the kernel options under “Network File Systems”.
Even if CIFS is configured in the kernel, utilities are needed to actually mount a network share. Thus, you’ll want to search your repo for ‘cifs-utils’, and if it’s not installed, install it (eg: apt-get install cifs-utils). Afterwards, you’ll have ‘mount.cifs’ available on your system.
At this point, the important bits are prepared, and the time has come to mount a network share. This is where things can become a little confusing, however. Over the years, I’ve encountered NASes that like to be mounted via IP, and others that require the general network filepath. If you try to mount one way and it doesn’t work, try the other.
If you’re unable to mount via share name, and don’t happen to know the IP address for your NAS, you can have a look in your router’s DHCP lease section.
Alternatively, if nmap is installed, you could run nmap -sP 192.168.0.0/24 (or 192.168.1.0/24, depending on the router’s internal IP address) and get it that way.
Finding the proper network share name is unfortunately a little complicated under Linux, but if the ‘smbtree’ terminal application happens to be installed (can be run as a normal user), that might make things easy. Searching through the network section of your file manager could help as well. What you’ll be looking for is effectively a share name like: \\NAS\Share, or as a real example: \\tg_nas\Storage\.
That being said: even if the share comes up as \\tg_nas\Storage, Linux is unable to mount it exactly as it’s stated. Instead, the backslashes need to be changed to forward slashes; eg: mount -t cifs -o guest //tg_nas/Storage /mnt/nas, or mount -t cifs -o guest //192.168.1.100/Storage /mnt/nas if the IP address needs to be used.
The -o switch allows us to pass options such as a username and password, or just ‘guest’ if no credentials are needed.
Other mounting examples:
mount -t cifs -o username=username,password=password //NASname/Sharename /mnt/nas
mount -t cifs -o username=username,password=password //192.168.1.100/Sharename /mnt/nas
To have the NAS share automount at boottime, you can add an entry to your /etc/fstab file. Here are a couple of examples:
//tg_nas/Storage /mnt/nas cifs username=admin,password=techgage 0 0
//192.168.1.100/Storage /mnt/nas cifs guest,uid=1000,iocharset=utf8 0 0
//192.168.1.100/Storage /mnt/nas cifs username=username,password=password,uid=1000 0 0
Because everyone’s configuration is different, it’s impossible to cover all of the bases here. You might need to specify a domain when mounting (domain=domainname), for example, mount with a different character set, or adjust which users should have access to it. If you run into issues, I’d recommend hitting Google and searching for ‘mount cifs (distro name)’.
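One tip worth knowing: rather than leaving a password sitting in the world-readable /etc/fstab, mount.cifs supports a credentials= option that points at a file only root can read. A sketch (the file name and values are examples):

```
# /root/.nascreds  (protect it with: chmod 600 /root/.nascreds)
username=admin
password=techgage

# /etc/fstab entry referencing it:
//192.168.1.100/Storage /mnt/nas cifs credentials=/root/.nascreds,uid=1000 0 0
```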
Run a Linux-based Web server? If so, your backup possibilities have increased dramatically. Depending on whether you are copying to or grabbing from the server, there are two applications I love to use: rsync, as you have likely already guessed, and LFTP, a feature-rich command-line FTP application.
Let’s delve into our LFTP usage first, since our rsync techniques will require a little more time. First things first: Make sure lftp is installed by typing ‘lftp’ in a terminal. If it’s not, install it either from your distro’s repository, or via the official source code.
Suppose you are the head of a popular open-source project and would like to upload nightly builds of your source code. The easiest way is to create an LFTP script, then a shell script, and add the latter to your crontab. Let’s tackle the LFTP script first.
If you have a directory full of nightly builds (which could be built with our scripts on the next page), you can use LFTP to automate the process of uploading them. First, create a simple script that LFTP understands; thankfully, these are similar to shell scripts. I highly recommend saving it somewhere only root can read (eg: chmod 600), because your password will be sitting in it as plain text.
open -u username,password -p 21 domain.com ;
mput -O public_html/nightly /home/techgage/projects/appname/nightly/*.tar.gz
Running this script with lftp -f lftp_script would proceed to upload all *.tar.gz files found in that directory. If the tarball exists on the server already, it will not be overwritten; that way, it will only upload whatever is new, which is likely what you want. On the server-side, you could create a script which would symlink the new file to say, latest.tar.gz, so that people who grab that file always know they are getting the latest build.
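The server-side symlink idea mentioned above could be sketched like this; a minimal example, with the nightly directory path being a hypothetical one you would adjust to your own layout:

```shell
#!/bin/sh
# Point latest.tar.gz at the newest nightly tarball in a directory.
link_latest() {
    dir=$1
    # newest first; skip the latest.tar.gz link itself
    newest=$(cd "$dir" && ls -t ./*.tar.gz 2>/dev/null | grep -v 'latest\.tar\.gz' | head -n 1)
    [ -n "$newest" ] && ln -sfn "$newest" "$dir/latest.tar.gz"
}

# example call; the path is hypothetical:
# link_latest /home/username/public_html/nightly
```

Run from cron on the server, this keeps latest.tar.gz pointing at whatever build arrived most recently.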
Want more than just one file uploaded? How about an entire folder? Let’s move on to the mirror command with our sample script:
open -u username,password -p 21 domain.com ;
mirror -R /home/techgage/projects/appname public_html/appname
This script will upload all of the files found under your /appname folder, including subfolders. That way, if you have a mounted folder on your local machine that holds all of your nightly builds, hash files, readmes or anything else, the same will be found on the server. To reverse the process (mirror a remote folder to a local one), remove the -R switch and swap the directories around:
open -u username,password -p 21 domain.com ;
mirror public_html/appname /home/techgage/projects/appname
LFTP is a great program for simple file exchanges, and is perfect for uploading to shared hosting or a server that has SSH disabled (which in turn rules out rsync over SSH).
If you run a website or have files on a server that need to be updated often, then rsync can become your best friend, very quickly. If you rsync your site on a regular basis, it means you always have a recent copy stored on your local machine’s hard drive, NAS or external storage. So, if something breaks, you will be able to quickly restore the file that’s gone awry.
Using rsync manually is simple… just type in the command and the password, then wait. However, if you want to make nightly backups or sync multiple times a day, it can become an arduous chore. That’s why you will be interested in setting up SSH keys and automating the entire process.
To sync everything without incident, it’s recommended you do this as root. However, if all of the files you require belong to a single user account (eg: /home/username), you should be fine as long as that same user exists locally (otherwise you will lose the original file owner). In my case, I want to sync more files than what’s found in /home, so I first switch to root and then enter the SSH configuration folder: cd /root/.ssh (or /home/username/.ssh for a regular user). Once in there, you will need to create a key pair; when prompted for a passphrase, hit Enter twice without entering any characters.
localhost .ssh # ssh-keygen -t dsa -b 1024 -f rsync-key
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in rsync-key.
Your public key has been saved in rsync-key.pub.
The key fingerprint is:
Once the process is finished, it will create rsync-key and rsync-key.pub. You will need to copy rsync-key.pub to your server’s /root/.ssh folder, to keep things simple. You can do this a variety of ways: copy it over via SSH, copy/paste the entire contents of the file, use SCP, or, if your distro ships the helper, run ssh-copy-id -i /root/.ssh/rsync-key.pub root@yourserver (which also handles the next step for you)… it’s up to you.
At this point, you are almost finished. Once the file is on your server, SSH in and enter the same folder as on your local machine: cd /root/.ssh. To gain access to the server without a password, you need to append your public key to the authorized_keys file. Whether or not that file exists already, add the contents of rsync-key.pub to it and chmod it before you finish up.
[root@fakeserver .ssh]# cat rsync-key.pub >> authorized_keys
[root@fakeserver .ssh]# chmod 644 authorized_keys
That was the last server-side step, so log out and return to your local machine. From there, try logging into the server again, hopefully password-free:
localhost techgage # ssh -i /root/.ssh/rsync-key domain.com
Last login: Mon Aug 20 21:06:09 2007 from xxx000000000000.cpe.cable.domain.com
If that last step proved successful, you are ready to rsync without the hassle of a password, meaning you can set up a new entry in your crontab to automate your backups! To rsync with your server, use a command like this:
rsync -av -e "ssh -i /root/.ssh/rsync-key" domain.com:/home/username/projects /home/techgage/projects
Now that you are all set, let’s jump into our script-writing and the best part, crontabs.
Backing up files on a regular basis isn’t too difficult, but if we’re talking about executing a backup procedure every single day, then that changes things. Thanks to cron, the Unix scheduler (driven by crontab files), the pain is lessened to the point of nonexistence. The best part: no install is required; your distro undoubtedly has it built-in.
To make good use of crontab, though, some Bash scripts need to be written. These scripts can be edited via a GUI text editor or a CLI one, and saved anywhere on the PC. As a personal preference, I store all of my scripts in /home/scripts; you could keep yours somewhere else, however, such as /home/user/scripts.
To start, here’s a simple script that syncs one folder to another in archive mode:
rsync -av /home /mnt/ntfs/Backups
Once you save that to a file with an ‘sh’ extension (eg: bashscript.sh), you could execute it with sh bashscript.sh. Simple stuff. However, for more flexibility, you can get a little more advanced:
#!/bin/sh
echo Beginning Backup… ;
mount -t vfat /dev/sdc1 /mnt/thumb ;
rsync -a --delete /home /mnt/thumb ;
umount /mnt/thumb ;
echo Backup Complete!
That script will acknowledge that the process has started, mount the drive, rsync, unmount, and then tell you it’s finished (this of course assumes the drive either stays plugged in or receives the same device name each time it’s plugged in).
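One fragile spot in the script above: if the mount fails, rsync will happily fill the empty /mnt/thumb directory on your root partition instead of the drive. A hedged refinement, with the device name and paths still being examples, bails out when the mount doesn’t succeed:

```shell
#!/bin/sh
# Run the backup only if the mount succeeds, so rsync never
# fills an empty mount point. Device and paths are examples.
safe_backup() {
    mount -t vfat "$1" "$2" || { echo "Mount failed; aborting backup." >&2; return 1; }
    rsync -a --delete /home "$2"
    umount "$2"
}

# example call:
# safe_backup /dev/sdc1 /mnt/thumb
```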
What if you want to set up a script to run on a regular basis, or at a specified time when you’re not around? Here’s another.
backup_log=/home/scripts/backup.log
rsync -av /home /mnt/ntfs >> $backup_log
This one establishes a log location and then proceeds to perform the rsync with all output thrown into the backup.log file for you to take a look at later if need be.
What about nightly backups of your source code? How about creating date-coded tar files that automatically save to your storage device?
backup=nightly_`date +%m-%d-%Y`
backup_log=/home/scripts/backup.log
echo "Backup Start: `date +%m-%d-%Y-%T`" >> $backup_log;
tar zvcf /mnt/nas/$backup.tar.gz /home/techgage/projects/application_name >> $backup_log;
echo "Backup End: `date +%m-%d-%Y-%T`" >> $backup_log;
Note: Those are backticks (`), which perform command substitution, not single quotes (').
This example establishes a date-coded backup file name along with our static log file. First, the time the tar process begins is written to the log, followed by the output of the tar process itself. The resulting file would be named, for example: nightly_10-04-2013.tar.gz. You can hit up man date and configure the date format to your liking.
Want to symlink the fresh .tar.gz file to latest.tar.gz for easy handling? That way, any script that uploads the file to a remote server or external device can always point at latest.tar.gz, yet still grab the newest build. You could add this line to the very end of the above script:
ln -sfn /mnt/nas/$backup.tar.gz /mnt/nas/latest.tar.gz
The possibilities of scripting are endless, and if you have a specific need, there is surely a way to get what you need accomplished with a simple script. Now, how about automation?
So, you want to run your script at a certain time each day/week/month; it’s time to make use of crontab! Once ready to edit, run nano -w /etc/crontab in a terminal as root, replacing nano with your preferred text editor. If you launch a GUI editor as root (or via sudo), you could edit the file that way as well.
Want to run the script at 5:00 AM each morning?
0 5 * * * root sh /home/scripts/nas_backup.sh
To give a quick cron primer, the beginning of each task requires five fields: minute, hour, day of month, month, and day of week. So if you wanted to run your script at 3:00 AM each Wednesday, it would be:
0 3 * * 3 root sh /home/scripts/nas_backup.sh
Or for 10:30 AM on Mondays and Fridays (Sunday = 0, Saturday = 6), and as your own user:
30 10 * * 1,5 username sh /home/scripts/nas_backup.sh
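Note that per-user crontabs, edited with crontab -e, omit the user column entirely, since the user is implied:

```
# edited via `crontab -e` as your own user; note there is no user field:
30 10 * * 1,5 sh /home/scripts/nas_backup.sh
```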
To read more on cron, I highly recommend checking out the cron article on Wikipedia.
After reading through the article, you should have a good idea of what to do now: how to rsync and how to create a script. But what exactly do you want to back up? The /home folder is always a good start; that way, if disaster strikes, you can easily recover your files. But it can go deeper than that.
All around your hard drive, there could be files worth backing up. Gentoo is my distro of choice, and it has a lot of configuration files scattered around that are worth backing up, such as Portage’s package files, the ‘world’ file, make.conf (home of your compiler flags), among others. Being able to restore these files from a backup would make getting back up and running a lot easier.
We just touched on the basics here. You could write a robust script to do a variety of things; mine for example, performs the same actions each night in a sequential order. It first cleans up the computer, empties the trash and various caches and then backs up my /home, configuration files, and my kernel source code folder. Then it grabs a few nightly backups off one server with wget, and rsyncs our website server to make sure we have a current backup. All of this is done while saving the output into a static log file.
Yes… there is a lot you can do with a simple script, just as long as you are not worried about spending some time setting things up properly.
So what are you waiting for? Accidents can strike at any time…
This article was originally published on August 27, 2007, and since updated.
Copyright © 2005-2014 Techgage Networks Inc. - All Rights Reserved.