
Backing Up Your Linux

Date: August 27, 2007 - Author: Rob Williams

Backing up your computer is important. Don't be the sucker who loses important files and has to deal with the fallout afterwards! In this how-to, you will learn how to use rsync and LFTP, write your own scripts and add them to crontab, and, of course, back up to external storage, a NAS and a remote server running Linux.



Introduction, rsync Basics

Although I don't like to admit it, I have screwed up many times since I first began using computers. My first big "Oh sh...!" moment came when I was about 8 years old, goofing around on our trusty 286 with its monochrome monitor. DOS was a new world to me, and I quickly found out that format c: was NOT the proper command for deleting the contents of a floppy disk. What a fun day that was...

Luckily, much has changed since then, but even in recent memory I can recall careless mistakes that led to lost files. But no longer. I made it a goal to keep perfect backups of my data so that I never suffer that fate again, and I highly recommend that everyone reading this article do the same. Realizing you've lost a file that you cannot get back is one of the worst feelings there is... so don't let it happen to you.

Today's article focuses on just that: backing up your files in Linux. It will not cover backing up your entire system, although I'm sure you could do that with some of the tips provided here; that's up to you to test if you are interested. Instead, we will focus on backing up your personal data, anything you want to keep safe.

There are many different media you can back up to, but we are going to look at the three most popular and go through the entire setup process for each: 1) backing up to external storage; 2) backing up to Network-Attached Storage (NAS); and 3) backing up to a remote server running Linux. In this first section, we will delve into the wonderful tool that is rsync and give examples for you to edit and test for yourself.

Why rsync?

"Why not rsync?" might be the better question. Essentially, rsync is a file synchronization application that makes a target match a source, retaining proper permissions when asked to. The beauty is that it isn't limited to the PC you are on: it can also reach remote servers via RSH or SSH.

As an example, let's say you want to back up your top secret documents folder to a thumb drive. When given the "OK", rsync will copy all of the files to your thumb drive, retaining the correct permissions and verifying that each file arrived intact. If you update a file and run rsync again, only the changed file is copied to the storage you specify... simple as that. Though it's not immediately user-friendly, given its command-line nature, once you gain a basic knowledge of how it works, you will wonder how you went so long without it.

The important thing to note is that there are numerous methods of backing up your files and system under Linux, and that I am only touching on the basics. This tutorial is based on what I personally rely on daily, and the techniques are easy to implement on your own system.

How To Use rsync

As mentioned above, rsync is a very powerful application, and we can't possibly exhaust all of its possibilities in this article. We will cover the basics, however, in the hope that you finish reading more knowledgeable overall and raring to go. Once you set up a good backup scheme, it will do the job of commercial backup software for Windows. While rsync is not much to look at, it's just as effective, if not more so, and it's free!

First, the absolute basics: copying one folder to another.

rsync -r /home/techgage/arch /home/techgage/bintoo

That copies the folder recursively (-r), without much care for preserving permissions, owners and so on. To make exact duplicates, add the -a (archive) switch, which also preserves permissions, ownership, timestamps and symbolic links.

rsync -a /home/techgage/centos /home/techgage/damnsmall

To retain ALL owners correctly, though, it's best to run all of this as root; you are obviously not going to copy root-owned files properly without being root. To increase verbosity, add -v. rsync will then list each file it transfers and finish with a summary that includes the transfer speed and total size.

rsync -av /home/techgage/elive /home/techgage/fedora

Many people who back up a folder, or a set of folders, want the target and source to be identical. However, by default rsync only copies over files that are new or have changed, and it ignores any files on the target that no longer exist on the source. For example, say you back up your home folder and later delete a bunch of files from your desktop. You obviously want them gone from the backup as well, and luckily, that is easy to take care of.

rsync -av --delete /home/techgage/gentoo /home/techgage/helix
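One detail worth knowing before you lean on --delete: rsync treats a trailing slash on the source specially. Without it, as in the examples above, /home/techgage/gentoo is copied into the target as /home/techgage/helix/gentoo; with it (/home/techgage/gentoo/), only the folder's contents are copied. Below is a minimal sketch of a wrapper built around that behavior. The function name mirror_dir and its -n switch are hypothetical, not part of rsync; the previewing itself is done by rsync's real --dry-run and -v options.

```shell
#!/bin/bash
# mirror_dir SRC DEST [-n]  (hypothetical helper)
# Makes DEST an exact copy of the *contents* of SRC using rsync --delete.
# It returns before calling rsync if either directory is missing, so a
# mistyped path that doesn't exist never reaches rsync. Pass -n to preview.
mirror_dir() {
  local src=$1 dest=$2
  if [ ! -d "$src" ]; then
    echo "mirror_dir: source '$src' is not a directory" >&2
    return 1
  fi
  if [ ! -d "$dest" ]; then
    echo "mirror_dir: destination '$dest' is not a directory" >&2
    return 1
  fi
  # "SRC/" (trailing slash) copies what is inside SRC, instead of
  # creating DEST/<name of SRC>
  if [ "$3" = "-n" ]; then
    # List what would be copied or deleted without changing anything
    rsync -a --delete --dry-run -v "${src%/}/" "$dest"
  else
    rsync -a --delete "${src%/}/" "$dest"
  fi
}
```

For example, mirror_dir /home/techgage/gentoo /home/techgage/helix -n shows what would be copied and deleted; drop the -n to do it for real.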

If you are new to rsync, you might be thinking, "Is that all there is to it?" The answer is yes, a very big yes. ;-) Though all of these examples look simple, and are, there is a lot you can do without even leaving your seat. Imagine backing up one Windows machine to another (say, your mom's and your sister's).

rsync -av --delete /mnt/virus1 /mnt/virus2

The above example assumes that both computers' shares are properly mounted via Samba, a technique we tackle in the NAS section of this article. What about backing up your server or another Linux machine in your house? Time to bring SSH into the picture! This assumes that each computer has SSH installed and that the source machine has an SSH server running. If it isn't running, starting it should be as easy as running /etc/init.d/sshd start (the script's name varies between distributions).

rsync -av --delete -e ssh root@192.168.1.5:/home/techgage/knoppix /home/techgage/linux-gamers
(General form: rsync -av --delete -e ssh root@remote-ip-address:/remote/folder /local/folder)

SSH in itself offers a lot of potential, as well. If you have multiple servers and need to duplicate a folder from one to the other... just SSH in and run rsync. The possibilities are near-endless.

That's about as much of rsync as we will tackle here, but don't hesitate to check out man rsync for the full list of options and more examples. That's not the end of our samples, though, as there are many scripts in the last section of this article. For now, let's move right into backing up to an external storage device.

Backup To External Storage

In this section, we will tackle the absolute basics of mounting your external storage device. This could be a flash-based thumb drive or an external hard drive, such as one in an enclosure. If you already know how to mount your hardware, you can skip this section entirely.

Depending on your external storage's filesystem, you will need to make sure that support for it is built into the kernel. Chances are very good that it is, but if not, you will need to go into your kernel configuration and add it.

You likely already have Ext2 and Ext3, but if not, you can enable them in the kernel. If you lack VFAT support, which would be unbelievable, you can add it under File systems > DOS/FAT/NT Filesystems > VFAT (Windows-95) fs support. Once done, exit and run make && make modules_install. At that point, reboot so the new modules take effect, or, if you compiled a different kernel version from the one you are currently running, copy the new kernel into place and boot into it. If you compiled the filesystems as modules, simply modprobe them.
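To confirm that the kernel actually knows about a filesystem (vfat, ntfs, and later smbfs), you can look in /proc/filesystems, which lists every filesystem type the running kernel currently supports. A filesystem built as a module only appears there once it's loaded, so modprobe it first. This is a minimal sketch; the function name fs_supported is hypothetical.

```shell
#!/bin/bash
# fs_supported NAME [FILE]  (hypothetical helper)
# Succeeds if NAME is in the kernel's list of registered filesystems.
# FILE defaults to /proc/filesystems; lines look like "<TAB>ext3" or
# "nodev<TAB>smbfs", so the filesystem name is always the last field.
fs_supported() {
  local name=$1 list=${2:-/proc/filesystems}
  awk -v want="$name" '$NF == want { found = 1 } END { exit !found }' "$list"
}
```

Usage: fs_supported vfat && echo "VFAT is ready"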

While most distros auto-mount thumb drives and the like, I dislike that method when working with backups. If you have an external hard drive that is -always- plugged in, it's fine, but if you remove it often, drive letters can change and render your script useless. This is all moot if you plan to manually type the rsync command each time you want to back up.

If your distro picks up the drive and you'd rather it didn't, simply unmount it and do your own thing. -If- the distro mounts the drive to the exact same point each time (e.g. /media/disk1), then you shouldn't have much to worry about. It only becomes a problem when the mount point changes (e.g. /media/sda1, /media/sdb1, /media/sdc1 and so on).
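If your script writes to a fixed mount point, it pays to check that the drive is really mounted there first. Otherwise rsync happily copies into the empty /mnt/thumb directory on your main disk and fills it up instead of the backup drive. A minimal sketch, assuming mount points without spaces (/proc/mounts encodes those as \040); is_mounted is a hypothetical name.

```shell
#!/bin/bash
# is_mounted DIR [FILE]  (hypothetical helper)
# Succeeds if DIR is currently a mount point. FILE defaults to /proc/mounts,
# whose second field is the mount point. Assumes no spaces in DIR.
is_mounted() {
  local dir=${1%/} table=${2:-/proc/mounts}
  [ -n "$dir" ] || dir=/      # "/" was stripped to "" above
  awk -v want="$dir" '$2 == want { found = 1 } END { exit !found }' "$table"
}
```

Near the top of a backup script, a line such as is_mounted /mnt/thumb || { echo "/mnt/thumb is not mounted" >&2; exit 1; } stops the run before rsync starts.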

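Figuring out device names (the Linux equivalent of drive letters) can be fun if you've never done it before. Each IDE device gets its own letter under /dev/hd*, and each S-ATA device gets its own letter under /dev/sd*. So if you have two IDE CD-ROMs and two S-ATA hard drives, the devices would typically be /dev/hda and /dev/hdb (the exact IDE letters depend on which channel each drive is connected to), plus /dev/sda and /dev/sdb.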

If you then plug in an external drive with two partitions, the drive would become /dev/sdc, with its partitions at /dev/sdc1 and /dev/sdc2.

Any new external device will more than likely show up as /dev/sd*, so with the four devices from the first example, a thumb drive or external hard drive would take the place of /dev/sdc. Let's assume that your external hard drive has a FAT32 and an NTFS partition.
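Rather than guessing, you can ask the kernel which partitions exist: /proc/partitions lists every disk and partition it knows about, with the device name in the fourth column. Here is a minimal sketch that prints the partitions belonging to one disk. list_partitions is a hypothetical helper, and it assumes classic sdX/hdX names, where a partition is the disk name plus a number.

```shell
#!/bin/bash
# list_partitions DISK [FILE]  (hypothetical helper)
# Prints the /dev path of each partition on DISK (e.g. sdc -> /dev/sdc1).
# FILE defaults to /proc/partitions: columns are major, minor, #blocks, name.
list_partitions() {
  local disk=$1 table=${2:-/proc/partitions}
  awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print "/dev/" $4 }' "$table"
}
```

For the drive described above, list_partitions sdc should print /dev/sdc1 and /dev/sdc2.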

techgage@localhost ~ $ su
Password:
localhost techgage # mkdir /mnt/thumb
localhost techgage # mount -t vfat /dev/sdc1 /mnt/thumb

Most external hard drives come formatted as FAT32, although larger store-bought drives might use NTFS. External hard drives can be ext3 as well, but that's up to you. If you are unsure what type a partition is, run fdisk /dev/sdc, hit p and note the partition's Id code. Hit l (a lowercase L) to view the known partition types and match the code. FAT32's code is b (or c for FAT32 with LBA), while NTFS is 7. Quit with q so that nothing is written to the disk. The mounting process is similar either way, more often than not.
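If you script the mount step, the Id-to-type lookup can be written down once. This sketch covers only the codes mentioned here; fs_type_for_id is a hypothetical name, and treating Id 83 as ext3 is an assumption, since 83 simply means "Linux" and ext2 uses it too.

```shell
#!/bin/bash
# fs_type_for_id ID  (hypothetical helper)
# Maps an fdisk partition Id to a "mount -t" type. Only the Ids from this
# article are known; 83 just means "Linux", so ext3 is an assumption here.
fs_type_for_id() {
  case $(printf '%s' "$1" | tr 'A-Z' 'a-z') in
    b|c|0b|0c) echo vfat ;;   # W95 FAT32 / W95 FAT32 (LBA)
    7|07)      echo ntfs ;;   # HPFS/NTFS
    83)        echo ext3 ;;   # Linux (could just as well be ext2)
    *)         echo "fs_type_for_id: unknown Id '$1'" >&2; return 1 ;;
  esac
}
```

Usage: mount -t "$(fs_type_for_id 7)" /dev/sdc2 /mnt/ntfs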

techgage@localhost ~ $ su
Password:
localhost techgage # mkdir /mnt/ntfs
localhost techgage # mount -t ntfs /dev/sdc2 /mnt/ntfs

If the drive stays plugged in permanently, you can add lines like these to your /etc/fstab:

/dev/sdc1 /mnt/thumb vfat defaults 0 0
/dev/sdc2 /mnt/ntfs ntfs defaults 0 0

With your drive mounted, you are now able to back up like a pro. Next up, setting up your NAS.

Backup To Network-Attached-Storage (NAS)

Network-Attached Storage (NAS) is a relatively new technology for home users, but it is becoming more popular by the day. Simply put, a NAS box is an appliance that connects to your router like a normal computer. Inside are one or two hard drives, depending on your model. Once it's set up, you will be able to access the box on your network just as if it were another PC. The great thing about a NAS, and the reason it's included in this article, is that it's a handy way to back up your important files safely. If your computer blows up, at least the NAS is there to save you.

Because a NAS is a network device, we are going to use Samba to connect to it. Some NAS products come with SSH and FTP support preinstalled, and if yours does, you could simply connect via SSH instead of Samba. But since most NAS boxes do not support SSH (usually only the expensive ones do), we will stick with what most people already have installed on their machines.

In order to mount our network drive, we need SMB filesystem support. To see whether your machine already has it, check for the mount helper: ls -l /sbin/mount.smbfs. If it's there, skip this step. If not, you will need to enable it in your kernel. If you are unsure how to compile a kernel or grab the kernel sources, refer to your distribution's wiki or how-to pages. (Newer kernels have dropped smbfs in favor of CIFS; on those, use the cifs type wherever smbfs appears below.)

As superuser, head to your kernel source directory and enter the configuration: cd /usr/src/linux ; make menuconfig

Once in, go to File systems > Network File Systems and enable "SMB file system support (to mount Windows shares etc.)" as a module. Exit the configuration tool, making sure to save the new config, then build and install: make && make modules_install. Once finished, load the module into your kernel: modprobe smbfs.

At this point, you might want to visit your router's admin panel to find the IP address of your NAS, unless you already know it. The router should have a page such as "DHCP Leases" that lists it.

With the IP address in hand, point your web browser at it and configure the NAS to your liking. You might want to give it a simple name so that it's easier to deal with. If you have a dual-drive NAS and want the ultimate in safety, choose RAID 1, which mirrors one drive onto the other. JBOD is fine if you want a lot of storage, but it doesn't make for a very safe "backup": if one of the drives in the NAS fails, you can lose data.

If you haven't run into problems yet, you should be good to run smbtree and find the network share you want to use. In my case, \\TG_NAS\ is the NAS, while Volume_1\ is the primary folder.

If for some reason you can't use smbtree, your desktop environment should have an easy way of browsing the list. In KDE, for example, "Remote Places" can be found under the K Menu, beneath the System Menu sub-menu.

Windows shares use backslashes, but we need to change those to forward slashes and write the server name in lowercase. So \\TG_NAS\ becomes //tg_nas/. Now the only thing left to do is create a mount folder and mount the share.
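The conversion is mechanical enough to script. In this minimal sketch, unc_to_smbfs is a hypothetical helper that swaps backslashes for forward slashes and lowercases only the server name, leaving the share name (Volume_1) as it is, matching the paths used in this section.

```shell
#!/bin/bash
# unc_to_smbfs UNC  (hypothetical helper)
# Turns a Windows share path such as \\TG_NAS\Volume_1 into the
# //tg_nas/Volume_1 form used by mount and /etc/fstab.
unc_to_smbfs() {
  local path host rest=""
  # Backslashes become forward slashes: //TG_NAS/Volume_1
  path=$(printf '%s' "$1" | tr '\\' '/')
  path=${path#//}                 # TG_NAS/Volume_1
  host=${path%%/*}                # TG_NAS
  case $path in
    */*) rest=/${path#*/} ;;      # /Volume_1
  esac
  # Lowercase only the server name; the share name is left alone
  host=$(printf '%s' "$host" | tr 'A-Z' 'a-z')
  printf '//%s%s\n' "$host" "$rest"
}
```

Single-quote the argument so the shell leaves the backslashes alone: unc_to_smbfs '\\TG_NAS\Volume_1'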

For the sake of simplicity, I created a folder under /mnt called nas, but you can name it whatever you like. Mounting is simple: mount -t smbfs, followed by the share, the mount folder and the -o option switch with the password, if one is required. Put together, that's: mkdir /mnt/nas ; mount -t smbfs //tg_nas/Volume_1 /mnt/nas -o password=techgagerox. If your NAS doesn't have a password, omit the -o part.

We already established that our network share is //tg_nas/ and that the folder to access is Volume_1. Putting that all together, the share is mounted at /mnt/nas, as you can see in the photo. To see how much free space is available, run df -BMB (also shown in the image).

To have the network drive mount at every boot, add an entry to your /etc/fstab. Keep in mind that /etc/fstab is readable by every user on the machine, password included. Here is a sample:

//tg_nas/Volume_1 /mnt/nas smbfs username=admin,password=techgagerox 0 0

From this point, your NAS is good to go. To back up, you can use any of the examples so far, or move on to the next sections, where we have some more elaborate examples.

Backup To Remote Server, rsync Without Passwords

Run a Linux-based web server? If so, your backup possibilities have increased dramatically. Depending on whether you are copying to or grabbing from the web server, there are two applications I love to use: rsync, as you have likely guessed, and LFTP, a feature-rich command-line FTP application that is a pleasure to use.

Let's delve into LFTP first, since our rsync techniques will take a little more time. If you are unsure whether the application is installed, simply type lftp at the command line. If you are using a distro with a package repository, chances are that it will be found there. If not, feel free to compile it from source.

Suppose you are the head of a popular open-source project and would like to upload nightly builds of your source code. The easiest way is to first create an LFTP script, then a shell script that runs it, and add that to your crontab. Let's tackle the first challenge first.

If you have a directory full of nightly builds (which could be built with our scripts in the last section), you can use LFTP to automate uploading them. First, create a simple script that LFTP understands; thankfully, these look much like shell scripts. I -highly- recommend saving it under the root account, or at least making it readable only by you (chmod 600), because your password will be stored in it as clear as day (unless you happen to live in a fog belt, in which case that's a bad example).

open -u username,password -p 21 domain.com ;
mput -O public_html/nightly /home/techgage/projects/appname/nightly/*.tar.gz

Saving this script as lftp_script and running lftp -f lftp_script uploads all of the *.tar.gz files found in that directory. If a tarball already exists on the server, it will not be overwritten, so only whatever is new gets uploaded, which is likely what you want. On the server side, you could create a script that symlinks the newest file to, say, latest.tar.gz, since LFTP, and FTP in general, doesn't have that kind of functionality.

Want more than just that one file type uploaded? How about the entire folder? Let's move on to the mirror command with our sample script:

open -u username,password -p 21 domain.com ;
mirror -R /home/techgage/projects/appname public_html/appname

This script will upload all the files found under your /appname folder, including subfolders. That way, if you have a mounted folder on your local machine that has all of your nightly builds, hash files, readmes or anything else, then the same will be found on the server. To reverse the process (mirror a remote folder to a local one), remove the -R toggle and switch the directories around:

open -u username,password -p 21 domain.com ;
mirror public_html/appname /home/techgage/projects/appname

LFTP is a superb program for simple file exchanging, but for more in-depth and important tasks, you should trust rsync.
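Because an LFTP script holds your password in plain text, one option is to generate it from a shell script so that it is created readable by its owner only. In this minimal sketch of that idea, write_lftp_script is a hypothetical helper that writes the mirror -R script shown above. It assumes paths without spaces and a username and password without commas, since the -u user,pass syntax splits on the comma.

```shell
#!/bin/bash
# write_lftp_script FILE USER PASS HOST LOCAL_DIR REMOTE_DIR  (hypothetical)
# Writes an LFTP script that mirrors LOCAL_DIR up to REMOTE_DIR, readable
# only by its owner. Assumes no spaces in the paths and no commas in
# USER or PASS, since "open -u user,pass" splits on the comma.
write_lftp_script() {
  local file=$1 user=$2 pass=$3 host=$4 local_dir=$5 remote_dir=$6
  (
    umask 077   # a newly created file gets mode 600
    cat > "$file" <<EOF
open -u $user,$pass -p 21 $host ;
mirror -R $local_dir $remote_dir
EOF
  )
  chmod 600 "$file"   # also tightens a file that already existed
}
```

Run it once, then have cron call lftp -f with the file it wrote.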

Using Rsync Without Passwords

If you run a website or have files on a server that need to be updated often, rsync can become your best friend very quickly. If you rsync your site on a regular basis, you -always- have a recent copy stored on your local machine's hard drive, NAS or external storage. So if something breaks, you can quickly restore the file that's gone awry.

Using rsync manually is simple... just type the command, then the password, then wait. However, if you want to make nightly backups or sync twice a day, it quickly becomes an arduous chore. That's why you will be interested in setting up SSH keys for rsync and automating the entire process.

To sync everything without incident, it's recommended that you do this as root. However, if you are positive that all of the files you need live in a user account (e.g. /home/username), doing it as that user should be fine. In my case, I want to sync more than what's found in /home, so I first become root with su and then enter the SSH directory: cd /root/.ssh. Once there, create an identification key. When prompted for a passphrase, hit Enter twice without typing anything. (DSA keys were the norm when this was written, but current OpenSSH releases disable them by default; on a modern system, use -t rsa -b 4096 or -t ed25519 instead of -t dsa -b 1024.)

localhost .ssh # ssh-keygen -t dsa -b 1024 -f rsync-key
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in rsync-key.
Your public key has been saved in rsync-key.pub.
The key fingerprint is:
0e:2d:a9:fc:34:66:86:3c:42:43:b4:d2:8e:3c:50:52 root@localhost

Once the process is finished, you will have two files: rsync-key (the private key, which stays on your machine) and rsync-key.pub (the public key). Copy rsync-key.pub to your server's /root/.ssh folder, to keep things simple. You can do this in a variety of ways: copy it over via SSH, copy and paste the entire contents of the file, use SCP... it's up to you. I just used gFTP to log onto the server as root and copied it that way.

At this point, you are almost finished. Once the file is on your server, SSH in and enter the same folder as on your local machine: cd /root/.ssh. To gain access to the server without a password, the public key needs to be listed in the authorized_keys file. Whether or not that file already exists, append the contents of rsync-key.pub to it (>> creates the file if needed) and chmod it before you finish up.

[root@fakeserver .ssh]# cat rsync-key.pub >> authorized_keys
[root@fakeserver .ssh]# chmod 644 authorized_keys
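If you repeat this step (say, after regenerating the key), appending blindly leaves duplicate lines in authorized_keys. A minimal sketch of an append-once version follows; add_authorized_key is a hypothetical helper. It uses chmod 600 rather than 644: with its default StrictModes setting, sshd only refuses the file if others can write to it, so either mode works, but 600 keeps it private.

```shell
#!/bin/bash
# add_authorized_key PUBKEY AUTH_FILE  (hypothetical helper)
# Appends the key in PUBKEY to AUTH_FILE unless that exact line is
# already there, then makes AUTH_FILE readable by its owner only.
add_authorized_key() {
  local pubkey=$1 auth=$2 key
  key=$(head -n 1 "$pubkey") || return 1
  touch "$auth" || return 1
  # If the file doesn't end in a newline, add one so keys don't run together
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi
  # -x: whole-line match, -F: treat the key as a fixed string
  grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
  chmod 600 "$auth"
}
```

On the server, that would be add_authorized_key /root/.ssh/rsync-key.pub /root/.ssh/authorized_keys.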

That was the last step, so you can now log out of the server. From your local machine, log back in to it, this time hopefully without a password:

localhost techgage # ssh -i /root/.ssh/rsync-key domain.com
Last login: Mon Aug 20 21:06:09 2007 from xxx000000000000.cpe.cable.domain.com
[root@fakeserver ~]#

If that last step succeeded, you are ready to rsync without the hassle of a password, which means you can set up a crontab entry to automate your backups! To rsync with your server, use a command like this:

rsync -av -e "ssh -i /root/.ssh/rsync-key" domain.com:/home/username/projects /home/techgage/projects

Now that you are all set, let's jump right into our script-writing and the best part, crontabs.

Scripts, Crontab, Final Thoughts

Backing up files on a regular basis isn't difficult, but it gets tedious if you type the command manually each time. With crontab, there's no reason to waste that time. Welcome to your rsync script! Anywhere on your machine, create a .sh file. For the sake of simplicity, we are calling ours backup.sh. You can create it with vi, nano or whatever your preferred text editor is.

#!/bin/bash
rsync -av /home /mnt/ntfs/Backups

Once you save that file, you can execute it with sh backup.sh, or make it executable with chmod +x backup.sh and run it as ./backup.sh. Simple stuff. For more flexibility, though, you can get a little more advanced with your script.

#!/bin/bash
# Stop at the first command that fails, so rsync never runs if the mount fails
set -e
echo "Beginning Backup..."
mount -t vfat /dev/sdc1 /mnt/thumb
rsync -a --delete /home /mnt/thumb
umount /mnt/thumb
echo "Backup Complete!"

That script announces that the backup has begun, mounts the drive, runs rsync, unmounts the drive and then tells you it's finished. But what if you want a script to run on a regular basis, or at a specified time when you are not around? Here's another.

#!/bin/bash
backup_log=/var/log/backup.log
# Append rsync's output, including any error messages (2>&1), to the log
rsync -av /home /mnt/ntfs >> "$backup_log" 2>&1

This one sets a log location and then performs the rsync, appending its output to the backup.log file so you can take a look later if need be. What about nightly backups of your source code? How about creating date-coded tar files that are automatically saved to your storage device?

#!/bin/bash
backup=nightly_`date +%m-%d-%Y`
backup_log=/var/log/nightly_backup.log

echo "Backup Start: `date +%m-%d-%Y-%T`" >> $backup_log;
tar zvcf /mnt/nas/$backup.tar.gz /home/techgage/projects/application_name >> $backup_log;
echo "Backup End: `date +%m-%d-%Y-%T`" >> $backup_log;

In this example, the backup file name and the fixed log file location are set up first. Then the time the tar process begins is written to the log, followed by the output of the tar process itself and, finally, the end time. The resulting file would be named, for example, nightly_08-20-2007.tar.gz. You can check man date to format the date to your liking. Note that those are backticks (`), not single quotes: the shell replaces a command wrapped in backticks with that command's output.

Want to symlink the fresh .tar.gz file to latest.tar.gz for easy handling? That way, any script that uploads the backup to a remote server or external device can simply point at latest.tar.gz and always get the newest archive. You could add this line to the very end of the above script (the filesystem holding the backups must support symlinks, which a Samba share may not):

ln -fns $backup.tar.gz /mnt/nas/latest.tar.gz
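Putting the nightly tarball, the log and the latest.tar.gz link together, a minimal sketch might look like this. make_nightly is a hypothetical function. Unlike the script above, it uses tar's -C option so the archive holds application_name/... rather than the full home/techgage/projects/... path, and it also sends tar's error messages to the log. The ln step needs a destination filesystem that supports symlinks.

```shell
#!/bin/bash
# make_nightly SRC_DIR DEST_DIR [LOG]  (hypothetical helper)
# Tars SRC_DIR into DEST_DIR/nightly_MM-DD-YYYY.tar.gz, logs start and end
# times, and points DEST_DIR/latest.tar.gz at the new archive.
make_nightly() {
  local src=${1%/} dest=$2 log=${3:-/dev/null} name
  name=nightly_$(date +%m-%d-%Y).tar.gz
  echo "Backup Start: $(date +%m-%d-%Y-%T)" >> "$log"
  # -C stores paths relative to SRC_DIR's parent (application_name/...);
  # 2>&1 puts tar's error messages in the log too
  if ! tar -czvf "$dest/$name" -C "$(dirname "$src")" "$(basename "$src")" >> "$log" 2>&1; then
    echo "Backup FAILED: $(date +%m-%d-%Y-%T)" >> "$log"
    return 1
  fi
  # The link target is relative, so it resolves inside DEST_DIR
  ln -fns "$name" "$dest/latest.tar.gz"
  echo "Backup End: $(date +%m-%d-%Y-%T)" >> "$log"
}
```

Usage: make_nightly /home/techgage/projects/application_name /mnt/nas /var/log/nightly_backup.log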

The possibilities of scripting are endless, and if you have a specific need, there is almost certainly a way to accomplish it with a script. Once your script is ready, how about automating it?

Setting Up Crontab

So you want to run the script at a certain time each day, week or month? It's time to turn to crontab. Chances are you already have a cron daemon installed; to check, run which crontab and see whether it prints a path. When you're ready to edit, run nano -w /etc/crontab, replacing nano with your preferred text editor.

In my case, I run the backup each morning at 5:00AM: I never turn the PC off, and although I'm sometimes on it right up until then, I'm never on it afterwards.

0 5 * * * root sh /home/nas_backup.sh

For a quick cron primer, each entry begins with five fields: Minute, Hour, Day of Month, Month and Day of Week. In /etc/crontab, a sixth field names the user the command runs as (root in these examples), followed by the command itself. So if you wanted to run your script at 3:00AM every Wednesday, it would be:

0 3 * * 3 root sh /home/nas_backup.sh

Or for 10:30AM on Mondays and Fridays (Sunday = 0, Saturday = 6):

30 10 * * 1,5 root sh /home/nas_backup.sh

Cron will most likely be the most difficult part of the whole task. To read more, I highly recommend the Wikipedia page on cron, as well as man 5 crontab.
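Before trusting a new schedule, you can check it against a few sample times. This sketch is a deliberately small cron matcher: it understands only *, single numbers and comma lists (no ranges like 1-5 or steps like */15), and it relies on GNU date's -d option. cron_matches and field_matches are hypothetical names, and this is not how cron itself is implemented. It does follow cron's rule that when both Day of Month and Day of Week are restricted, a match on either one is enough.

```shell
#!/bin/bash
# A deliberately small cron matcher (hypothetical helpers, not cron's code).
# Fields may be "*", a number or a comma list such as "1,5"; ranges (1-5)
# and steps (*/15) are NOT supported. Needs GNU date for "date -d".

# field_matches FIELD VALUE: succeed if FIELD accepts VALUE
field_matches() {
  local item
  [ "$1" = "*" ] && return 0
  for item in $(printf '%s' "$1" | tr ',' ' '); do
    # test's -eq compares in base 10, so "08" from date equals "8"
    [ "$item" -eq "$2" ] && return 0
  done
  return 1
}

# cron_matches "MIN HOUR DOM MON DOW" "DATE": succeed if the schedule
# would fire at DATE (any date string GNU date understands)
cron_matches() {
  local min hour dom mon dow t_min t_hour t_dom t_mon t_dow
  # Here-documents keep "*" from being expanded into file names
  read -r min hour dom mon dow <<EOF
$1
EOF
  read -r t_min t_hour t_dom t_mon t_dow <<EOF
$(date -d "$2" '+%M %H %d %m %w')
EOF
  field_matches "$min" "$t_min" && field_matches "$hour" "$t_hour" &&
    field_matches "$mon" "$t_mon" || return 1
  # When both day fields are restricted, cron fires if EITHER one matches
  if [ "$dom" != "*" ] && [ "$dow" != "*" ]; then
    field_matches "$dom" "$t_dom" || field_matches "$dow" "$t_dow"
  else
    field_matches "$dom" "$t_dom" && field_matches "$dow" "$t_dow"
  fi
}
```

For example, cron_matches "30 10 * * 1,5" "2007-08-20 10:30" succeeds, since August 20, 2007 was a Monday.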

Final Thoughts

After reading through the article, you should have a good idea of what to do now: how to rsync and how to create a script. But what exactly do you want to back up? The /home folder is always a good idea. That way if something bad happens, you can easily recover your files. But it can go deeper than that, as well.

All around your hard drive, there are files you could save to make a potential recovery even easier. Gentoo is my distro of choice, and it keeps various configuration files that help with a reinstallation or recovery. These include Portage's package files and its world file, which lists the software you explicitly installed, as well as make.conf, which holds your USE flags and gcc settings. Your distro will have similar distro-specific configuration files, so be sure to find out where they are and back them up.
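A small function can gather such files into one archive and skip any that don't exist on a given machine. This is a minimal sketch; backup_configs is a hypothetical name, and the Gentoo paths in the usage line are examples to swap for your own distro's files (newer Gentoo installs keep make.conf in /etc/portage/).

```shell
#!/bin/bash
# backup_configs ARCHIVE FILE...  (hypothetical helper)
# Packs the given files into one gzipped tarball, skipping (and reporting)
# any that don't exist on this machine.
backup_configs() {
  local archive=$1 f
  shift
  # Rebuild the argument list, keeping only files that exist
  for f in "$@"; do
    shift
    if [ -e "$f" ]; then
      set -- "$@" "$f"
    else
      echo "backup_configs: skipping missing $f" >&2
    fi
  done
  [ $# -gt 0 ] || return 1
  # GNU tar strips the leading "/" from names and mentions it on stderr
  tar -czf "$archive" "$@"
}
```

Usage: backup_configs /mnt/nas/configs.tar.gz /var/lib/portage/world /etc/make.conf /etc/fstab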

We just touched on the basics here. You could write a robust script to do a variety of things. Mine, for example, performs the same actions each night in sequential order. It first cleans up the computer, emptying the trash and various caches, and then backs up my /home, configuration files and kernel source code folder. Then it grabs a few nightly backups off one server with wget and rsyncs our website server to make sure we have a current copy. All of this is done while saving the output to a static log file. Yes... there is a lot you can do with a simple script, as long as you're willing to spend some time setting things up properly.

So what are you waiting for? Accidents can strike at any time...



Copyright © 2005-2009 Techgage Networks Inc. - All Rights Reserved.