The Ultimate Backup: RAID for Your Home Computer
Text and images copyright Juan A. Pons, all rights reserved

With the number of megapixels increasing in today’s digital cameras, avid amateur and professional photographers alike are quickly running out of room on their computers. Luckily, hard disk sizes have grown rapidly and the cost per megabyte has dropped just as fast. However, simply adding disks to your system can quickly become cumbersome and problematic, and unless you perform some kind of backup immediately after downloading images to your computer, you risk losing data if a hard drive fails.

Three 200-gigabyte disks in hot-swappable slide-in trays.

One option is to implement a RAID storage subsystem for your computer. RAID stands for "Redundant Array of Inexpensive Disks". In simple terms, your data is automatically stored across multiple disks, and with most RAID levels it is stored redundantly so that it survives the failure of a single disk. In some cases, RAID also improves access speeds.

RAID comes in a few different flavors known as “RAID levels.” As with everything else, each level has its advantages and disadvantages.

In this article I hope to give you a little insight into what RAID is all about, what the options are, what compromises you need to make, and how I configured my home system to get the most bang for the buck.

SOLUTIONS
RAID systems come in the form of software solutions and hardware solutions. I do not recommend software solutions unless you are really fluent in RAID systems. Software RAID solutions tend to be slow and depend on your computer’s CPU to perform all the necessary data processing. Because of this dependency, in my opinion they are not very reliable. Besides, hardware RAID solutions abound today and their prices are very affordable. With this in mind, the rest of this article will only address hardware RAID solutions.

As mentioned before, there are a few RAID levels, with new ones being developed periodically.

RAID 0 - STRIPING: Striping simply means that data is written across several disks in discrete chunks, instead of all of it being written to one disk. For example, if a file is 100 kilobytes in size and your stripe chunk is set to 20 kilobytes, the first 20 kilobytes of the file are written to the first disk, the next 20 kilobytes to the second disk, and so on. After the last disk, the controller circles back to the first disk and continues in this round-robin fashion. This RAID level offers no redundancy at all; in fact, if any one disk fails, the data on the entire array is lost. So I am not going to say much about it, except that it is used for high-performance applications. Since the data is striped across multiple disks, READ and WRITE operations can be done in parallel, which offers significant speed increases.
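To make the round-robin layout concrete, here is a minimal Python sketch that maps each chunk of a file to a disk. It assumes three disks numbered from 0, plus the 100-kilobyte file and 20-kilobyte chunk size from the example above; the function name `stripe_layout` is purely illustrative, and a real controller works on disk blocks rather than on whole files.

```python
def stripe_layout(file_kb: int, chunk_kb: int, num_disks: int) -> list[tuple[int, int, int]]:
    """Split a file into fixed-size chunks and assign them round-robin to disks.

    Returns (chunk_number, disk_number, chunk_size_kb) tuples.
    Illustrative model only, not how any particular controller is coded.
    """
    layout = []
    offset = 0
    chunk = 0
    while offset < file_kb:
        size = min(chunk_kb, file_kb - offset)  # the last chunk may be partial
        disk = chunk % num_disks                # wrap back to disk 0 after the last disk
        layout.append((chunk, disk, size))
        offset += size
        chunk += 1
    return layout


if __name__ == "__main__":
    # Hypothetical example: 100 KB file, 20 KB stripe chunks, three disks.
    for chunk, disk, size in stripe_layout(100, 20, 3):
        print(f"chunk {chunk}: {size} KB -> disk {disk}")
```

Chunks 0, 1, and 2 land on different disks, so the controller can transfer them at the same time; that parallelism is where RAID 0's speed comes from.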

RAID 1 - MIRRORING: This RAID level is simple mirroring, meaning you have two disks that appear as one to your system. The RAID subsystem takes care of keeping identical copies of the data on both disks. If one disk fails, the RAID subsystem simply uses the remaining good disk. Most RAID 1 systems perform the same as non-RAID solutions in WRITE operations, since every write has to go to both disks. However, some RAID 1 solutions improve READ performance by spreading READ operations between both disks; in essence, they read the data in parallel. RAID 1 solutions need to be implemented with an even number of disks. The biggest drawback to RAID 1 is that you lose 50% of your total disk capacity. For instance, two 100-gigabyte drives total 200 gigabytes, but once you put them in a RAID 1 configuration you can use only 100 gigabytes, because every piece of data is stored twice.
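The toy Python model below illustrates the behavior just described: every write lands on both disks, reads can alternate between the two copies, and data stays readable after one disk fails. The class and function names are hypothetical, and each "disk" is just a dictionary; a real controller does this in firmware on disk sectors.

```python
class MirroredPair:
    """Toy model of a two-disk RAID 1 mirror (each 'disk' is a dict of block -> bytes)."""

    def __init__(self) -> None:
        self.disks: list[dict[int, bytes]] = [{}, {}]
        self.failed = [False, False]
        self._next = 0  # used to alternate reads between the two disks

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each working disk, so writes are no faster than one disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Either copy can serve a read; alternating spreads the load over both disks.
        working = [i for i in (0, 1) if not self.failed[i]]
        if not working:
            raise OSError("both disks in the mirror have failed")
        disk = working[self._next % len(working)]
        self._next += 1
        return self.disks[disk][block]

    def fail(self, i: int) -> None:
        self.failed[i] = True


def raid1_usable_gb(disk_gb: int, num_disks: int = 2) -> int:
    """RAID 1 usable capacity: half the raw capacity, since every block is stored twice."""
    if num_disks % 2:
        raise ValueError("RAID 1 uses disks in pairs")
    return disk_gb * num_disks // 2


if __name__ == "__main__":
    mirror = MirroredPair()
    mirror.write(0, b"photo data")
    mirror.fail(0)               # simulate a failed disk
    print(mirror.read(0))        # still readable from the surviving copy
    print(raid1_usable_gb(100))  # two 100 GB drives -> 100 GB usable
```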

RAID 5 - STRIPING & PARITY: RAID 5 is by far the most popular and versatile RAID level. It provides a very reliable storage platform at comparatively low cost. RAID 5 is slower at WRITES, because every write must also update the parity information, but READS are very fast because they occur in parallel. For RAID 5 you need at least three disks, and the capacity of a RAID 5 array equals the total capacity of the member disks minus the capacity of one member disk, since one disk's worth of space is used for parity. So with three 100-gigabyte disks you would end up with 200 gigabytes of usable storage space, and with seven 100-gigabyte disks you would have 600 gigabytes. You can see why this level is so popular: it is the most cost-effective RAID solution. A RAID 5 array will also keep operating if any single disk fails, although a second failure before the first failed disk is replaced would lose the data.
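Both the capacity rule and the ability to survive one failure come from parity. In RAID 5, each stripe holds data blocks plus one parity block computed with exclusive-or (XOR); if any one block is lost, XOR-ing the remaining blocks recovers it. The sketch below is a simplified model under stated assumptions: tiny 4-byte blocks, a single stripe, and hypothetical function names. Real RAID 5 also rotates the parity block among the member disks from stripe to stripe, which this sketch leaves out.

```python
from functools import reduce


def raid5_usable_gb(disk_gb: int, num_disks: int) -> int:
    """Usable RAID 5 capacity: one disk's worth of space goes to parity."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (num_disks - 1) * disk_gb


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-by-byte XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(raid5_usable_gb(100, 3))  # 200
    print(raid5_usable_gb(100, 7))  # 600

    # One stripe on a three-disk array: two data blocks plus their parity.
    d0, d1 = b"\x12\x34\x56\x78", b"\xab\xcd\xef\x01"
    parity = xor_blocks([d0, d1])

    # Suppose the disk holding d1 fails: XOR the survivors to get it back.
    recovered = xor_blocks([d0, parity])
    print(recovered == d1)  # True
```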

RAID 10 - MIRRORING & STRIPING: This RAID level is the best solution when you need both high reliability and high performance. However, it is one of the most expensive solutions. In RAID 10 the data is mirrored, and the mirrored pairs are then striped across several disks. Both READS and WRITES are very fast because they happen in parallel across several disks. This gives you the best of both worlds, but as I said, it is expensive. The expense comes from the fact that, as with RAID 1, you lose 50% of your total disk space to redundancy. You need at least four disks, and you have to add disks in pairs (for an even number of disks). However, if you need this level of performance and reliability and can afford it, nothing beats RAID 10.
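A short Python sketch of how RAID 10 places data, assuming disks are grouped into mirrored pairs (disks 0 and 1, disks 2 and 3, and so on): each chunk goes to one pair in round-robin order and is written to both disks of that pair. The function names and the disk numbering are assumptions made for illustration.

```python
def _check_raid10(num_disks: int) -> None:
    if num_disks < 4 or num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")


def raid10_placement(num_chunks: int, num_disks: int) -> list[tuple[int, int, int]]:
    """Return (chunk, primary_disk, mirror_disk) for each chunk.

    Disks form mirrored pairs (0,1), (2,3), ...; chunks are striped
    round-robin across the pairs.
    """
    _check_raid10(num_disks)
    pairs = num_disks // 2
    placement = []
    for chunk in range(num_chunks):
        pair = chunk % pairs
        placement.append((chunk, 2 * pair, 2 * pair + 1))
    return placement


def raid10_usable_gb(disk_gb: int, num_disks: int) -> int:
    """Half of the raw capacity holds the mirror copies."""
    _check_raid10(num_disks)
    return disk_gb * num_disks // 2


if __name__ == "__main__":
    for chunk, primary, mirror in raid10_placement(4, 4):
        print(f"chunk {chunk} -> disks {primary} and {mirror}")
    print(raid10_usable_gb(100, 4))  # 200
```

Consecutive chunks go to different pairs (striping), and each chunk exists on two disks (mirroring), which is why RAID 10 is both fast and redundant at the cost of half the raw space.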

Note that RAID 10 is also commonly referred to as RAID 0+1, and most people say the two are the same; however, they are only similar. RAID 10 stripes data across mirrored pairs, while RAID 0+1 mirrors two striped sets. RAID 0+1 is not as reliable as RAID 10, for reasons that are outside the scope of this article. Only implement RAID 0+1 if you have very specific needs that only RAID 0+1 will meet, and if you do, make sure you know exactly what you are doing!

All RAID controllers that I know of will notify you when a disk fails. With the redundant levels (RAID 1, 5, and 10), all you should need to do is replace the failed disk; the RAID controller will do the rest and rebuild your RAID array. Keep in mind that during this rebuilding period you can still access your data, but your READ/WRITE performance will be slower than normal.
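To see why performance drops while a RAID 5 array is degraded or rebuilding, consider a single stripe. Reading a block from a healthy disk takes one disk read, but a block that lived on the failed disk must be recomputed from every surviving disk in the stripe, and the rebuild itself has to read all of them again. The toy model below, with hypothetical class and method names and only one stripe, counts those reads; a real controller repeats this work for every stripe on the array.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-by-byte XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Raid5Stripe:
    """Toy model of one RAID 5 stripe; list index = disk number, last block = parity."""

    def __init__(self, data_blocks: list[bytes]) -> None:
        self.blocks: list[Optional[bytes]] = list(data_blocks) + [xor_blocks(data_blocks)]
        self.failed: Optional[int] = None
        self.disk_reads = 0  # physical block reads, to compare normal vs. degraded mode

    def _survivors(self) -> list[bytes]:
        return [b for i, b in enumerate(self.blocks) if i != self.failed]

    def read(self, disk: int) -> bytes:
        if disk != self.failed:
            self.disk_reads += 1
            return self.blocks[disk]
        # Degraded read: recompute the lost block from all surviving disks.
        survivors = self._survivors()
        self.disk_reads += len(survivors)
        return xor_blocks(survivors)

    def fail(self, disk: int) -> None:
        self.failed = disk
        self.blocks[disk] = None  # the disk's contents are gone

    def rebuild(self) -> None:
        """Called after the failed disk is replaced: recompute its block and write it back."""
        if self.failed is None:
            return
        survivors = self._survivors()
        self.disk_reads += len(survivors)
        self.blocks[self.failed] = xor_blocks(survivors)
        self.failed = None


if __name__ == "__main__":
    # Three data disks plus one parity disk.
    stripe = Raid5Stripe([b"\x01\x02", b"\x03\x04", b"\x05\x06"])
    stripe.read(1)
    print(stripe.disk_reads)  # 1 read on a healthy array
    stripe.fail(1)
    stripe.disk_reads = 0
    print(stripe.read(1))     # the data is still available...
    print(stripe.disk_reads)  # ...but took 3 reads instead of 1
    stripe.rebuild()
    print(stripe.blocks[1] == b"\x03\x04")  # True
```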

SETTING UP AT HOME
We have now seen the properties of the different RAID levels. At this point you may be asking yourself, "How can I implement a RAID subsystem at home?" Well, first you have to decide what you need and what you can afford. I would guess that for most of us doing digital photography, RAID 5 will be more than sufficient and will give us "the most bang for the buck." I personally use RAID 5 on my home system and performance is significantly better than what I was getting with my old disks.

There are numerous vendors who offer RAID controller cards for PCs and Macs. However, since I have not worked with Macs in over a decade, I am going to concentrate on the PC world. Most, if not all, of the PC solutions work on Linux as well.

3ware 7500-4LP ATA RAID controller. Notice that there is one ATA cable for each of the drives.

There are a number of RAID hardware vendors out there. I like 3ware (http://www.3ware.com/) the best, since I believe they have the most complete line of PC RAID solutions for both ATA (EIDE) and Serial ATA (SATA) systems. SATA is a newer way to connect drives to your computer that is much faster than ATA, which is what most home computers currently use. Additionally, SATA is not as costly as SCSI, which is what most expensive computer servers use. Other vendors include Promise Technology (http://www.promise.com/), AMI (http://www.ami.com/), and Adaptec (http://www.adaptec.com/). You can read a review of some of these cards at this link: http://www.anandtech.com/storage/showdoc.html?i=1491.

RAID has been around for a long time, but it was not until the recent introduction of IDE RAID that it became affordable for us regular users. I have implemented very large RAID solutions costing more than $500,000, using both SCSI and fiber optic technologies, and I always wanted to have a RAID system at home. With the advent of fast, inexpensive IDE drives and IDE-based RAID hardware, the "I" (for “Inexpensive”) part of RAID has really come true.

When purchasing a RAID card, you can select one that performs only a single level of RAID. RAID 5 cards are very popular, and so are RAID 1 cards. But you can also buy cards that support multiple RAID levels. I prefer the latter; they are a little more expensive, but I think they are worth the investment, since I may change the RAID level as my storage needs change.

Once you have a card, you will need disks. When you implement RAID, it is advisable to use exactly the same model of hard drive for all your RAID disks. This is not strictly necessary, but it is highly advisable, because your RAID array will perform only as well as its slowest, least reliable drive; as the saying goes, a chain is only as strong as its weakest link. I would suggest you get disks that spin at no less than 7200 RPM, and make sure they have the biggest cache you can afford. I personally recommend an 8-megabyte cache. Paired with the right controller card, these drives are so fast these days that they rival SCSI-based disks. My current RAID 5 setup at home performs better than the single-disk systems I was using before, and I attribute this mostly to the disks I chose.

Once you have disks, I suggest you buy a RAID enclosure to mount them in your computer. Again, there are many vendors and models; look for what are referred to as "mobile rack enclosures". If you are going to run a RAID 5 system, I would suggest you look at the enclosures that hold three 3.5" disks but use up only two of the 5.25" bays on a typical PC tower. Most of these allow hot-swapping, so you can replace a failed drive without having to shut down the computer. As with the RAID controller, I chose a 3ware product (http://www.3ware.com/products/ata.asp). 3ware is not the cheapest option around, but I think they are the leaders in this field today. Using a disk enclosure is not necessary, but I highly recommend it. Alternatively, if you have the space inside your computer, you can just place the disks in there.

Once you get your RAID controller (PCI based card), hard drives and disk enclosure, you can go ahead and install it all. It is not terribly hard to do, but you have to feel comfortable opening up your computer and moving things around. If you do not feel comfortable doing this, chances are that you have a friend who does.

Once you have all your hardware installed, read the manual that came with your RAID controller to learn how to configure your RAID subsystem. In most cases this is very straightforward. Once the RAID subsystem has been configured, it will appear to your machine as one large disk. At this point you will need to format the disk, and then you will have a very reliable place to store all your images.

Even though you could boot off your RAID subsystem, I would suggest not doing so. In my opinion it is preferable to keep your operating system and data separate, if at all possible.

You may be wondering what all this costs. As an example, below is the cost of my home setup. My array is configured as RAID 5, which provides me with 400 gigabytes of usable disk space.

RAID controller: $290
Disk enclosure: $250
Three 200-gigabyte hard disks at $140 each: $420

Total cost: $960
Total available disk space: 400 gigabytes
Cost per gigabyte: $2.40
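The totals above can be checked with a few lines of Python. This is plain arithmetic on the listed prices, assuming three 200-gigabyte disks in RAID 5, where one disk's worth of space goes to parity; the variable names are illustrative.

```python
# Prices from the parts list above (US dollars).
controller = 290
enclosure = 250
disk_price, disk_gb, num_disks = 140, 200, 3

total_cost = controller + enclosure + disk_price * num_disks
usable_gb = (num_disks - 1) * disk_gb  # RAID 5: total capacity minus one disk
cost_per_gb = total_cost / usable_gb

print(f"Total cost: ${total_cost}")              # Total cost: $960
print(f"Usable space: {usable_gb} GB")           # Usable space: 400 GB
print(f"Cost per gigabyte: ${cost_per_gb:.2f}")  # Cost per gigabyte: $2.40
```

Because the controller and enclosure are fixed costs, adding disks (where the hardware has room for them) lowers the cost per gigabyte.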

Please keep in mind that you can go with much cheaper, or much more expensive, solutions. If you need to go cheaper, you could buy a RAID 1 card, which currently costs approximately $60, and then either add another drive just like the one you have now or buy two bigger drives. Place them inside your computer, if you have room, and you are ready to go! For more expensive solutions, the sky is the limit.

I hope that you found this article informative, and that it inspires you to implement such a solution in your digital darkroom. Please feel free to email me at jpons@wildnaturephoto.com with any comments or questions.

Juan Pons is a computer systems engineer by trade and a digital photography enthusiast at every opportunity. Juan travels frequently to photograph wildlife and is an avid supporter and photographer of nature in his home state of North Carolina. To view Juan’s photography, visit his website at www.wildnaturephoto.com.


Feel free to send your comments on this article to the editors at NatureScapes.Net.

All content on this site is copyrighted material as indicated. Unauthorized use or reproduction is prohibited.