RAID Made Easy
What is RAID, why do you need it, and what are all those mode numbers that are constantly bandied about? RAID stands for "Redundant Array of Independent Disks" or "Redundant Array of Inexpensive Disks," depending on who you talk to. Note that the word array is included in the acronym, so saying "RAID array," as a lot of people do, is redundant.
Back when hard drives were less capacious and more expensive, RAID was created to combine multiple, less-expensive drives into a single, higher-capacity and/or faster volume. On top of that, it was designed to facilitate redundancy, also known as fault tolerance or failover protection, so that the array and its data remain usable when a drive fails. You'll often hear about 1-disk or 2-disk redundancy, which refers to the number of drives that can fail while the array remains viable.
Redundancy is important for a small business, as drive failure does happen. RAID's data redundancy offers no protection against data lost to malware, theft, or natural disaster--and it's certainly no substitute for proper backup practices--but it does provide a fail-safe against hardware failure.
RAID comes in levels, which are the methods by which the drives are ganged together, and people commonly refer to them by number. The three most common levels in the consumer and small-office markets are RAID 0, RAID 1, and RAID 5. However, you'll encounter numerous other options too, including levels 6, 10, 5+1, JBOD ("just a bunch of disks"), and Microsoft's virtual disk RAID, as well as abstracted RAID implementations such as Drobo BeyondRAID, Netgear X-RAID, and Synology SHR.
Common RAID Modes
Picture the 0 in the "RAID 0" name as an oval racetrack, and you've divined its primary purpose: faster performance. RAID 0 distributes data across multiple drives (for example, block A goes to and from drive 1, block B goes to and from drive 2), which permits increased write and read speeds. This approach is often referred to as striping, and other modes (as you'll see later) employ the technique as well.
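To make the striping idea concrete, here is a minimal Python sketch of a round-robin layout. The function name `stripe_location` and the simple modulo scheme are illustrative assumptions; real controllers stripe in configurable chunk sizes and may lay data out differently.

```python
def stripe_location(block_index: int, num_drives: int) -> tuple[int, int]:
    """Return (drive, row) for a logical block under simple round-robin striping.

    Hypothetical helper for illustration only; drive and row are zero-based.
    """
    if num_drives < 2:
        raise ValueError("striping needs at least two drives")
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    # Consecutive blocks land on consecutive drives, wrapping around.
    return block_index % num_drives, block_index // num_drives


if __name__ == "__main__":
    # Blocks A through D on a two-drive array.
    for name, index in zip("ABCD", range(4)):
        drive, row = stripe_location(index, 2)
        print(f"block {name} -> drive {drive + 1}, row {row}")
```

In this sketch blocks A and C land on drive 1 and blocks B and D on drive 2, so two neighboring blocks can be transferred at the same time.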
Regrettably (and dangerously, if you aren't aware of the risks) RAID 0 offers no protection against drive failure, since this mode does not write any duplicate or parity information. Hence, when a drive fails, you end up with a puzzle that's missing pieces. In such a situation, your data is quite possibly gone, though you can find service providers that might be able to recover it--for a lot of money.
RAID 1 writes the same data to both drives in a pair and can read it back from either; it's also referred to as mirroring. The drives are equal partners--should either fail, you can continue working with the good one until you can replace the bad one. RAID 1 is the simplest, easiest method to create failover disk storage. However, it costs you a whopping 50 percent of your total available drive capacity; for example, two 1TB drives in a mirrored array net you only 1TB of usable space, not 2TB.
You may have as many pairs of mirrored drives as your RAID controller allows. And in the unlikely event that a consumer-grade controller supports duplex reading, RAID 1 can provide an increase in read speeds by fetching blocks alternately from each drive.
RAID 5 offers both speed and data redundancy. It stripes data across multiple disks, and it distributes parity data across all the disks in the array. Parity data is a smaller amount of data derived mathematically from a larger set; combined with the data that survives on the remaining disks, it can restore whatever was stored on a failed disk. Since parity information is distributed across all the drives, any one drive can fail without causing the entire array to fail.
RAID 5 requires a minimum of three disks and devotes the equivalent of one disk's capacity to parity information: one-third of the total in a three-disk array, and proportionally less as you add disks. Since data is read from multiple disks, performance can improve under RAID 5, though some users report that RAID 5 slows performance greatly when it's processing multiple reads in a server situation.
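The way parity restores lost data can be sketched in a few lines of Python. This example assumes XOR parity, the scheme commonly used for single-parity arrays; the helper name `xor_blocks` and the sample data are made up for illustration, and the sketch ignores how parity is rotated among the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR the bytes in one position
        for column in zip(*blocks)
    )


# Three data blocks, as if stored on three different drives.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails. XOR-ing the surviving
# data blocks with the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same calculation works no matter which single block goes missing, which is why one failed drive does not take the array down.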
JBOD, shorthand for "just a bunch of disks," isn't really RAID, but it is often available as an option on multidisk storage boxes that offer RAID. JBOD offers no speed increase or redundancy. Rather, it simply concatenates a group of disks into a single volume. Data is written to the first drive until it's full, then to the second until it is full, and so on, until the last drive has no more room. Even though many network-attached storage devices provide this option, we don't recommend using it unless it's the only thing available, you really need a single large volume, and you don't have the choice of using RAID 0 (an unlikely circumstance).
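Concatenation is easy to picture in code. The sketch below maps a position in the combined volume to a disk and a position on that disk; `jbod_locate` is a hypothetical helper, and sizes are in arbitrary units.

```python
def jbod_locate(offset: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Map a logical offset to (disk_index, offset_on_disk) in a concatenated volume.

    Hypothetical helper for illustration; disks are filled strictly in order.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for disk_index, size in enumerate(disk_sizes):
        if offset < size:
            return disk_index, offset
        offset -= size  # skip past this disk and try the next one
    raise ValueError("offset is beyond the end of the volume")


if __name__ == "__main__":
    sizes = [500, 1000]  # e.g. a 500GB disk followed by a 1TB disk
    print(jbod_locate(499, sizes))  # still on the first disk
    print(jbod_locate(500, sizes))  # first position on the second disk
```

Unlike striping, neighboring positions almost always sit on the same disk, which is why this layout brings no speed increase.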
Microsoft has abandoned Drive Extender, a technology that was formerly employed on NAS boxes running Microsoft Windows Home Server (prior to WHS 2011). A smart file replication methodology, Drive Extender allowed you to configure which data would be replicated on a folder-by-folder basis.
Drobo, Intel, Netgear, and Synology all offer what is frequently referred to as abstracted or virtual RAID. You'll even find a form of abstracted RAID in Windows 7, and Storage Spaces in Windows 8 takes the idea even further.
"Abstracted" means that instead of using physical disks as the building blocks of an array, this arrangement employs virtual volumes (or "virtual disks," in Microsoft's parlance). Virtual volumes are handy in that they might take up only part of a disk, or they can expand across multiple disks.
For instance, you could have a virtual volume that consumes all of one 500GB disk and half of a 1TB disk. You would then have a second virtual volume that uses the remaining 500GB on the 1TB drive. The RAID software manages them behind the scenes, and they appear as a single storage unit to the user, if so desired.
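The example above can be modeled as a pair of volumes built from disk extents. This is only a sketch: the `Extent` class, the disk labels, and the helper function are invented for illustration and do not reflect how any particular vendor stores its layout. It treats 1TB as 1000GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous slice of one physical disk (hypothetical model)."""
    disk: str
    start_gb: int
    size_gb: int


def volume_size_gb(extents):
    """Total size of a virtual volume assembled from extents."""
    return sum(extent.size_gb for extent in extents)


# Volume A: all of the 500GB disk plus the first half of the 1TB disk.
volume_a = [Extent("disk1-500GB", 0, 500), Extent("disk2-1TB", 0, 500)]

# Volume B: the remaining 500GB of the 1TB disk.
volume_b = [Extent("disk2-1TB", 500, 500)]

if __name__ == "__main__":
    print("volume A:", volume_size_gb(volume_a), "GB")
    print("volume B:", volume_size_gb(volume_b), "GB")
```

Because the building block is an extent rather than a whole disk, a 1TB volume and a 500GB volume fit neatly onto two mismatched drives with nothing left over.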
Abstracted RAID allows you to mix different-capacity drives and varying levels of fault tolerance, as well as to expand capacity automatically without user intervention. Without it, changing RAID levels involves backing up all the data, reconfiguring, and then copying the data back--a time-consuming and sometimes technically challenging activity.
Arrays sometimes employ a hot spare, which is simply an extra disk preinstalled in the NAS box or system that serves to replace a failed disk. This setup allows the rebuilding of the array to proceed automatically without user intervention.
You'll encounter three other RAID options that can be useful. They aren't often found in consumer-level RAID boxes, though they are present in some business-oriented NAS boxes.
RAID 6 is very much like RAID 5: It has distributed parity info, as RAID 5 has, but it also has twice as much of it, which means that RAID 6 can withstand the loss of two drives. With RAID 6, a second set of parity information is distributed across the drives--to the obvious detriment of total capacity. Nevertheless, in situations where you need the highest level of fault tolerance, RAID 6 is a good choice.
RAID 10, also referred to as RAID 1 + 0, stripes data (RAID 0) across mirrored pairs (RAID 1) of drives. With this arrangement, you get back some of the write speed that RAID 1 can cost you--but you need at least four drives to implement it, and 50 percent of the total drive capacity becomes devoted to redundancy.
Conversely, RAID 0 + 1 mirrors (RAID 1) striped pairs (RAID 0) of drives. As with RAID 10, you regain some of the write speed that RAID 1 can cost you. Again, you need at least four drives, and you spend 50 percent of the total drive capacity on redundancy.
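The capacity and fault-tolerance trade-offs of the levels discussed so far can be put side by side in a small calculator. This is a sketch under simplifying assumptions: all drives are the same size, RAID 1 means a single two-drive mirror, and the failure count is the number of failures the array is guaranteed to survive. The names `RULES` and `summarize` are illustrative.

```python
# level -> (minimum drives, drives' worth of usable space, guaranteed failures survived)
RULES = {
    "0": (2, lambda n: n, 0),        # striping only, no redundancy
    "1": (2, lambda n: 1, 1),        # one mirrored pair
    "5": (3, lambda n: n - 1, 1),    # one drive's worth of parity
    "6": (4, lambda n: n - 2, 2),    # two drives' worth of parity
    "10": (4, lambda n: n // 2, 1),  # striped mirrored pairs
}


def summarize(level: str, drives: int, size_tb: float):
    """Return (usable_tb, guaranteed_failures) for equal-size drives."""
    min_drives, data_drives, tolerance = RULES[level]
    if drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == "1" and drives != 2:
        raise ValueError("this sketch models RAID 1 as a single pair")
    if level == "10" and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return data_drives(drives) * size_tb, tolerance


if __name__ == "__main__":
    for level, drives in [("0", 2), ("1", 2), ("5", 3), ("6", 4), ("10", 4)]:
        usable, failures = summarize(level, drives, 1)
        print(f"RAID {level}, {drives} x 1TB: {usable}TB usable, "
              f"survives {failures} failure(s)")
```

RAID 10 can survive a second failure if it hits a different mirrored pair, but only one failure is guaranteed, so the sketch reports one.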
Other RAID Options
The RAID specification includes several other levels aside from the ones addressed above; however, these are not commonly used anymore.
RAID 2 distributes data across multiple drives at the bit level (the smallest unit of computer information with a value of either 0 or 1) instead of at the block level. RAID 2 writes Hamming ECC (error-correcting code) recovery information to dedicated parity disks, which requires a lot of processing power.
RAID 3 is another mode that got kicked off the consumer island because it doesn't use data blocks; it distributes data across multiple drives as bytes (8 bits), and like RAID 2 it stores parity information on a dedicated drive.
RAID 4 distributes data across multiple drives as blocks, but it fell into disuse because it stores all parity information on a single dedicated parity drive; if that drive fails, the entire array remains unprotected until it's replaced and the information is reconstructed. This weakness is also inherent in RAID 2 and 3.
Choosing RAID: A Cheat Sheet
Trying to determine which RAID level is best for you? Here's our take.
- First off, use hardware RAID over software RAID when you have a choice. Software RAID is fast, but many implementations tend to rebuild at the drop of a hat, reducing performance while the rebuild is in progress.
- Use RAID 0 when all you want is faster performance with large files, and you don't need fault tolerance. (But be sure to back up your drives regularly.)
- Use RAID 1 when you have only two drives and you want to protect against drive failure.
- Use RAID 5 when you have more than two drives and you want a hedge against drive failure.
- Use abstracted RAID to derive maximum storage from a collection of drives of different capacities.
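The level recommendations in the cheat sheet can be condensed into a small decision function. This is only a restatement of the bullets as code, with the hypothetical name `suggest_raid`; it leaves out the hardware-versus-software advice and is not a substitute for weighing your own needs.

```python
def suggest_raid(num_drives: int, need_fault_tolerance: bool,
                 mixed_sizes: bool = False) -> str:
    """Suggest a RAID option following the cheat sheet's bullets (illustrative)."""
    if num_drives < 2:
        raise ValueError("you need at least two drives to build an array")
    if mixed_sizes:
        # Drives of different capacities: let abstracted RAID use all the space.
        return "abstracted RAID"
    if not need_fault_tolerance:
        # Speed only; keep regular backups.
        return "RAID 0"
    if num_drives == 2:
        return "RAID 1"
    return "RAID 5"


if __name__ == "__main__":
    print(suggest_raid(2, need_fault_tolerance=True))
    print(suggest_raid(4, need_fault_tolerance=True))
    print(suggest_raid(2, need_fault_tolerance=False))
```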
Editor's note: This is an update to a story that we originally published April 15, 2010. This version has the most current information on RAID technology.