Storage Wars – Part 1


RAIDers of the Lost Ark

by Jason Dowd, www.pinncomp.com

Data is the center of the IT universe. Users and devices generate it, applications process it, networks transfer it, and storage – well – stores it.

One trend in storage that seems established beyond question is that our appetite for it is growing exponentially. For example, the Square Kilometer Array, a radio telescope array being deployed in South Africa and Australia and slated to be fully operational in 2016, is expected to generate so much data that the traffic on its private fiber network will be twice that of the present-day Internet.

As another example, connectomics researchers estimate that a map of all the connections in just one human brain will require 1 exabyte of storage. That’s one million terabytes. Or, if you prefer, one billion gigabytes.

And that’s for just one brain!

But more down to earth examples are readily available in just about every organization we talk to. From ever growing email databases, to automated meter reading initiatives in power companies, to web intelligence initiatives, to regulatory archival requirements, to IP camera video surveillance storage requirements, we see companies with storage capacities that only a few years ago seemed like more than they would ever need scrambling to add additional capacity.

[Image: The typical IT storage capacity planning process.]

Fortunately, storage technology is keeping pace with these demands, but it is also rapidly evolving. This series of articles will take a look at how it has evolved to this point, where we are now, and give you some idea of where it looks like we are headed over the next few years, along with some specific recommendations for getting your organization there.

This particular article will focus on some history that lays the groundwork for our discussion of present and future technology in upcoming installments.

The crucial first step in the evolution of modern storage began in the '90s, when servers needed more storage capacity than they could get from a single hard drive, either because drives of that size simply didn't exist or weren't economical. The obvious solution was to add more than one hard drive to such a server, but this didn't always cut it. While the total storage capacity might have been enough, often the server needed to see its storage as a single drive, which could hold a database, for example, that couldn't be split up across multiple drives.

Adding additional hard drives to a machine also increased the odds that at least one hard drive in the machine would fail in the same way that flipping three quarters greatly increases the odds of getting at least one tail.
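The quarter-flipping intuition is easy to make concrete in a few lines of Python. The 3% annual failure rate below is a made-up illustrative figure, not a real drive statistic, but the shape of the result holds for any rate:

```python
def prob_at_least_one_failure(num_drives, annual_failure_rate=0.03):
    """P(at least one drive fails) = 1 - P(no drive fails),
    assuming independent failures at the given annual rate."""
    return 1 - (1 - annual_failure_rate) ** num_drives

# Like flipping more quarters: more drives, better odds of at least one "tail".
print(round(prob_at_least_one_failure(1), 4))  # 0.03
print(round(prob_at_least_one_failure(3), 4))  # 0.0873
print(round(prob_at_least_one_failure(8), 4))  # 0.2163
```

Even at a modest per-drive failure rate, an eight-drive machine is several times more likely than a single-drive machine to see at least one failure in a year.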

RAID is the technology that solved both of these problems. Standing for "Redundant Array of Inexpensive Disks", RAID allows users to put a number of relatively inexpensive and not necessarily very large drives of the same storage capacity into a single machine, but present a very different picture of the hard drive configuration to the operating system running on the machine.

RAID is configurable in a variety of ways, but the most popular configuration in those days was RAID Level 5, or just RAID 5 for short. RAID 5 can configure any number of physical hard drives to appear to the operating system as one logical drive, but with a catch. Using a formula that will sound immediately familiar to any fan of "Raiders of the Lost Ark", the logical drive presented to the operating system will be the size of the sum of all the drive capacities, minus the capacity of one drive.

[Image: The RAID 5 capacity formula explained in detail.]
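The capacity formula is simple enough to sketch in code. The function name and example capacities here are ours, purely for illustration:

```python
def raid5_usable_capacity(num_drives, drive_capacity_tb):
    """RAID 5 usable capacity: the sum of all drives minus one
    drive's worth, which the array reserves for parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 requires at least three drives")
    return (num_drives - 1) * drive_capacity_tb

# Four 2 TB drives present a single 6 TB logical drive.
print(raid5_usable_capacity(4, 2))  # 6
```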

That “missing drive” is not being sacrificed to the God whose server this is, but is actively being used by the RAID array to store redundancy information, or “parity” as it is known in the industry. Should one drive of the array fail, it’s no big deal. The operating system and applications running on it will keep right on humming along none the wiser while the failed drive is replaced. The array will then rebuild itself using the new drive.
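Real RAID 5 controllers rotate the parity blocks across all of the drives, but the underlying arithmetic is a simple XOR, which this toy sketch demonstrates: the parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the survivors.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]  # blocks on three data drives
parity = xor_blocks(data)           # the "missing drive's" capacity at work

# The second drive fails; rebuild its block from the others plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

This is why losing one drive is no big deal, while losing a second before the rebuild completes is fatal: XOR can recover exactly one unknown, never two.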

It is this combination of expanding storage using relatively inexpensive drives and providing fault tolerance that made RAID 5 so popular. All of the "smarts" required to make it work are handled by a device known as a RAID controller. RAID controllers are usually dedicated hardware devices in the machine that simply replace the more common, and considerably less smart, traditional hard drive controllers.

However, some operating systems actually have the capability to perform RAID in software. This is something that should be avoided like leech-infested waters. The only reason I mention it is because we still occasionally see machines configured this way by individuals who think they are being very clever, but are really just creating an accident waiting to happen.

Anyway, RAID 5 is great in that it can handle a single drive failure, but in the event of another drive failure prior to the replacement and rebuild, that is all she wrote. Fortunately, this isn’t very likely. Unfortunately, there were so many servers in the world even then that it was bound to happen occasionally. We’ve been unfortunate witnesses to several such events including one at a local bank that no longer exists. To make matters worse, they quickly found out that, in spite of what their backup software was telling them, they hadn’t actually backed up the server to tape for months.

The result? Data loss, which is pretty much the worst thing that can happen in IT. A University of Texas study found that small businesses that had suffered a major data loss had only a 6% survival rate over the next two years. That’s a rather sobering statistic.

We refer to incidents like this as "Resume Generating Events", and such events have led to the wide deployment of hot spares in an attempt to further minimize such catastrophic scenarios. Basically, another drive in the machine sits completely idle until a drive in the array fails, at which point the hot spare is activated and the array is rebuilt onto it immediately.

As a final layer to the RAID cake, it is worth pointing out that the storage pool created by a RAID controller doesn’t have to be used as a single, large logical drive. Instead it can be carved up into smaller logical drives that will appear to the operating system as separate disks. And in this way, the separation between logical and physical drives is complete.

While RAID arrays really do offer significant advantages, they also present clear and present dangers for the unwary. Pull the wrong drive out of the machine when attempting to replace a failed one? Disaster! Pull more than one drive out of the machine? Disaster! Misconfigure the array controller? Disaster! And so on. Unfortunately, these disasters are much more common in small companies where the server admin is likely someone who has a “real” job and only dabbles with the server because somebody onsite has to.

 

[Image: Did you just pull the wrong drive?]

The reason for going over all of this is that RAID still plays a very prominent role in modern storage environments. Different RAID levels, separation of physical and logical drives, and hot spares ready to fire up at a moment's notice are all part and parcel of any modern storage architecture. Indeed, many organizations still use exactly the type of configuration we have described here. Such deployments are generally found in small, single-server companies or in larger companies at small branch offices. For larger deployments, though, Storage Area Networks (SANs) are the order of the day, and it is to this technology that we will turn in our next installment.
