High time for high availability

In many regards, the concept of high availability exceeds other aspects of storage in importance, including raw speed and performance. Over decades of experience, we as an industry and field of technology have come to understand that storage isn’t something where the lights go off when convenient. Or at all. By Thomas Kao, Senior Director of Product Planning at Infortrend.


CUSTOMERS DEPEND ON STORAGE as much as they rely on their power grid, with everyone from SOHO users to the largest organisations obviously needing access to data non-stop. The net result is simple: you can pack a storage system with all the performance features and user-friendly enhancements possible, but without carefully planned and implemented redundancy it all becomes a moot point if this powerful and friendly system is rendered completely unavailable to your customers and their teams.

Routine maintenance, installing a new component, or any other such procedure cannot be allowed to shut down an entire storage system. Such interruptions translate directly into lost productivity, lost revenue, and frustration. Even worse, while revenue may be recouped, a service-providing enterprise or organisation may find that what appeared to be an isolated storage outage has lasting ripple effects. Your customers have long memories, and that time your services were unavailable to them because you were swapping power supplies may end up affecting your reputation, turning an incident into permanent harm.


The path to avoiding such outcomes is clear: high availability and redundancy must be integral to as many storage solutions as possible. Through close relations and productive cooperation with our customers, we know there are plenty of real-world examples of outages causing lasting repercussions. The demand for high availability is a tangible one, and a given in a global business environment where there is no true downtime. You may be off work, but your customers and colleagues on the other side of the planet are just starting their day. And we all share the same mission-essential data. Without it, we simply cannot do our jobs, and that is not an acceptable eventuality.


I would like to make it clear that high availability is by no means a brand-new concept, but providing for it across as many solutions as possible, and in a multi-tiered design, is a growing trend. Perhaps in the past, solid high availability tended to be the realm of costly storage systems designed for large-scale SAN and DAS applications. But now, we believe even NAS systems for SMBs must be as available as possible. Why should an individual entrepreneur or an SMB compromise their ability to deliver services around the clock? They should not. As storage system vendors, it is our responsibility to develop solutions that bring high availability to all customers.


So, if high availability is indeed a must-have, how do we go about implementing it? Without delving into too many technicalities, let me just say that high availability has to function at the hardware and software levels, as well as via virtualisation. Redundancy means more than having two of a component. It means having multiple layers of protection against failure and stoppage, so that even if your hardware has problems, you can quickly resort to a software-based backup machine or a remote solution and continue working. This overlaps to a large degree with the concept of disaster recovery.

We can start with the hardware. On top of selecting the best components for every product price segment, it is critical to make parts redundant and easy to work with. Dual power supplies are a given in storage and server products as of this writing, but they are not always a snap to install and remove. So yes, make sure customers have two power supplies, that the system can continue working at 100% even if one of the two fails, and that PSUs are simple to handle. Slot-in, cable-free designs are the best, as they are quick to swap and do not require lengthy maintenance sessions.

Power supplies are in fact a perfect example of high availability. We can have a great 80 PLUS unit, but if it fails and its peer does not instantly kick in, or if replacing it is a time-consuming chore, then we really have not helped our customer as much as we could have, despite the energy efficiency. Any savings achieved on power bills will be rapidly offset by a downed system, so redundancy must go hand in hand with efficiency.


Controllers are a similar aspect of high availability. To keep product cost down and offer options for entry-level buyers, it is understandable that we have single-controller models on sale. Yet we can make these single-controller variants easily upgradeable to dual-controller solutions. Even better is to develop attractively priced dual-controller systems that more customers will be willing to invest in. If properly implemented and explained, high availability is an obvious advantage that most buyers will want, and will make budget allowances to accommodate. As an industry, we do not need to be coy: as in most things, you get what you pay for, and high availability is something you want to get. However, we should endeavour to make it as accessible as possible to all customers. As I said earlier, ideally high availability is an asset that most, if not all, storage systems should possess.
Back to those dual controllers. Let’s make them an active-active configuration, rather than an active-standby one. That way, if one stops working for whatever reason, automatic failover kicks in with minimal delay. With failover, it is clearly a case of the faster, the better. Every second counts when access to services and the well-being of your enterprise are at stake.
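To make the failover idea concrete, here is a minimal sketch in Python of the heartbeat-and-takeover logic an active-active pair relies on. It is purely illustrative: the class, names, and timings are my own assumptions for this article, not any vendor's actual firmware.

```python
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between peer health checks (illustrative)
MISSED_LIMIT = 3           # missed heartbeats before declaring the peer dead

class Controller:
    """One half of an active-active pair; both controllers serve I/O at all times."""

    def __init__(self, name, peer_alive):
        self.name = name
        self.peer_alive = peer_alive   # callable: returns True if the peer responds
        self.owns_peer_volumes = False

    def monitor_peer(self):
        missed = 0
        while True:
            if self.peer_alive():
                missed = 0
            else:
                missed += 1
                if missed >= MISSED_LIMIT and not self.owns_peer_volumes:
                    self.fail_over()
            time.sleep(HEARTBEAT_INTERVAL)

    def fail_over(self):
        # Take over the peer's volumes so host I/O continues with minimal delay.
        # A real array would also replay the mirrored write cache at this point.
        self.owns_peer_volumes = True
        print(f"{self.name}: peer unresponsive, taking over its volumes")
```

Because both controllers are already serving I/O, the survivor only has to claim the failed peer's volumes rather than boot into service from standby, which is exactly why active-active failover is the faster option.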


Hot-swappable power supplies and controllers are just the beginning. A powerful and advanced software and data services backbone is the next tier, or layer, of high availability that we must incorporate. File systems such as ZFS lend themselves very well to high availability features. With this architecture, an independent checksum methodology helps prevent data corruption in the first place, though it may not be enough in the event of hardware failure. To bolster our defences against such an occurrence, ZFS lets us provide customers easy access to effectively unlimited snapshots, at intervals just a few minutes apart, further reducing the likelihood of catastrophic data loss.
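As a sketch of how frequent snapshots might be driven on a ZFS host, the short Python loop below invokes the standard `zfs snapshot` command every few minutes. The dataset name and interval are hypothetical placeholders; the snapshot command itself is stock ZFS.

```python
import subprocess
import time
from datetime import datetime

DATASET = "tank/shares"   # hypothetical pool/dataset name
INTERVAL = 5 * 60         # take a snapshot every five minutes

def take_snapshot(dataset):
    """Create a point-in-time ZFS snapshot; near-instant and space-efficient."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(["zfs", "snapshot", f"{dataset}@auto-{stamp}"], check=True)

while True:
    take_snapshot(DATASET)
    time.sleep(INTERVAL)
```

Because ZFS snapshots are copy-on-write, they cost almost nothing to create, which is what makes such short intervals practical in the first place.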

We can also incorporate a real-time, IPsec-encrypted pool mirror to automatically shift operations to a secondary system holding the same data as the primary. Likewise, remote replication is essential for backup purposes, and should be 128-bit encrypted to add a layer of security to advanced redundancy. RAID hot spares, disk roaming, and rebuilding are another aspect of this, allowing entire volumes to be moved to another platform if one piece of hardware fails.
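To illustrate the replication side, here is a minimal Python sketch that ships only the blocks changed between two snapshots to a standby system using the standard `zfs send -i` / `zfs recv` pair. The hostname and dataset names are hypothetical, and SSH stands in here for the encrypted transport; an appliance might use IPsec or another tunnel as described above.

```python
import subprocess

SRC = "tank/shares"               # hypothetical source dataset
DST_HOST = "standby.example.com"  # hypothetical secondary system
DST = "tank/shares"               # dataset on the standby

def replicate(prev_snap, new_snap):
    """Send an incremental snapshot stream to the standby over SSH."""
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{SRC}@{prev_snap}", f"{SRC}@{new_snap}"],
        stdout=subprocess.PIPE)
    # -F rolls the standby dataset back to the last common snapshot if needed.
    subprocess.run(
        ["ssh", DST_HOST, "zfs", "recv", "-F", DST],
        stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()

# Example: replicate the delta between two of the automatic snapshots above.
replicate("auto-20240101-0000", "auto-20240101-0005")
```

Run after each snapshot, a loop like this keeps the secondary close enough to the primary that failing over to it loses at most a few minutes of changes.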


These solutions do require redundant storage hardware at the ready. They mean customers must purchase more drives, so that their systems can absorb outages and mitigate them by shifting data to unaffected physical components.


This is great, but customers may not always want to invest in an abundance of equipment. This is where virtualisation comes into play. High availability means making your solutions compatible with, and optimised for, VMware, Citrix, Windows Server, and other platforms to ensure easy data migration and remote recovery via software and the cloud. As mentioned earlier, true high availability is a fusion of hardware, software, and network features. Focusing on any one of them alone means a compromised solution.


There is much more to say about high availability. My goal in writing this is to serve up a reminder, not a technical essay. I could go on in much more detail, but that would mean taking up too much of your time. And after all, you need to be available!