Modular converged IT infrastructure provides those who adopt it with proven pre-integrated building blocks for use in their data centres, which makes expanding or reconfiguring those data centres a fast, straightforward and relatively inexpensive task. Converged hardware infrastructure is the perfect complement to virtualisation, with the combination of the two providing exceptional levels of flexibility, economy and, if properly structured and managed, resilience.
There are, however, two common misconceptions about the best way to power converged hardware. The first is that, irrespective of the application, there is no reason to move away from traditional power system architecture. The second, which is rather more radical, is that power protection and power management are not particularly important with converged infrastructure. Let’s examine how these misconceptions have arisen, and why they are incorrect.
To do this, we must consider something that is a fundamental requirement for every data centre – resilience. Traditionally, resilience has been seen as an issue that relates primarily to the hardware and operating system, and much time and money has been spent on making these as reliable as is realistically possible. The power train is configured to support maximum hardware availability, and is usually built to a very high specification with redundant UPS systems, redundant generator sets, a dual power bus and a generously sized Static Transfer Switch (STS).
Let’s be clear that this traditional approach is perfectly valid and, for some users, especially those who must have guaranteed 24/7/365 availability and who cannot tolerate even short-term performance degradation, it remains the most satisfactory solution. With converged and virtualised infrastructure, however, it is possible to implement resilience in layers other than the physical hardware layer. In many cases, this is an attractive option.
Resilience can, for example, be implemented in the user layer, by considering whether the IT resource really does have to be available continuously 24/7/365, or whether momentary outages could be tolerated. Resilience can also be built into the application layer by developing fault-tolerant applications, as Google has done with considerable success.
The most attractive option in most cases, however, is to build resilience into the cloud/virtualisation layer by adopting a cluster approach that is prepared to deal with failures on the lower layers. It can do this, for example, by moving virtual machines to hardware that is unaffected by the failure, or even by using public cloud services as a backup site.
Providing resilience at layers other than the hardware layer has many attractions – typically costs are reduced while flexibility is enhanced – but, unfortunately, it has also allowed a dangerous delusion to develop. This is the growing belief that if the software layers of an IT system can handle hardware failures in the physical layer, power protection and power management become optional or even completely unnecessary. In reality, nothing could be further from the truth!
Irrespective of where resilience is achieved, power protection remains essential for many reasons. Properly implemented, it will, for example, condition the power received from the supply source and filter out transients and other fluctuations, thereby protecting the IT hardware against damage. Power protection also provides for controlled shutdown and smooth start-up of servers, functions that are essential in systems where it is accepted that hardware will, from time to time, go down.
Well-designed power management also guards against “zombie servers”. These are machines that are functioning intermittently or erratically, but have not quite failed completely. Because their behaviour is unpredictable, zombie servers are a potent source of instability and data loss in IT installations. A good power management system will “fence” the zombie servers, isolating them from the rest of the installation, and then power them down.
There’s another crucial reason why power protection and power management are essential in systems where resilience is provided at levels above the hardware level. For these higher level resilience strategies to work, the level or levels at which the resilience is provided must always be power aware – that is, they must always know the current status of the power supplies, and this information can only be provided by a power management system.
To see why the higher levels need to know about the power status, consider an installation in which resilience is achieved at the cloud/virtualisation layer, with a strategy that involves migrating virtual servers to remote hardware if a problem occurs.
If the mains supply fails, the UPS will support the local servers for, say, 15 minutes, which is more than adequate time for the migration of the virtual servers to take place. But if the power system has not informed the virtualisation manager that the installation is now running on battery power, how will the migration be initiated? Relying on manual intervention in such cases is a strategy fraught with peril!
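The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real integration: the `PowerStatus` record, the `plan_response` and `handle_power_event` functions, the two-minute safety margin and the `migrate`/`shutdown` callbacks are all hypothetical names and values chosen for the example. In practice the status would come from the UPS vendor's power management software, and the actions would be carried out through the virtualisation manager's own interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PowerStatus:
    """Snapshot reported by a (hypothetical) power management system."""
    on_battery: bool
    runtime_remaining_s: float  # estimated battery runtime left, in seconds


def plan_response(status: PowerStatus,
                  migration_time_s: float,
                  safety_margin_s: float = 120.0) -> str:
    """Return "none", "migrate" or "shutdown" for the given power status.

    The safety margin is an assumed value, not a recommendation.
    """
    if not status.on_battery:
        return "none"  # mains is present, nothing to do
    if status.runtime_remaining_s >= migration_time_s + safety_margin_s:
        return "migrate"  # enough battery runtime to move the VMs away
    return "shutdown"  # too little runtime left: shut down cleanly instead


def handle_power_event(status: PowerStatus,
                       vms: Iterable[str],
                       migration_time_s: float,
                       migrate: Callable[[str], None],
                       shutdown: Callable[[str], None]) -> str:
    """Apply the planned action to every VM using caller-supplied callbacks."""
    action = plan_response(status, migration_time_s)
    if action == "migrate":
        for vm in vms:
            migrate(vm)
    elif action == "shutdown":
        for vm in vms:
            shutdown(vm)
    return action


if __name__ == "__main__":
    # Mains has failed; the UPS reports 15 minutes (900 s) of runtime and
    # migration is assumed to take about 5 minutes (300 s).
    status = PowerStatus(on_battery=True, runtime_remaining_s=900.0)
    action = handle_power_event(
        status,
        vms=["vm-a", "vm-b"],
        migration_time_s=300.0,
        migrate=lambda vm: print("migrating", vm),
        shutdown=lambda vm: print("shutting down", vm),
    )
    print("action:", action)
```

The point of the sketch is that nothing happens unless the power system reports `on_battery` to the layer that owns the virtual machines. Without that signal, the migration logic is never invoked, however capable it is.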
It should now be clear that power protection and power management are at least as important with converged/virtualised infrastructure as they are in more traditional IT environments and that, in converged/virtualised environments, the power management system must be closely integrated with the virtualisation management software, to ensure fast, automatic response to faults. There is, however, another important issue to be considered when providing power in converged/virtualised environments, and that is load fluctuation.
New power usage optimisation techniques in IT hardware, as well as in the cloud and virtualisation layers, mean that the load on the power system can vary widely and rapidly. Although the variations are often cyclic, in some cases they can be instantaneous and erratic. Unless the power system has been designed with these variations in mind, the result is almost certain to be poor energy efficiency.
The reason is simple: UPS systems operate most efficiently when they are running at high load levels. For example, an 1100 kVA UPS operating close to full load may well deliver an efficiency of around 95%, but the same UPS operating at, say, 10% load, will probably struggle to achieve 85%. Fortunately, there is a solution to this problem: specify a system made up of several UPSs which share the load, and which are complemented by an intelligent multi-UPS management system.
The management software is designed to ensure that, at any given instant, only those UPSs needed to meet the current power demand are operational. The other UPSs are held in a standby state where they consume almost no power. The software can, however, bring them back into service almost instantaneously – typically in less than two milliseconds – when the load increases. This very fast transition is invisible to the equipment powered by the UPSs.
This arrangement means that the load is concentrated on the minimum number of UPSs needed to meet it and that, as a consequence, these UPSs are well loaded and will therefore operate efficiently.
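A rough sketch of the arithmetic behind this load concentration is given below. It assumes, purely for illustration, four equal UPSs of 275 kW each (1100 kW in total), treats kVA and kW as interchangeable (unity power factor) and ignores any redundancy requirement. The function names are hypothetical, and the efficiency values are the illustrative 95% and 85% figures quoted earlier rather than measured data for any particular product.

```python
import math


def loss_kw(output_kw: float, efficiency: float) -> float:
    """Power dissipated in the UPS while delivering output_kw."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return output_kw / efficiency - output_kw


def active_units_needed(load_kw: float,
                        unit_capacity_kw: float,
                        installed_units: int) -> int:
    """Smallest number of equal-sized UPSs able to carry load_kw.

    At least one unit stays active so the load is never left unprotected.
    """
    if load_kw < 0 or unit_capacity_kw <= 0 or installed_units < 1:
        raise ValueError("invalid load, capacity or unit count")
    needed = max(1, math.ceil(load_kw / unit_capacity_kw))
    if needed > installed_units:
        raise ValueError("load exceeds installed UPS capacity")
    return needed


def load_fraction(load_kw: float,
                  unit_capacity_kw: float,
                  active_units: int) -> float:
    """Fraction of rated capacity at which each active unit runs."""
    return load_kw / (active_units * unit_capacity_kw)


if __name__ == "__main__":
    UNIT_KW, UNITS = 275.0, 4  # hypothetical installation
    for load in (110.0, 600.0, 1000.0):
        n = active_units_needed(load, UNIT_KW, UNITS)
        print(f"load {load:.0f} kW: {n} active unit(s), "
              f"{load_fraction(load, UNIT_KW, n):.0%} each, "
              f"versus {load_fraction(load, UNIT_KW, UNITS):.0%} "
              f"if all {UNITS} share the load")
    # Losses on 110 kW delivered at the two illustrative efficiencies.
    print(f"{loss_kw(110.0, 0.85):.1f} kW lost at 85%, "
          f"{loss_kw(110.0, 0.95):.1f} kW lost at 95%")
```

Under these assumptions, a 110 kW load is carried by one UPS running at 40% of its rating rather than by four running at 10% each. If concentrating the load in this way moved the operating point from the 85% figure towards the 95% figure, the losses on 110 kW delivered would fall from about 19.4 kW to about 5.8 kW. A real design would take its efficiency curve from the manufacturer's data and would normally keep spare capacity active for redundancy.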
With the best hardware it is, in fact, possible to take this multi-UPS approach even further, by specifying UPSs that are themselves made up of modules that can be instantly transitioned between standby and operational mode.
Such an arrangement, which is usually described as a module management system, allows the UPS power capacity to be accurately matched to the power demand over a very wide range of loadings, ensuring that high operating efficiency is achieved under all operating conditions, irrespective of load fluctuations.
Far from being unimportant or even unnecessary, effective power protection and power management systems are key ingredients for the successful implementation of converged/virtualised IT infrastructure. Correctly implemented, they protect the hardware against damage caused by supply transients, and they interact with the virtualisation management software to ensure fast and effective response to faults, thereby guaranteeing resilience.
To minimise operating costs and environmental impact, the power protection systems must, however, be designed to cope with the load fluctuations that are typically associated with converged/virtualised infrastructure, and to operate efficiently under all load conditions.
Proven and cost-effective power technology that meets all of these requirements is available now and is, indeed, already in widespread use around the world. Powering converged infrastructure doesn’t, therefore, need to be complicated or challenging, provided that you choose a supplier with the right experience and expertise.