Choosing a data centre can be a challenge for any business looking for a secure and reliable home for its mission-critical information. As reputation is of paramount importance to any organisation, there is very little margin for error or downtime. Decision makers must therefore focus their attention on mitigating the risks to their business.
However, the potential threats to a data centre are numerous, and while 100 percent reliability, 100 percent security and 100 percent accessibility are ideal goals, they are likely to be unaffordable for the majority of businesses. A certain degree of compromise is therefore required, and therein lies the problem: with compromise comes risk. An inappropriate power supply and ineffective cooling systems are among the most common risks in data centre design, together with poor energy efficiency and an insecure location, to name a few. As a result, it is vital that buyers know what to look out for when selecting a data centre facility and provider, based on their organisation’s tolerance to risk.
The risk of an inappropriate power supply
Power interruption ranks as one of the biggest threats to a data centre, so a facility’s level of power assurance is vital. Power must come from reliable sources, such as major sub-stations rather than smaller, non-diversely connected sites. Organisations requiring higher reliability should look for facilities fed by two sub-stations rather than one, with each providing a diversely routed supply cable to the data centre. Within the data centre itself, supply path diversity and equipment redundancy are essential for effective risk mitigation. The Uptime Institute’s ‘Tier Certifications’ are widely used within the industry and provide appropriate guidance to those making this all-important decision. Higher tiers demand increasing levels of fault tolerance but are, of course, more expensive. A Tier II data centre may meet the demands of those with a relatively high tolerance to risk, as these facilities add some redundant components but still serve the IT systems through a single distribution path. However, at the more risk-averse end of the spectrum of data centre buyers, Tier III specifications insist on supply diversity throughout, providing much higher levels of availability.
Indeed, when low risk is a must, a Tier III data centre is an appropriate fit. In such a facility, the interruption of power from the local utility is an expected operational condition and the site will be fully prepared for such an eventuality.
Estimates of the cost of facility downtime range from low thousands to millions of pounds per hour, depending on the business and sector in question. Establishing your organisation’s tolerance to risk and downtime is therefore crucial in determining the appropriate tier level. If financial performance is the key metric, simply multiplying the cost of downtime per hour by the number of hours of downtime each tier permits may be a helpful guide. The reputational damage of downtime is, of course, harder to calculate in real terms and can ultimately prove far more costly.
Put simply, a verified Tier III data centre guarantees a minimum of 99.982% availability, a figure which works out at about 1.6 hours of unplanned downtime per year. For comparison, Tier II facilities guarantee 99.749% availability, which equates to around 22 hours of unplanned downtime per year.
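The arithmetic behind these figures, and the cost estimate suggested above, can be sketched in a few lines of Python. This is only an illustration: the availability percentages are the ones quoted in this article, while the function names and the cost of 10,000 pounds per hour are hypothetical values chosen for the example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a facility may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def annual_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Downtime hours per year multiplied by the hourly cost of downtime."""
    return annual_downtime_hours(availability_percent) * cost_per_hour


# Availability figures as quoted in the article.
TIER_AVAILABILITY = {"Tier II": 99.749, "Tier III": 99.982}

# Hypothetical hourly cost of downtime; substitute your own estimate.
COST_PER_HOUR = 10_000

for tier, availability in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(availability)
    cost = annual_downtime_cost(availability, COST_PER_HOUR)
    print(f"{tier}: {hours:.1f} hours/year, about {cost:,.0f} pounds/year")
```

Run with the quoted availabilities, the sketch reproduces the figures of roughly 1.6 and 22 hours per year. It covers only the direct hourly cost and makes no attempt to price reputational damage.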
Inadequate cooling – one of the most significant risks to a data centre
Every single kilowatt of power consumed by IT equipment creates a kilowatt of heat and modern data centres must be capable of neutralising this. Effective cooling is essential; however, cooling systems are under an increasing amount of pressure, as rack power densities increase.
ASHRAE’s TC 9.9 guidelines of 2008 recommend that equipment is kept between 18 and 27 degrees centigrade, with a maximum humidity level of 60 percent. Failure to maintain the correct operating environment for IT equipment directly increases the risk of equipment failure. An appropriate level of supply and equipment redundancy will reduce the chances of downtime occurring due to cooling system failures. At the lower end of the redundancy scale, even routine maintenance of cooling equipment can cause downtime. Higher levels of redundancy in equipment and supply paths help to mitigate this risk.
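As an illustration of how a monitoring system might apply this envelope, the following Python sketch checks a sensor reading against the two limits quoted above. The names and sample readings are hypothetical, and the check covers only the temperature range and humidity ceiling mentioned in this article; the full guideline contains further criteria that are not modelled here.

```python
# Limits as quoted in the article (recommended envelope).
RECOMMENDED_TEMP_C = (18.0, 27.0)
MAX_RELATIVE_HUMIDITY_PERCENT = 60.0


def within_recommended_envelope(temp_c: float, humidity_percent: float) -> bool:
    """True if a reading sits inside the temperature and humidity limits above."""
    low, high = RECOMMENDED_TEMP_C
    return low <= temp_c <= high and humidity_percent <= MAX_RELATIVE_HUMIDITY_PERCENT


# Hypothetical readings from rack inlet sensors: (temperature in C, humidity in %).
readings = {"rack-01": (22.5, 45.0), "rack-02": (29.0, 50.0), "rack-03": (24.0, 65.0)}

for rack, (temp_c, humidity) in readings.items():
    status = "OK" if within_recommended_envelope(temp_c, humidity) else "OUT OF RANGE"
    print(f"{rack}: {temp_c} C, {humidity}% RH -> {status}")
```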
Another factor to consider is the parallel redundancy offered by a facility, known as N+1 protection: where N units are needed to carry the load, at least one additional unit is installed, so that a single unit can fail or be taken out for maintenance without loss of capacity. A Tier III data centre will require N+1 redundancy to mechanical plant and computer room air conditioning (CRAC) units, and its cooling systems will also be supported by generators in the event of a power failure. A higher tier facility can also be expected to have better temperature control, due to superior data centre design and information systems. Furthermore, higher tiered facilities will have in place multiple distribution paths for electricity and the relevant cooling medium, providing additional resilience.
How to mitigate the risks
Power
A Tier I data centre has a basic feed from the local power distribution company, with a single non-redundant power supply path and non-redundant components throughout the facility. This infrastructure protects customers from the potential disruption caused by an interruption of the mains supply, usually through battery backup that carries the load until generators start up. However, the facility remains vulnerable to disruption during planned maintenance and unplanned outages of individual systems, as well as from operational or human errors. In a Tier I environment, the failure of any component in the power supply chain will affect the critical environment.
By comparison, a Tier III facility offers a ‘concurrently maintainable’ infrastructure with redundant components and supply paths throughout. ‘Concurrently maintainable’ means that any component in the supply path can be taken out of service for planned maintenance without affecting the supply of power or cooling to computer racks. In a Tier III data centre, generators are the primary source of power and must be rated to run continuously at maximum load. Power from the mains is deemed to be an unreliable source in a Tier III site and is only used because of its economic advantage over generator power.
A Tier III site will also enjoy two separate mains supplies from different sources, usually sub-stations, and both supplies will be conditioned and delivered to every critical system in the data centre, including customer racks. Every critical system will be protected by a backup component. All core capacity components of the infrastructure, such as transformers, generators, chillers and CRAC units, together with all power and coolant distribution paths, will have backup systems in place in a Tier III facility.
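The difference between the two tiers can be illustrated with a deliberately simplified Python sketch. It treats each supply path as a set of named components and asks whether any single component can be removed while at least one complete path remains. The component names and the function are hypothetical, and the model ignores capacity, switching time and everything else a real assessment would cover.

```python
from typing import Dict, Set


def is_concurrently_maintainable(paths: Dict[str, Set[str]]) -> bool:
    """True if removing any single component still leaves one path intact."""
    if not paths:
        return False
    all_components = set().union(*paths.values())
    for component in all_components:
        # A path survives if it does not depend on the removed component.
        if not any(component not in parts for parts in paths.values()):
            return False
    return True


# Hypothetical single-path facility, in the spirit of Tier I.
single_path = {"A": {"utility-feed", "ups", "pdu"}}

# Hypothetical dual-path facility, in the spirit of Tier III.
dual_path = {
    "A": {"sub-station-1", "ups-a", "pdu-a"},
    "B": {"sub-station-2", "ups-b", "pdu-b"},
}

# Two paths that share one switchboard are still exposed to that component.
shared_component = {
    "A": {"sub-station-1", "switchboard", "pdu-a"},
    "B": {"sub-station-2", "switchboard", "pdu-b"},
}

print(is_concurrently_maintainable(single_path))
print(is_concurrently_maintainable(dual_path))
print(is_concurrently_maintainable(shared_component))
```

The third case shows why diversity has to run through the whole supply chain: two sub-stations add little if both paths converge on a single shared component.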
Cooling
As with power, cooling in a basic Tier I data centre is provided by a single source of environmental conditioning, with a single distribution path. Tier I cooling systems are particularly vulnerable to failure of individual components, including pumps and pipes. During planned maintenance, Tier I data centres usually have to cease operations for a time; if they do not, the provider continues to operate at significantly increased risk. Within a Tier I data centre, a buyer might expect only rudimentary monitoring of temperature and humidity to take place.
A Tier III data centre, however, applies N+1 design principles to cooling components and supply paths too. So, for example, in a water-cooled data centre, you can expect to see at least one backup chiller, and one backup CRAC unit for each small cluster of similar units, depending on the size of the area to be covered.
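A rough way to size an N+1 cooling cluster is sketched below in Python. It assumes, as noted earlier in this article, that each kilowatt of IT load produces a kilowatt of heat to be removed. The function names, unit capacities and loads are hypothetical and chosen only for illustration.

```python
import math


def units_required(heat_load_kw: float, unit_capacity_kw: float, spares: int = 1) -> int:
    """Units needed to carry the load (N) plus the given number of spares."""
    if heat_load_kw <= 0 or unit_capacity_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    n = math.ceil(heat_load_kw / unit_capacity_kw)
    return n + spares


def survives_single_failure(heat_load_kw: float, unit_capacity_kw: float, installed: int) -> bool:
    """True if the remaining units can still carry the load with one unit out."""
    return (installed - 1) * unit_capacity_kw >= heat_load_kw


# Hypothetical hall: 400 kW of IT load cooled by CRAC units rated at 100 kW each.
installed = units_required(400, 100)
print(installed, survives_single_failure(400, 100, installed))
```

With these example numbers, four units carry the load and a fifth provides the N+1 margin, so one unit can fail or be serviced without the hall losing cooling capacity.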
Within a Tier III facility, you may also find that critical elements of the cooling infrastructure, such as pumps and fans, are supported by UPS power as well as by generators, providing added protection in the event of an outage.
Selecting a data centre can certainly be a tricky business. Each and every buyer should be armed with the knowledge of how to compare and contrast commercial facilities and assess how prospective providers approach the potential pitfalls inherent in data centre design. In order to assist IT professionals making this decision, these eBooks set out the most common data centre ‘sins’ – with the first explaining how to recognise them and the second providing expert guidance on how to mitigate them – based on an organisation’s individual appetite for risk or downtime.