Ensuring data centre cooling in a power outage

Recent data centre trends such as increasing power density, warmer supply temperatures, the “right-sizing” of cooling equipment, and the use of air containment could significantly reduce the amount of time available for continued IT operation following a loss of cooling. Paul Lin, Simon Zhang and Jim VanGilder of Schneider Electric discuss the primary factors that affect transient temperature rise and provide practical strategies to manage cooling during power outages.


Data centre IT equipment is typically protected by uninterruptible power supplies (UPS) until generators come on-line following the loss of utility power. However, cooling system components such as CRAC (Computer Room Air Conditioner) or CRAH (Computer Room Air Handler) fans, chilled water pumps, and chillers (and associated cooling towers or dry coolers) are typically not connected to the UPS and may not even be connected to the genset. Consequently, data centre supply air temperature may rise quickly following a power failure.

While much attention is devoted to data centre cooling system design, most of that effort is aimed at improving the efficiency and reliability of its normal operation, with little attention paid to emergency operating conditions. A recently developed modelling tool makes it easy to estimate data centre temperatures following the loss of cooling for various facility architectures, back-up power connectivity choices, and, when applicable, chilled-water (thermal) storage volumes.

Meanwhile, planning for a power outage is becoming more critical as data centre professionals follow industry trends like “right-sizing” cooling capacity, increasing rack power density, adopting air containment, and increasing supply air temperatures. The latter trend is driven in part by ASHRAE’s recently revised Thermal Guidelines, which allow warmer data centre air temperatures than previously recommended. All of these trends reduce the window of time for safe and reliable operation following a power outage.

Factors affecting the rate of data centre heating
Essentially all of the electrical power consumed in a data centre is transformed into heat, and during normal operation this heating rate is balanced by the rate of cooling. During a cooling failure, the heating rate is instead balanced by the rate at which heat is absorbed by the data centre air, the IT equipment, and the building envelope. For facilities cooled by chilled-water CRAHs, water in the piping system and any storage tanks also absorbs heat, provided the chilled water can be circulated using back-up power.

For a fixed IT load, the larger the volume of the data centre, the slower the rate at which temperatures will rise during a cooling outage. However, as more IT equipment is added, the air volume and building envelope play a diminishing role in moderating the rate of heating, while the thermal mass of the IT equipment itself becomes more important. Somewhat counter-intuitively, even a hot operating server contributes thermal mass.
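
To get a sense of scale, the sketch below applies a simple lumped-capacitance energy balance in Python. The IT load, room volume and coupled solid mass are illustrative assumptions, not figures from the modelling tool referenced above.

```python
# Rough lumped-capacitance estimate of the temperature-rise rate after a
# total loss of cooling. All inputs are illustrative assumptions, not
# values from the modelling tool referenced in the article.

AIR_DENSITY = 1.2   # kg/m^3, air at roughly room conditions
AIR_CP = 1005.0     # J/(kg*K), specific heat of air

def rise_rate_c_per_min(it_load_kw, room_volume_m3,
                        coupled_mass_kg=0.0, coupled_cp=500.0):
    """Degrees C per minute, assuming all IT heat is absorbed by the room
    air plus an optional lumped solid mass (servers, walls, plenums) that
    tracks the air temperature perfectly -- an optimistic assumption."""
    heat_w = it_load_kw * 1000.0
    capacitance_j_per_k = (AIR_DENSITY * room_volume_m3 * AIR_CP
                           + coupled_mass_kg * coupled_cp)
    return heat_w * 60.0 / capacitance_j_per_k

if __name__ == "__main__":
    # Hypothetical 500 kW IT load in a 2,000 m^3 room
    print(f"air only:       {rise_rate_c_per_min(500, 2000):.1f} C/min")
    print(f"with 50 t mass: {rise_rate_c_per_min(500, 2000, 50_000):.1f} C/min")
```

With the room air alone the rise is more than 10°C per minute, while a perfectly coupled 50-tonne mass brings it down to roughly 1°C per minute, which is why the thermal mass terms dominate once the facility is densely populated.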

Four strategies for slowing the rate of heating
Despite the challenges posed by recent data centre trends, it is possible to design the cooling system for any facility to allow for long runtimes on emergency power.

1. Maintain adequate reserve cooling capacity
The industry trend of “right-sizing” cooling makes sense for normal operating conditions, but having only marginally more cooling capacity than the load can greatly increase the amount of time required to cool a facility that has become too hot. The key to increased cooling system efficiency is to scale the bulk cooling (e.g. chillers) and cooling distribution (e.g. CRAH units) as the IT load increases. This allows for adequate reserve cooling while improving data centre efficiency.
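
A minimal sketch of why the reserve matters, assuming the excess heat stored during the outage must be removed by whatever cooling capacity exceeds the ongoing IT load; the stored-heat and capacity figures are hypothetical.

```python
# Illustrative estimate of how long it takes to pull a data centre back to
# normal temperature after an outage. Assumes the heat stored in the air,
# IT equipment and building must be removed by the cooling capacity in
# excess of the IT load; all numbers are hypothetical.

def recovery_minutes(stored_heat_mj, cooling_capacity_kw, it_load_kw):
    """Minutes to remove stored_heat_mj of excess heat with the capacity
    left over after matching the ongoing IT load."""
    excess_kw = cooling_capacity_kw - it_load_kw
    if excess_kw <= 0:
        raise ValueError("No reserve capacity: the room never cools down")
    return stored_heat_mj * 1000.0 / excess_kw / 60.0

if __name__ == "__main__":
    STORED_HEAT_MJ = 300.0   # heat absorbed during the outage (assumed)
    IT_LOAD_KW = 500.0
    for capacity_kw in (525.0, 600.0, 750.0):   # 5%, 20%, 50% reserve
        minutes = recovery_minutes(STORED_HEAT_MJ, capacity_kw, IT_LOAD_KW)
        print(f"{capacity_kw:.0f} kW of cooling: about {minutes:.0f} min to recover")
```

In this example a 5% reserve needs over three hours to recover, while a 50% reserve needs about 20 minutes.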

2. Connect cooling equipment to back-up power
Immediately after a cooling failure, but before the facility’s thermal mass (i.e., walls, plenums, servers, etc.) can absorb any significant heat, all of the IT power simply heats the air, and the maximum rate of temperature rise could easily be 5°C/minute or more. Unless CRAH fans and pumps are on UPS and/or the data centre is very lightly loaded, the initial temperature spike will almost certainly violate thermal guidelines.

In a lightly loaded facility (e.g. 20% load), placing only the CRAH or CRAC fans on UPS until the generator starts helps maintain proper cooling airflow, limiting recirculation from the IT exhaust to the IT inlet and transferring heat to the pre-cooled thermal mass of the facility. Placing the pumps (in addition to the CRAH or CRAC fans) on UPS is more effective at reducing the initial temperature spike before the generator starts.
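
The cost of this strategy can be estimated from the extra UPS energy needed to carry the fans and pumps until the generator is on-line. In the sketch below, the fan and pump power fractions and the generator start-up delay are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope estimate of the extra UPS energy needed to keep
# cooling fans and pumps running until the generator picks up the load.
# The power fractions and start-up delay are illustrative assumptions,
# not figures from the article or the white paper.

def ups_energy_kwh(it_load_kw, fan_fraction=0.08, pump_fraction=0.03,
                   generator_start_s=60.0):
    """kWh of UPS energy to carry CRAH/CRAC fans (and optionally pumps)
    through the assumed generator start-up delay."""
    cooling_kw = it_load_kw * (fan_fraction + pump_fraction)
    return cooling_kw * generator_start_s / 3600.0

if __name__ == "__main__":
    IT_LOAD_KW = 500.0
    print(f"fans only:      {ups_energy_kwh(IT_LOAD_KW, pump_fraction=0.0):.2f} kWh")
    print(f"fans and pumps: {ups_energy_kwh(IT_LOAD_KW):.2f} kWh")
```

Even for a 500 kW IT load the additional UPS energy is below 1 kWh in this example, which is small compared with the UPS capacity already protecting the IT load itself.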

3. Use equipment with shorter restart times
The chiller control system is typically able to ride through a power outage lasting less than a quarter of a cycle (5 ms for a 50 Hz system and about 4 ms for a 60 Hz system). Power outages longer than this generally require a chiller restart when power is restored (from the utility or generator), which typically takes 10-15 minutes.

Advances in chiller technology have reduced the restart time to 4-5 minutes, which matters both for the initial power outage and for the momentary brownout (100 ms to 1 s) that occurs when the ATS (automatic transfer switch) transfers the load from the generator back to utility power.

Higher-cost quick-start chillers are helpful in virtually all applications, and in lower-density data centres they may keep temperatures entirely within acceptable limits. After less-expensive options have been investigated, the choice of chiller type should balance its first cost against the importance of emergency operation.
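
Combining an assumed heating rate with these restart times gives a quick sense of the benefit; the 1°C-per-minute rate below is an illustrative assumption carried over from the earlier sketch, not a published figure.

```python
# Illustrative comparison of the temperature rise accumulated while the
# chiller restarts. The heating rate is an assumption (see the earlier
# lumped-capacitance sketch), not a measured or published figure.

HEATING_RATE_C_PER_MIN = 1.0   # assumed rate once thermal mass is engaged

def rise_during_restart(restart_minutes, rate=HEATING_RATE_C_PER_MIN):
    """Temperature rise in degrees C accumulated over the restart window."""
    return restart_minutes * rate

if __name__ == "__main__":
    for label, minutes in (("conventional chiller, 15 min restart", 15),
                           ("quick-start chiller, 5 min restart", 5)):
        print(f"{label}: about {rise_during_restart(minutes):.0f} C rise")
```

At that rate a conventional restart adds roughly 15°C on top of the initial spike, while a quick-start chiller limits the addition to about 5°C.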

4. Use thermal storage to ride out chiller restart time
For chilled-water systems, additional chilled-water storage can be used to carry the cooling load until the chiller has restarted. If the chilled-water pumps and CRAH fans are on UPS, cooling can be provided with very little departure from normal operating conditions, provided the storage tank is adequately sized.
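
A minimal sizing sketch for such a tank, assuming the stored water must absorb the full IT load for the chiller restart period at a chosen chilled-water temperature rise; the load, ride-through time and temperature rise are hypothetical inputs.

```python
# Rough sizing of a chilled-water storage tank to ride through a chiller
# restart. Inputs are hypothetical; a real design would also account for
# pump and CRAH fan power, mixing losses and the usable tank fraction.

WATER_DENSITY = 1000.0   # kg/m^3
WATER_CP = 4186.0        # J/(kg*K), specific heat of water

def tank_volume_m3(it_load_kw, ride_through_min, delta_t_c):
    """Cubic metres of chilled water needed to absorb the IT load for
    ride_through_min minutes with a delta_t_c rise in water temperature."""
    energy_j = it_load_kw * 1000.0 * ride_through_min * 60.0
    return energy_j / (WATER_DENSITY * WATER_CP * delta_t_c)

if __name__ == "__main__":
    # Hypothetical 500 kW IT load, 15-minute restart, 6 C allowable water rise
    print(f"{tank_volume_m3(500, 15, 6.0):.0f} m^3 of chilled-water storage")
```

Roughly 18 m³ in this example; a quick-start chiller shrinks the required volume in proportion to its shorter restart time.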

To find out more about predictive models and design strategies to ensure continued reliable data centre operation in a power outage, or ample time to power down the IT load, please download Schneider Electric White Paper 179, “Data Center Temperature Rise During a Cooling System Outage”, available from www.apc.com/whitepapers