The summer of 2018 came searingly close to becoming the hottest summer on record, with persistent high temperatures putting pressure on data centres across the UK. High temperatures are always something to monitor, but when your cooling is running flat out for days on end, it can start to become a serious worry. Is it going to hold out? What if servers start shutting down? What can I afford to shut down to lower temperatures? What if part of the HVAC goes offline?
The heatwave has highlighted more than ever the importance of good maintenance regimes in the data centre. We have spoken to many DC managers over the summer who have been watching the environmental data points of their data centres as if they were expecting their lottery numbers to come up. It was a nail-biting experience that, for many, spanned weeks on end.
Many data centres lack the level of redundancy they need: sometimes this is because of cost, and in other cases building limitations make it seem impossible. But the biggest worry among the DC managers we spoke with was whether their HVAC was performing as well as it should. Many had only a dim and distant memory of its last maintenance service!
When creating a preventative maintenance plan, it is important that you have detailed information on the actual installed equipment and maintain it as per the manufacturer’s recommendations. Keeping your data centre in top condition is crucial to the 24/7 availability of your IT systems and helps ensure your infrastructure is not the cause of costly downtime. But it isn’t just about downtime: long-term monitoring and maintenance lead to extended equipment lifespan, cost savings and greater energy efficiency.
At a bare minimum every data centre should have the following in place:
· Create a maintenance plan as per the manufacturer’s recommendations of each piece of equipment.
· Keep a current inventory, forward planned budget and list of maintenance priorities.
· Put cleanliness standards in place. These should be daily activities, and be deeply rooted in the way every member of staff works in the data centre.
· Maintain detailed records of the completion of maintenance processes and cycles. These are critical to proving good practice when auditors attend to inspect the maintenance regimes. No records means no evidence, regardless of how the DC looks!
· Keep a record of the emergency call-out numbers on the wall within the data centre and ensure you have your Pre-planned Preventative Maintenance (PPM) contract number to hand.
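For teams that keep their inventory and maintenance records electronically, the checklist above can be sketched in a few lines of Python. This is only an illustration: the `Asset` class, the `overdue_assets` helper, the asset names, the service intervals and the dates are all made up for the example, and real intervals should come from each manufacturer’s recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    """One inventory entry (hypothetical structure, for illustration only)."""
    name: str
    service_interval_days: int  # assumed to come from the manufacturer's guidance
    last_serviced: date         # taken from the maintenance records

    def next_due(self) -> date:
        # The next service falls one interval after the last recorded service.
        return self.last_serviced + timedelta(days=self.service_interval_days)


def overdue_assets(inventory, today):
    """Return assets whose next service date has passed, most overdue first."""
    late = [asset for asset in inventory if asset.next_due() < today]
    return sorted(late, key=lambda asset: asset.next_due())


# Illustrative inventory; names, intervals and dates are invented.
inventory = [
    Asset("CRAC unit 1", 90, date(2018, 1, 15)),
    Asset("Condenser A", 180, date(2018, 5, 1)),
    Asset("UPS battery string", 365, date(2017, 3, 10)),
]

for asset in overdue_assets(inventory, today=date(2018, 8, 1)):
    print(f"{asset.name}: service was due {asset.next_due().isoformat()}")
```

Run regularly, a report like this gives you both a priority list for the next maintenance visit and a dated record to show auditors.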
One of the most forgotten aspects of maintenance regimes is training. Sure, it is necessary to make sure staff undertake training when new equipment is installed, or when upgrades are applied to software applications, but what about beyond that point? Some staff may go through refresher courses, but the reality is that in many cases staff attrition in an IT department means that team members come and go far more regularly than the hardware or software configuration changes in a DC. This is particularly true when it comes to M&E equipment, where the configuration and processes defined could easily be a decade old.
In periods of persistently hot weather, when M&E equipment may be operating near its peak capacity for long periods of time and you want to run equipment on rotation, it is critical that all team members have a strong understanding of the equipment and how to manage those situations. What should they do if an alarm sounds relating to a condenser? If it becomes necessary to shut down certain servers, which should be chosen, and how should that be managed and communicated across the organisation? Do all team members understand how to switch services to the back-up mirror data centre?
One of the ways companies overcome these challenges is to make M&E and company process training related to HVAC a regular fixture in their maintenance regime. An example might be that twice a year every member of staff is taken through a training course by the data centre maintenance provider. Ideally, that provider should also help you develop and run through scenarios that could be encountered, so you can run exercises and test the team's understanding.
Everything points to us seeing more of these prolonged spells of hot weather. Maybe you got away with it this year, even though you were nervous. But can you afford to find yourself in a situation where critical corporate infrastructure needs shutting down, or is at the point of failure where it will shut itself down? Make summer 2019 the one that you are ready for, and use the next six months to ensure your maintenance regimes and training are in excellent shape. Then you can spend August focusing on where to take the team out to lunch, rather than on whether the CEO is going to eat you for breakfast.