How Can Data Center Managers Handle Explosive Growth Driven By AI and ML?

By Marc Caiola – nVent Vice President of Global Data Solutions.

A growing focus on generative Artificial Intelligence (AI) and Machine Learning (ML) has brought AI to the forefront of public conversation, but the capabilities of AI are far broader. Today, large enterprises are launching their own AI solutions, and figuring out how to integrate AI into the business successfully is top of mind for management. Supply chain management, research and analysis, and product marketing are just a few examples of how AI will be adopted to drive value for businesses and customers. AI will also see increased adoption in the healthcare, eMobility, and energy generation and power industries as the technology improves and key stakeholders grow more comfortable with it.

All these factors are driving demand for AI, and the industry is expected to grow significantly in the coming decade.

New Technology Means New Demands

As data center managers know, it takes an enormous amount of data processing to deliver the results that users of AI and ML applications have come to expect, and these applications are driven by high-performance chips on the cutting edge of IT development.

These advanced chips draw a lot of power and produce more heat than the processors behind less demanding applications. Data center managers have to deal with these high heat loads while still being able to scale their operations to meet demand. Scaling cannot always depend on more physical space: data center managers and engineers often have to solve the technical problem of fitting more, and hotter, servers into the same space. They also have to maintain 24/7 uptime, because the needs of AI applications will not pause for a data center renovation.

Additionally, the industry is facing increased scrutiny over power use, so data center managers need to be especially conscious of how they are using electricity. Sustainability has always been a conversation in the data center industry, but this increased attention will create even more conversation around power usage effectiveness (PUE) and power management.

A Shift in Cooling Approach 

Next-generation chips and other AI infrastructure need more than traditional cooling methods to keep them from overheating. Data centers may try to remove the extra heat by increasing air volume or reducing air inlet temperatures, which can be inefficient and costly to run. When air cooling systems have to work harder to maintain optimal temperatures, facilities can also face equipment breakdowns, unexpected shutdowns and increased energy costs. For many data centers, liquid cooling technologies can offer better performance while reducing energy use and helping them operate more sustainably. For the most advanced applications, liquid cooling is the only possible option. By using liquid cooling technologies in the right way, data center managers can greatly improve Power Usage Effectiveness (PUE), even in applications where they are using next-generation IT.
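To make the PUE point concrete, here is a minimal sketch of the calculation. PUE is total facility power divided by the power delivered to IT equipment, so a lower value is better and 1.0 is the theoretical floor. All load figures below are hypothetical, chosen only to illustrate the arithmetic, and are not measurements from any facility.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical figures for illustration only.
it_load_kw = 1000.0         # power drawn by servers, storage and network gear
other_overhead_kw = 100.0   # lighting, power conversion losses, etc.

# Assumed cooling overheads for the same IT load under two cooling designs.
air_cooling_kw = 500.0
liquid_cooling_kw = 200.0

pue_air = pue(it_load_kw + air_cooling_kw + other_overhead_kw, it_load_kw)
pue_liquid = pue(it_load_kw + liquid_cooling_kw + other_overhead_kw, it_load_kw)

print(f"PUE with assumed air cooling overhead:    {pue_air:.2f}")
print(f"PUE with assumed liquid cooling overhead: {pue_liquid:.2f}")
```

In this sketch the only thing that changes between the two cases is the assumed cooling overhead, which is why cooling efficiency shows up so directly in PUE.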

Liquid cooling can help data centers increase capacity while maintaining efficient space and energy use. It can also offer a favorable return on investment and lower the total cost of ownership for data center facilities. Liquid cooling systems provide an effective way to reach required temperature parameters while reducing the energy consumed by cooling. Liquid provides a much greater heat transfer capacity than air. This helps liquid cooling improve power usage effectiveness (a lower PUE is better), manage heat loads effectively, reduce energy costs and contribute to environmental sustainability.
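One way to put a rough number on the difference between liquid and air is to compare how much heat a given volume of each fluid carries per degree of temperature rise. The sketch below uses approximate textbook properties for water and for air near room temperature, and the basic sensible-heat relation Q = density × specific heat × volumetric flow × temperature rise. The 50 kW rack and the 10 K temperature rise are hypothetical values chosen for illustration, and real coolants and operating conditions will differ.

```python
# Approximate fluid properties near 20 °C at sea level (textbook values).
WATER_DENSITY = 998.0   # kg/m^3
WATER_CP = 4180.0       # J/(kg*K)
AIR_DENSITY = 1.204     # kg/m^3
AIR_CP = 1005.0         # J/(kg*K)


def volumetric_heat_capacity(density: float, cp: float) -> float:
    """Heat absorbed per cubic metre per kelvin of temperature rise, J/(m^3*K)."""
    return density * cp


def flow_for_heat_load(heat_w: float, delta_t_k: float,
                       density: float, cp: float) -> float:
    """Volumetric flow (m^3/s) needed to carry heat_w with a rise of delta_t_k."""
    return heat_w / (volumetric_heat_capacity(density, cp) * delta_t_k)


ratio = (volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
         / volumetric_heat_capacity(AIR_DENSITY, AIR_CP))

# Hypothetical rack: 50 kW of heat, 10 K coolant temperature rise.
rack_heat_w = 50_000.0
delta_t_k = 10.0
water_flow = flow_for_heat_load(rack_heat_w, delta_t_k, WATER_DENSITY, WATER_CP)
air_flow = flow_for_heat_load(rack_heat_w, delta_t_k, AIR_DENSITY, AIR_CP)

print(f"Water carries roughly {ratio:,.0f}x more heat per unit volume than air")
print(f"Water flow needed: {water_flow * 1000:.2f} L/s")
print(f"Air flow needed:   {air_flow:.2f} m^3/s")
```

The comparison ignores pumping and fan power, heat exchanger performance and everything else a real design has to account for. It only shows why moving the same heat with air takes so much more volume.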

Solutions at Scale

Liquid cooling does not have to be a comprehensive solution. Data centers can choose to cool a single rack, or a few racks, running AI and machine learning applications without building entire liquid-cooled data halls to support many racks of high performance computing equipment. However, when applying these partial solutions, it is vital to understand future business plans. Using a specific cooling solution for a particular problem is useful, but because of cost, energy efficiency and other factors, a solution for one problem may not work for another. As with all data center projects, different challenges need different solutions, and a universal approach rarely works.

With the growth in demand for high performance computing driven by the expansion of AI, data center managers need to have a plan in place to scale their cooling solutions. This may mean planning next-generation data centers to be fully liquid cooled, or exploring hybrid liquid-to-air solutions, such as rear door cooling or direct-to-chip cooling fed by coolant distribution units (CDUs), that bring liquid cooling to the rack and chip level while operating within air-cooled infrastructure.

The biggest advantage of planning for the future and understanding IT workloads is the realization that almost all potential cooling solutions can be built out in combinations, allowing data center managers to match their power and cooling capabilities to shifting demands. The key to sustainable growth is a variety of flexible options for supporting next-generation equipment. Liquid cooling technologies help drive that flexibility.

Power Management for AI 

Power distribution is another critical technology for managing AI and ML workloads. Smart power distribution units (PDUs) distribute power to multiple devices within a data center, monitor how that power is used, and provide alerts in the event of power surges or other issues.

The remote monitoring and control capabilities of smart PDUs can increase energy efficiency and reduce the risk of downtime. Input metering, for instance, allows power coming into a PDU to be remotely monitored, reducing the risk of overloading PDUs and tripping breakers. This monitoring can also help ensure that PDUs are not getting too close to breaker limits, allowing data center operators to remotely mitigate potential issues before they occur.
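The input-metering check described above can be sketched in a few lines. This is an illustrative example, not any vendor's API: the `PduReading` type, the PDU names and the current values are all hypothetical, and the 80% warning threshold is an assumption that operators would set according to their own electrical standards and policies.

```python
from dataclasses import dataclass


@dataclass
class PduReading:
    """Hypothetical input-metering sample for one PDU."""
    name: str
    input_amps: float            # measured current on the PDU input
    breaker_rating_amps: float   # rating of the upstream breaker


def load_fraction(reading: PduReading) -> float:
    """Fraction of the breaker rating currently in use."""
    return reading.input_amps / reading.breaker_rating_amps


def pdus_near_limit(readings: list[PduReading],
                    warn_fraction: float = 0.8) -> list[str]:
    """Names of PDUs at or above the warning threshold (assumed 80%)."""
    return [r.name for r in readings if load_fraction(r) >= warn_fraction]


# Hypothetical readings for illustration only.
readings = [
    PduReading("rack-a1", input_amps=12.0, breaker_rating_amps=30.0),
    PduReading("rack-a2", input_amps=26.5, breaker_rating_amps=30.0),
]

for name in pdus_near_limit(readings):
    print(f"Warning: {name} is approaching its breaker limit")
```

In practice the readings would come from the PDU's own monitoring interface rather than a hard-coded list, and the alert would feed whatever monitoring system the facility already uses.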

Some PDUs are also equipped with outlet metering, where monitoring and control technology is applied not only at the PDU level but at the level of each individual outlet or power connection on the PDU. This technology can help operators better understand the power usage of specific devices and compare efficiencies between different technologies. It can also identify underutilized or “zombie” equipment that is not in use but is still drawing significant power. Being able to remotely identify this equipment and turn it off allows data center managers to make sure they are using only the power that they need.
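As a rough illustration of how outlet-level data might be used to shortlist zombie equipment, the sketch below flags outlets whose power draw is significant but almost perfectly flat over time. This heuristic is an assumption made for the example, not a method described by any PDU vendor: the thresholds, outlet names and wattage samples are hypothetical, and a flat draw is only a hint that would need to be confirmed against actual workload data before anything is switched off.

```python
from statistics import mean


def is_zombie_candidate(samples_w: list[float],
                        min_draw_w: float = 50.0,
                        max_variation: float = 0.05) -> bool:
    """Flag an outlet whose draw is significant but nearly constant.

    min_draw_w and max_variation are illustrative thresholds, not standards.
    """
    if not samples_w:
        return False
    average = mean(samples_w)
    if average < min_draw_w:
        return False  # draw too small to matter (e.g. standby power)
    variation = (max(samples_w) - min(samples_w)) / average
    return variation <= max_variation


# Hypothetical per-outlet power samples in watts, taken over time.
outlets = {
    "outlet-1": [212.0, 340.0, 295.0, 180.0],  # load varies: likely in use
    "outlet-2": [118.0, 119.0, 118.5, 118.0],  # steady draw: worth a look
    "outlet-3": [3.0, 3.1, 3.0, 3.0],          # negligible draw
}

candidates = [name for name, samples in outlets.items()
              if is_zombie_candidate(samples)]
print("Outlets to investigate:", candidates)
```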

The opportunity for the data center industry brought about by the growth of AI and ML comes with challenges. By leveraging the right cooling and power technologies, data center managers can improve performance, drive sustainability and scale operations appropriately to meet the growing needs of their customers. 
