AI infrastructure at the edge: the inversion of the "cloud-first" trend

By Giuseppe Leto, senior director IT systems business at Vertiv.

The acceleration of artificial intelligence (AI) adoption is testing the limits of established digital infrastructure. High-density workloads, latency-sensitive applications, and a rapidly growing range of use cases are reshaping how data centre environments are planned, deployed, and maintained.

The thesis is simple: the more powerful and compact AI infrastructure becomes, the more tightly connected its internal networks must be. Unlike typical cloud computing, which can be spread out across the globe, most large-scale AI training jobs rely on centralised data centres for greater efficiency.

Edge computing infrastructure is central to this trend. As AI becomes increasingly embedded into operational processes, local compute to support inference and latency-sensitive tasks is paramount if this transformation is to succeed. Enterprise data centre operators must therefore rethink how they design new facilities or retrofit existing physical infrastructure to handle AI’s thermal, power, and connectivity demands without compromising performance, uptime, or carbon footprint.

AI is real and responds to practical use cases

AI, particularly inference at scale, places highly specific demands on infrastructure. These are not generic virtual machines or burstable workloads that can be comfortably shuffled between cloud availability zones.

AI models require predictable response times and consistent throughput. In domains such as advanced graphics and digital twins, where creating and operating photorealistic, real-time virtual replicas of physical objects and environments is paramount, milliseconds of delay can undermine the value of the system entirely or put project deadlines at risk. Large Language Models (LLMs) used for training, fine-tuning, and running inference for generative AI applications also depend on bandwidth and graphics processing unit (GPU) level compute that few legacy environments can support without adaptation.

Designing for power density and stability

GPUs operate at significantly higher wattage than conventional central processing units (CPUs), driving up power requirements across the facility. According to Goldman Sachs, in 2027 AI server rack designs will require 50 times the power of the racks that power the internet today, and this figure is expected to rise as AI workloads increase. Rack densities of 30 to 50 kilowatts or more are already becoming common in AI edge deployments, while training models like ChatGPT can consume more than 80 kW per rack.
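To make the arithmetic behind those density figures concrete, the short sketch below estimates total rack draw from an assumed accelerator count, per-GPU wattage, and host overhead fraction. All three figures are illustrative assumptions, not vendor specifications.

```python
# Rough rack-power estimate for an AI rack. The GPU count, per-GPU wattage,
# and overhead fraction below are illustrative assumptions, not vendor specs.

def rack_power_kw(gpus_per_rack: int, gpu_tdp_w: float,
                  host_overhead_frac: float = 0.35) -> float:
    """Estimate total rack draw: GPU load plus CPU/memory/network/fan overhead."""
    gpu_load_w = gpus_per_rack * gpu_tdp_w
    total_w = gpu_load_w * (1 + host_overhead_frac)
    return total_w / 1000.0

if __name__ == "__main__":
    # e.g. 32 accelerators at ~700 W each with ~35% host overhead -> ~30 kW
    print(f"Estimated rack load: {rack_power_kw(32, 700):.1f} kW")
```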

GPU-driven data centres face a unique and demanding power delivery challenge. The spiky power consumption of these processors requires a new generation of uninterruptible power supply (UPS) systems designed specifically to handle rapid, significant increases in demand - sometimes up to 150% of the nominal load. This surge capability provides the headroom to accommodate unpredictable spikes, preventing system crashes and keeping AI and other intensive workloads running without interruption.
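As a rough illustration of what that headroom means in practice, the snippet below checks whether an assumed UPS rating covers a transient of 150% of the nominal load. The nominal load and UPS rating used in the example are hypothetical.

```python
# Minimal sizing check for UPS transient headroom (a sketch, not a sizing tool).
# The 150% surge factor reflects the transient overload described above; the
# nominal IT load and UPS rating are illustrative assumptions.

def ups_headroom_ok(nominal_load_kw: float, ups_rating_kw: float,
                    surge_factor: float = 1.5) -> bool:
    """Return True if the UPS can ride through a surge of surge_factor x nominal load."""
    peak_demand_kw = nominal_load_kw * surge_factor
    return ups_rating_kw >= peak_demand_kw

print(ups_headroom_ok(nominal_load_kw=300, ups_rating_kw=500))  # True: 450 kW peak fits
print(ups_headroom_ok(nominal_load_kw=300, ups_rating_kw=400))  # False: 450 kW exceeds rating
```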

Coolant Distribution Units (CDUs) also require 24/7 power continuity for normal operations. A power outage to these units, even for a short duration, can lead to a thermal shutdown of the GPUs. Since modern GPUs generate substantial heat, the cooling system is their lifeline. Keeping the CDUs operational during a power disruption is paramount to safeguarding the expensive GPU hardware and maintaining the continuous operation of the data centre's core processing capabilities.

Cooling for thermal extremes

The increasing power density per rack makes liquid cooling a necessity, not a luxury. Once confined to hyperscale data centres and supercomputers, liquid cooling is now a mainstream solution for smaller, high-density environments. By transferring heat directly from the source, it enables higher rack densities and significantly improves heat capture. This reduces sole reliance on traditional chilled-air distribution, allowing more efficient and effective cooling while opening opportunities for waste heat reuse.
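A back-of-the-envelope way to see why liquid is so effective is the standard heat-balance relationship Q = ṁ·cp·ΔT. The sketch below estimates the water flow needed to carry away an assumed 80 kW rack load at an assumed 10 °C temperature rise; both figures are illustrative, not measured values.

```python
# Back-of-the-envelope coolant flow needed to remove a given rack heat load,
# using Q = m_dot * cp * dT. The 80 kW load and 10 °C rise are assumptions.

WATER_DENSITY = 997.0         # kg/m^3 at roughly 25 °C
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K)

def coolant_flow_lpm(heat_load_kw: float, delta_t_c: float) -> float:
    """Litres per minute of water required to carry away heat_load_kw at a delta_t_c rise."""
    mass_flow_kg_s = (heat_load_kw * 1000.0) / (WATER_SPECIFIC_HEAT * delta_t_c)
    volume_flow_m3_s = mass_flow_kg_s / WATER_DENSITY
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min

print(f"{coolant_flow_lpm(80, 10):.0f} L/min for an 80 kW rack at a 10 °C rise")
```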

For facilities where air-cooling capacity cannot absorb the portion of the per-rack thermal load that liquid cooling does not capture, Rear Door Heat Exchangers (RDHXs) offer a simple and effective solution. These units are an easy retrofit for existing data centre infrastructure, neutralising the heat impact of new AI hardware. By capturing heat at the rack level, RDHXs significantly reduce the burden on existing air-cooling systems, allowing operators to scale their AI workloads without a complete overhaul of the facility's cooling infrastructure.

Effective thermal management requires continuous environmental monitoring. Real-time data on parameters such as temperature and humidity is crucial for a prompt reaction to any issues. Advanced monitoring and control systems can automatically adjust cooling parameters or alert operators to potential problems, keeping the cooling infrastructure tuned to the workload profile and allowing an immediate response to unexpected scenarios. This helps prevent thermal shutdowns and safeguards valuable hardware.
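As a simple illustration of threshold-based alerting, the sketch below flags readings that exceed assumed limits. The sensor names and thresholds are hypothetical placeholders, not recommended set points.

```python
# Minimal sketch of threshold-based environmental alerting. Sensor names,
# thresholds, and the example readings are hypothetical placeholders.

THRESHOLDS = {"inlet_temp_c": 35.0, "humidity_pct": 70.0}

def check_readings(readings: dict[str, float]) -> list[str]:
    """Return an alert message for every reading that exceeds its threshold."""
    return [f"{name}={value} exceeds limit of {THRESHOLDS[name]}"
            for name, value in readings.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]

print(check_readings({"inlet_temp_c": 38.2, "humidity_pct": 55.0}))
```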

Connectivity under pressure

The "densified" network is a direct consequence of this hardware concentration. The GPUs in a single server or across multiple servers must communicate and share data constantly and at extremely high speeds to work together on a single task. This communication is known as "east-west traffic" (data moving between servers within a data centre). AI training clusters require a massive network fabric with ultra-low latency and incredibly high bandwidth to prevent the GPUs from being "starved" for data, which would waste their expensive compute power. Technologies like InfiniBand and high-speed Ethernet (400 Gb/s and 800 Gb/s) are essential for this.

The “single supercomputer” model: AI workloads, especially for training large models, are difficult to distribute because they function as a single, tightly coupled supercomputer. The thousands of GPUs in an AI data centre must work in parallel on a single task, which requires constant, real-time synchronisation and data exchange. It isn’t realistic to take a piece of the model or data, send it to a data centre in another city or country, and expect it to work efficiently.

Latency is the killer: The latency introduced by a wide-area network (WAN) is far too high for this kind of inter-processor communication. While a traditional cloud workload can be distributed geographically to reduce latency for end-users, an AI training job is fundamentally different. The milliseconds of latency over a long-distance connection would grind the training process to a halt, making it incredibly inefficient and impractical. This is why AI training is typically confined to a single, purpose-built data centre cluster.
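A rough sense of the penalty: if every synchronisation round pays one network round trip, the waiting time accumulates as shown in the sketch below. The step count, rounds per step, and round-trip times are assumptions chosen only to illustrate the order of magnitude.

```python
# Illustrative arithmetic for the latency penalty of wide-area synchronisation.
# Step counts and round-trip times are assumptions chosen only to show scale.

def added_sync_wait_s(steps: int, sync_rounds_per_step: int, rtt_ms: float) -> float:
    """Seconds of pure waiting when every sync round pays one full round trip."""
    return steps * sync_rounds_per_step * rtt_ms / 1000.0

local = added_sync_wait_s(100_000, 10, 0.01)  # ~10 microsecond RTT inside a cluster fabric
wan = added_sync_wait_s(100_000, 10, 30.0)    # ~30 ms inter-city round trip

print(f"In-cluster: {local:.0f} s of added waiting over the whole run")
print(f"Over a WAN: {wan / 3600:.1f} hours of added waiting over the whole run")
```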

Cloud is built for scalability and distribution, not cohesion: Traditionally, cloud architecture was designed for workload decentralisation, making it attractive to business owners who preferred elasticity and pay-as-you-go models over CAPEX. In contrast, an AI training job is a single, colossal workload that requires all available resources to be cohesively linked, which is why AI-centric data centres are often referred to as "AI factories."

Beyond "Cloud-First": The Edge Advantage 

For the last decade, the trend for enterprise IT has been "cloud-first." Companies were encouraged to migrate their applications, data, and infrastructure to public cloud providers like AWS, Azure, and Google Cloud. The benefits were clear: scalability, reduced capital expenditure, and simplified management.

However, the rise of AI is now creating a powerful counter-trend. AI's unique infrastructure requirements are leading companies to invest in their own on-premises and "edge" computing capabilities, shifting investment away from total reliance on the cloud.

The cost of training a large AI model - which requires a tightly coupled cluster of thousands of GPUs - in the public cloud can be substantial, due both to the high-performance hardware and to the sheer volume of data transfer. A large enterprise or a tech company can find it more cost-effective to build its own dedicated AI cluster rather than rent the equivalent from a hyperscale operator. This is even more relevant for AI inference, the process of using a pre-trained model to make predictions or decisions.
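One way to frame that decision is a simple rent-versus-build break-even calculation, sketched below. All of the prices and the utilisation assumption are hypothetical placeholders; real figures vary widely by vendor, region, and contract.

```python
# Simple rent-vs-build break-even sketch. Every price and the utilisation
# figure below are hypothetical placeholders, not market data.

def breakeven_months(capex_per_gpu: float, opex_per_gpu_month: float,
                     cloud_rate_per_gpu_hour: float,
                     hours_per_month: float = 730) -> float:
    """Months of continuous use after which owning a GPU costs less than renting it."""
    cloud_monthly = cloud_rate_per_gpu_hour * hours_per_month
    return capex_per_gpu / (cloud_monthly - opex_per_gpu_month)

# e.g. $30k to buy and host a GPU, $500/month to run it, $3/hour to rent one:
print(f"Break-even after roughly {breakeven_months(30_000, 500, 3.0):.1f} months")
```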

The unique, non-distributable, and latency-sensitive nature of AI workloads, especially for inference, is forcing a decentralisation. Some enterprises are now building out their own high-performance infrastructure on their premises. While the cloud remains essential for general-purpose computing and some AI training, there is a growing shift away from purely "cloud-first" strategies, fostering a new age of investment in the enterprise edge.
