Datacenters & power · 10 min read

The Datacenter Buildout No One Budgeted For: Power, Cooling, and Interconnect as the Real Constraint

By WaferZero
Published June 16, 2026

TL;DR
  • The binding constraint on AI is no longer chips; it is power, cooling and water, and the network fabric between racks.
  • AI racks draw 40 to 130+ kW, versus 5 to 10 kW for traditional racks. The limiting factor is grid interconnection, which can take years and is pushing operators toward on-site and nuclear generation.
  • That heat forces a shift from air to liquid and immersion cooling; PUE and water usage become first-order siting constraints.
  • At cluster scale the interconnect is the memory wall again, one level up: a rising share of cost and lost performance lives in the network, not the chips.
  • Capacity now gets built where power and cooling exist, not where convenient, and some announced capacity will not arrive on schedule.

For a decade the limiting factor on computing was the chip. For AI at scale, it no longer is. The constraints that now decide whether a cluster gets built, and how big, are physical: megawatts from the grid, the cooling and water to carry away the heat, and the network fabric that wires thousands of chips together. The GPUs are almost the easy part.

Power is the new ceiling

A traditional server rack draws roughly 5 to 10 kilowatts. A rack of modern AI accelerators can draw 40 to over 130 kilowatts. A single large training cluster runs into the tens or hundreds of megawatts, and the campuses now being planned are measured in gigawatts, the scale of a nuclear plant. The hard part is not buying that power; it is getting it connected, because grid interconnection queues in many regions stretch for years.
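To make the density gap concrete, here is a minimal arithmetic sketch using the rack figures above. The function name and the 100 MW site size are illustrative assumptions, not figures from any specific facility, and the calculation ignores cooling overhead entirely.

```python
def racks_supported(site_power_mw: float, rack_kw: float) -> int:
    """Whole racks a site can power, ignoring cooling and other overhead."""
    site_power_kw = site_power_mw * 1000
    return int(site_power_kw // rack_kw)


# Hypothetical 100 MW grid connection, all of it delivered to racks.
print(racks_supported(100, 10))   # legacy rack at 10 kW  -> 10000
print(racks_supported(100, 40))   # AI rack at 40 kW      -> 2500
print(racks_supported(100, 130))  # AI rack at 130 kW     -> 769
```

The same grid connection that once fed ten thousand racks feeds well under a thousand at the top of the AI range, which is why the megawatts, not the floor space, set the size of the cluster.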

That is why operators are doing things that used to be unthinkable for a data center: signing deals for dedicated on-site generation, co-locating next to power plants, going behind the meter, and seriously discussing nuclear, including small modular reactors. Increasingly, power availability, not chip supply, decides where AI capacity can physically exist.

Cooling and water: where the heat goes

Every watt that goes into a chip comes out as heat, and there is now too much of it for air. Air cooling runs out somewhere around 20 to 30 kilowatts per rack; AI racks blow straight past that, which forces a shift to liquid cooling: direct-to-chip cold plates, rear-door heat exchangers, and full immersion. The efficiency of all this is captured by Power Usage Effectiveness (PUE), the ratio of total facility power to the power that actually reaches the IT equipment.

Metric                 | Legacy data center | Modern AI facility
Rack power             | 5–10 kW            | 40–130+ kW
Cooling method         | Air                | Direct-to-chip liquid / immersion
PUE (total ÷ IT power) | ~1.5 or worse      | ~1.1–1.2
Water                  | Modest             | A real siting constraint (evaporative cooling)

Better PUE means less power wasted on cooling, but the other side of the ledger is water: evaporative cooling consumes large volumes of it, which turns local water supply into a siting and public-acceptance issue in a way chip procurement never was.
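Since PUE is defined as total facility power divided by IT power, the overhead it implies is easy to work out. The sketch below applies that definition to a hypothetical 100 MW IT load; the load and the function names are assumptions for illustration, and the PUE values are taken from the ranges quoted above.

```python
def facility_power_mw(it_power_mw: float, pue: float) -> float:
    """Total facility draw implied by PUE = total power / IT power."""
    return it_power_mw * pue


def overhead_mw(it_power_mw: float, pue: float) -> float:
    """Power spent on cooling and other non-IT loads."""
    return it_power_mw * (pue - 1.0)


it_load = 100  # MW of IT equipment, a hypothetical figure
for label, pue in [("legacy, PUE 1.5", 1.5), ("modern, PUE 1.15", 1.15)]:
    total = facility_power_mw(it_load, pue)
    extra = overhead_mw(it_load, pue)
    print(f"{label}: {total:.0f} MW total, {extra:.0f} MW overhead")
# legacy, PUE 1.5: 150 MW total, 50 MW overhead
# modern, PUE 1.15: 115 MW total, 15 MW overhead
```

Under these assumptions the better PUE frees about 35 MW of grid capacity for the same IT load. The sketch says nothing about water, which PUE does not measure.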

Interconnect: the fabric sets the cost

A cluster that trains a frontier model is not one big computer; it is thousands of GPUs that must constantly exchange data, synchronising gradients with all-reduce operations on every step. The network that connects them (the switches, the optical transceivers, the topology) is a large and rising share of cluster cost, and the main thing that determines whether you actually get the performance you paid for.
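A rough sense of why the fabric matters comes from the traffic a single gradient synchronisation generates. The sketch below assumes a ring all-reduce, one common algorithm in which each GPU sends about 2(N-1)/N times the payload; the article does not say which algorithm any given cluster uses. The payload size, GPU count, and link speed are hypothetical, and the estimate is a bandwidth-only lower bound that ignores latency, congestion, and overlap with compute.

```python
def ring_allreduce_bytes_per_gpu(payload_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce: 2 * (N - 1) / N * payload."""
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


def allreduce_seconds(payload_bytes: float, num_gpus: int, link_gbps: float) -> float:
    """Bandwidth-only lower bound on one all-reduce, given per-GPU link speed."""
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits per second -> bytes per second
    return ring_allreduce_bytes_per_gpu(payload_bytes, num_gpus) / bytes_per_second


# Hypothetical step: 10 GB of gradients, 1024 GPUs, 400 Gb/s per GPU.
t = allreduce_seconds(10e9, 1024, 400)
print(f"{t:.2f} s per synchronisation")  # 0.40 s per synchronisation
```

If a training step's compute takes a comparable amount of time, a large share of each step is spent waiting on the network unless communication is overlapped with computation, which is the sense in which the fabric decides how much of the purchased performance is delivered.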

Where, and whether, capacity gets built

Stack these up and the order of constraints has inverted. It now runs power, then cooling and water, then land and permitting, then network, and only then chips.

[Figure: the constraint stack for AI capacity, top to bottom: power (grid interconnection), cooling and water, land/permitting/construction, network fabric, chips.]
The constraint stack for AI capacity. Power and cooling now bind first; chip availability, once the headline, sits at the bottom.
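One way to read "binds first" is that deliverable capacity is capped by whichever input supports the least. The sketch below expresses that as a minimum over the stack. Every number in it is invented for illustration and describes no real site; the function name and dictionary keys are likewise hypothetical.

```python
def binding_constraint(supportable_mw: dict[str, float]) -> tuple[str, float]:
    """Return the input that supports the least capacity, and that capacity."""
    name = min(supportable_mw, key=supportable_mw.get)
    return name, supportable_mw[name]


# Invented figures: MW of AI capacity each input could support at one site.
site = {
    "power": 60,
    "cooling_and_water": 80,
    "land_and_permitting": 150,
    "network_fabric": 200,
    "chips": 300,
}

print(binding_constraint(site))  # ('power', 60)
```

In this made-up case, buying more chips changes nothing; only more grid capacity raises what the site can deliver.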

The practical consequences: capacity gets built where cheap, available power and cooling exist, not where it is convenient, which reshapes the geography of AI toward hydro, gas, and nuclear. Lead times are measured in years because the binding inputs are physical infrastructure. And some of the capacity that gets announced will simply not arrive on schedule, because the power and cooling behind it were never actually secured. For anyone sizing the AI buildout, the chips are a distraction; follow the megawatts.

Sources
  1. IEA, "Electricity 2024" and analysis of data-centre and AI electricity demand
  2. Lawrence Berkeley National Laboratory, "2024 United States Data Center Energy Usage Report"
  3. Uptime Institute, Power Usage Effectiveness (PUE) benchmarks
