From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
This paper presents a compositional framework for generating LLM inference power traces for datacenter and grid planning. It decouples workload dynamics from configuration-specific power behavior, so it generalizes across traffic conditions, hardware platforms, and serving settings. The model reproduces the energy and temporal structure of measured traces and outperforms TDP-based and mean-power abstractions. Case studies in provisioning, oversubscription, and utility-facing load characterization show that trace-based planning reveals headroom that nameplate assumptions hide and gives a more accurate system-level picture.
Executive Impact & Key Findings
Datacenter operators and electrical utilities require precise power traces at different scales for provisioning, facility management, and grid integration. Existing models fail to capture the rapid GPU state transitions (prefill, decode, idle) inherent in LLM inference, leading to inaccurate demand forecasts, missed optimization opportunities, and conservative infrastructure sizing. This necessitates a new modeling approach that preserves device-level dynamics while scaling to facility-level insights.
Deep Analysis & Enterprise Applications
Enterprise Process Flow
The framework operates in a clear, modular pipeline, transforming raw measurement data into actionable facility-scale power profiles. This allows for flexible scenario generation and aggregation.
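The core compositional step can be sketched as follows: a per-timestep state sequence (idle, prefill, decode) comes from the workload side, and a configuration-specific power model turns each state into a power sample. This is a minimal sketch, not the paper's implementation. The Gaussian per-state models, the wattage values in `POWER_MODEL`, and the function name `synthesize_trace` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical per-state power models for ONE server configuration:
# state -> (mean watts, standard deviation). Placeholder values, not measurements.
POWER_MODEL: Dict[str, Tuple[float, float]] = {
    "idle": (900.0, 20.0),
    "prefill": (3000.0, 150.0),
    "decode": (2200.0, 100.0),
}


def synthesize_trace(
    states: List[str],
    model: Dict[str, Tuple[float, float]],
    seed: int = 0,
) -> List[float]:
    """Map a per-timestep state sequence to a power trace (watts).

    Each sample is drawn from the Gaussian assigned to that step's state.
    The workload (states) and the configuration (model) stay separate inputs.
    """
    rng = random.Random(seed)
    trace = []
    for state in states:
        mean_w, std_w = model[state]
        trace.append(max(0.0, rng.gauss(mean_w, std_w)))
    return trace


# A short illustrative state sequence: idle, one request served, idle again.
states = ["idle"] * 3 + ["prefill"] * 2 + ["decode"] * 5 + ["idle"] * 2
trace_w = synthesize_trace(states, POWER_MODEL)

mean_w = sum(trace_w) / len(trace_w)
peak_w = max(trace_w)
print(f"mean = {mean_w:.0f} W, peak = {peak_w:.0f} W")
```

Swapping `POWER_MODEL` for another configuration's model while reusing the same `states` is the point of the decoupling: the workload side does not need to be re-simulated.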
Traditional TDP-based provisioning overstates the required interconnection capacity by roughly 60%. For the same 240-server facility, the generated traces give a peak demand of 0.75 MW, compared with 1.19 MW under TDP assumptions.
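The roughly 60% figure follows from the two peak values quoted above. A quick check, with variable names chosen here for illustration:

```python
# Peak demand figures quoted in the text for the 240-server facility.
tdp_peak_mw = 1.19    # TDP-based (nameplate) peak
trace_peak_mw = 0.75  # peak of the generated power traces
servers = 240

# Relative overstatement of the TDP figure versus the trace-based peak.
overstatement = tdp_peak_mw / trace_peak_mw - 1.0

# Facility peak divided evenly across servers, in kW. For the trace-based
# figure this is each server's average share of the facility peak,
# not necessarily any single server's own peak.
tdp_kw_per_server = tdp_peak_mw * 1000 / servers
trace_kw_per_server = trace_peak_mw * 1000 / servers

print(f"TDP overstates peak demand by {overstatement:.1%}")
print(f"Per-server share: {tdp_kw_per_server:.2f} kW (TDP) "
      f"vs {trace_kw_per_server:.2f} kW (trace)")
```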
| Feature | Our Framework | LUT-based Baseline |
|---|---|---|
| Energy error (median \|ΔE\|) | | |
| Temporal structure (ACF R²) | | |
| Distributional agreement (KS statistic) | | |
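The three metrics in the comparison can be computed from a measured trace and a synthetic trace roughly as follows. This is a sketch under assumptions: the source does not give the exact definitions, so the relative form of the energy error, the choice of lags for the autocorrelation function (ACF), and the use of R² between the two ACF curves are illustrative choices. The toy traces are made up.

```python
import bisect
from typing import List, Sequence


def energy_error(measured: Sequence[float], synthetic: Sequence[float]) -> float:
    """Relative absolute energy difference for equally sampled traces."""
    e_meas, e_syn = sum(measured), sum(synthetic)
    return abs(e_syn - e_meas) / e_meas


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


def r_squared(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of `predicted` against `reference`."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for v in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy traces in kW (illustrative only): same values, shifted in phase.
measured = [1.0, 1.0, 3.0, 3.0] * 5
synthetic = [1.0, 3.0, 3.0, 1.0] * 5

print("energy error:", energy_error(measured, synthetic))
print("ACF R^2:", r_squared(acf(measured, 5), acf(synthetic, 5)))
print("KS statistic:", ks_statistic(measured, synthetic))
```

The three metrics are complementary: energy error checks the integral, the KS statistic checks the distribution of power values regardless of order, and the ACF comparison checks temporal structure, which the other two ignore.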
Unlocking Oversubscription Headroom
Challenge: Under nameplate provisioning, a 600 kW row can only host 23 racks, leaving significant row capacity unused despite lower actual peak loads.
Solution: By using our power traces under a production workload, the same row can accommodate 57 racks while remaining below the 600 kW limit (observed peak ~580 kW).
Impact: This is roughly 2.5 times the rack density of TDP-based provisioning (57 versus 23 racks), showing that nameplate provisioning leaves substantial row capacity unused.
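The difference between the two provisioning rules can be sketched in a few lines. Nameplate provisioning divides the row limit by a fixed per-rack rating, while trace-based provisioning adds racks as long as the peak of the summed row trace stays under the limit. The helper `max_racks_under_limit`, the toy traces, and the 26 kW rating are assumptions for illustration. The rating is chosen only so that the nameplate count matches the 23 racks quoted above, and the toy result is not expected to reproduce the paper's 57 racks.

```python
from typing import List, Sequence


def max_racks_under_limit(
    rack_traces: Sequence[Sequence[float]], limit_kw: float
) -> int:
    """Add racks in order while the summed row trace never exceeds limit_kw."""
    if not rack_traces:
        return 0
    row = [0.0] * len(rack_traces[0])
    hosted = 0
    for trace in rack_traces:
        candidate = [r + p for r, p in zip(row, trace)]
        if max(candidate) > limit_kw:
            break  # this rack would push the row peak over the limit
        row = candidate
        hosted += 1
    return hosted


def toy_trace(offset: int, steps: int = 12) -> List[float]:
    """Toy rack trace (kW): 26 kW at one time step, 8 kW otherwise."""
    return [26.0 if t == offset % steps else 8.0 for t in range(steps)]


ROW_LIMIT_KW = 600.0
NAMEPLATE_KW = 26.0  # assumed per-rack rating, see lead-in

# Nameplate rule: every rack is assumed to draw its rating at all times.
nameplate_racks = int(ROW_LIMIT_KW // NAMEPLATE_KW)

# Trace rule: rack peaks do not coincide, so more racks fit under the limit.
traces = [toy_trace(i) for i in range(100)]
trace_racks = max_racks_under_limit(traces, ROW_LIMIT_KW)

print(f"nameplate: {nameplate_racks} racks, trace-based: {trace_racks} racks")
```

The gain comes entirely from non-coincident peaks. If every toy rack peaked at the same time step, both rules would give the same count.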
Implementation Roadmap
Our implementation roadmap outlines the phased approach to integrating the compositional power trace generation framework into your existing infrastructure planning workflows. Each phase is designed for seamless adoption and measurable impact.
Phase 1: Data Collection & Model Training
Gather existing GPU power traces and workload features. Train configuration-specific state classifiers and power models using our framework. This phase establishes the baseline and customizes the model to your environment.
Phase 2: Workload Simulation & Trace Synthesis
Integrate your request arrival schedules and length distributions into our workload simulator. Generate high-fidelity, server-level power traces for various scenarios, including unseen traffic conditions and model mixes.
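A workload simulator of this kind can be sketched as follows: request arrival times and token lengths go in, and a per-timestep state sequence (idle, prefill, decode) comes out. This is a deliberately simplified sketch. It assumes a single server that handles one request at a time in FIFO order with no batching, and the throughput parameters `prefill_tok_per_s` and `decode_tok_per_s` are hypothetical placeholders. A real serving stack batches requests and would need a more detailed model.

```python
import math
import random
from typing import List, Sequence


def poisson_arrivals(rate_per_s: float, horizon_s: float, seed: int = 0) -> List[float]:
    """Arrival times of a Poisson process (exponential inter-arrival gaps)."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            return arrivals
        arrivals.append(t)


def simulate_states(
    arrivals_s: Sequence[float],
    prompt_tokens: Sequence[int],
    output_tokens: Sequence[int],
    horizon_s: float,
    dt_s: float = 0.25,
    prefill_tok_per_s: float = 2000.0,  # hypothetical prefill throughput
    decode_tok_per_s: float = 50.0,     # hypothetical decode throughput
) -> List[str]:
    """Per-step state labels for one FIFO server without batching."""
    n_steps = int(horizon_s / dt_s)
    states = ["idle"] * n_steps
    free_at = 0.0  # time at which the server finishes its current request

    def mark(begin_s: float, end_s: float, label: str) -> None:
        # Label every step whose start time lies in [begin_s, end_s).
        first = math.ceil(begin_s / dt_s)
        last = min(n_steps, math.ceil(end_s / dt_s))
        for k in range(first, last):
            states[k] = label

    for t_arr, n_prompt, n_out in zip(arrivals_s, prompt_tokens, output_tokens):
        start = max(t_arr, free_at)  # wait if the server is busy
        prefill_end = start + n_prompt / prefill_tok_per_s
        decode_end = prefill_end + n_out / decode_tok_per_s
        mark(start, prefill_end, "prefill")
        mark(prefill_end, decode_end, "decode")
        free_at = decode_end
    return states


# Deterministic example: two requests, the second queues behind the first.
states = simulate_states(
    arrivals_s=[1.0, 2.0],
    prompt_tokens=[1000, 2000],
    output_tokens=[100, 50],
    horizon_s=10.0,
)
counts = {s: states.count(s) for s in ("idle", "prefill", "decode")}
print(counts)

# Random arrivals for scenario generation (rate is illustrative).
arrivals = poisson_arrivals(rate_per_s=0.5, horizon_s=60.0, seed=1)
```

Changing the arrival rate or the length distributions changes only the state sequence, which is what lets the same power models be reused for unseen traffic conditions.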
Phase 3: Datacenter-Scale Aggregation & Planning Integration
Aggregate server-level traces to rack, row, and facility scales. Integrate the resulting load profiles into your existing provisioning, oversubscription, and grid planning workflows to unlock new insights and optimize resource allocation.
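Aggregation across scales amounts to summing time-aligned traces according to the physical topology. The sketch below assumes all traces share the same sampling grid. The topology of 2 servers per rack and 2 racks per row, the helper names, and the toy values are placeholders for illustration.

```python
from typing import List, Sequence


def aggregate(traces: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of time-aligned power traces."""
    return [sum(samples) for samples in zip(*traces)]


def chunk(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive groups of `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical topology and toy server traces (kW, three time steps each).
SERVERS_PER_RACK = 2
RACKS_PER_ROW = 2
server_traces_kw = [[float(i), 1.0, 2.0] for i in range(8)]

rack_traces = [aggregate(g) for g in chunk(server_traces_kw, SERVERS_PER_RACK)]
row_traces = [aggregate(g) for g in chunk(rack_traces, RACKS_PER_ROW)]
facility_trace = aggregate(row_traces)

# Peak of the aggregate versus the sum of individual server peaks.
facility_peak = max(facility_trace)
sum_of_server_peaks = sum(max(t) for t in server_traces_kw)

print("facility trace:", facility_trace)
print(f"facility peak = {facility_peak} kW, "
      f"sum of server peaks = {sum_of_server_peaks} kW")
```

The sum of individual peaks is always at least the peak of the summed trace. The gap between the two is the headroom that sizing every server at its own peak, or at nameplate, leaves unused.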
Phase 4: Continuous Optimization & Scenario Analysis
Leverage the framework for ongoing scenario planning, evaluating hardware refreshes, model mix changes, and traffic growth. Continuously refine models with new data to maintain accuracy and adapt to evolving demands.