inferecon.com
THE ECONOMICS OF INFERENCE

v0.2

LCOI calculator

Levelized cost of inference per million tokens. Pick a hardware, model, and region preset to see a plausible $/M-tokens figure split into GPU amortisation, facility overhead, electricity, cooling, and OpEx. Every numeric field is overridable; presets only fill defaults.
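The headline number is consistent with total annual cost divided by annual tokens, scaled to a million. Here is a sketch of that relation. The symbols are illustrative and not taken from the calculator. The values come from the cost breakdown and annual token totals reported below:

```latex
\mathrm{LCOI}
  = \frac{C_{\mathrm{GPU}} + C_{\mathrm{fac}} + C_{\mathrm{elec}} + C_{\mathrm{cool}} + C_{\mathrm{OpEx}}}
         {N_{\mathrm{in}} + N_{\mathrm{out}}} \times 10^{6}
  = \frac{\$467{,}707}{395.6\times10^{9} + 118.7\times10^{9}} \times 10^{6}
  \approx \$0.91
```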

Configuration

Market price is $35–40k for SXM units; PCIe variants trade at $25–30k. Using $35k, the low end of the SXM range, as a defensible figure for the SXM form factor.

3,000 tok/s prefill / 310 tok/s decode on NVIDIA H100 SXM.

EIA Electricity Monthly Update (industrial forecast, 2025)

~15% inter-node overhead.

Share of wall-clock time the GPU is producing tokens.
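Utilisation and overhead together scale the GPU-seconds available each year. The sketch below shows one way to combine them, under several assumptions. It uses a 365-day year. It treats the ~15% inter-node overhead as a derate on usable GPU time. It assumes a fleet of 32 GPUs. That count is not shown on this page; 32 is a hypothetical value that happens to reproduce the reported totals. The session shape of 1,000 input and 300 output tokens is inferred from the annual token and session counts reported below:

```python
# Hypothetical capacity model: none of these names come from the calculator itself.
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 365-day year


def effective_gpu_seconds(gpus: int, utilisation: float, overhead: float) -> float:
    """GPU-seconds per year spent producing tokens (assumed: overhead derates time)."""
    return gpus * SECONDS_PER_YEAR * utilisation * (1.0 - overhead)


def gpu_seconds_per_session(in_tok: float, out_tok: float,
                            prefill_tps: float, decode_tps: float) -> float:
    """Prefill time plus decode time for one session on one GPU."""
    return in_tok / prefill_tps + out_tok / decode_tps


capacity = effective_gpu_seconds(gpus=32, utilisation=0.60, overhead=0.15)  # 32 GPUs: assumed
per_session = gpu_seconds_per_session(1000, 300, 3000, 310)                 # inferred session shape
sessions = capacity / per_session

print(f"{sessions / 1e6:.1f}M sessions/yr")              # 395.6M sessions/yr
print(f"{sessions * 1000 / 1e9:.1f}B input tokens/yr")   # 395.6B input tokens/yr
print(f"{sessions * 300 / 1e9:.1f}B output tokens/yr")   # 118.7B output tokens/yr
```

Under this model, doubling utilisation doubles sessions/yr while leaving the per-session GPU time unchanged.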

26% of GPU time goes to prefill. Affects sessions/yr and cost/session, not the per-token input and output prices.

Output is 23% of tokens by count. Changing token counts shifts cost/session, not the per-token input and output prices. See Advanced to move the per-token price.
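Both the 26% prefill time share and the 999 tok/s combined throughput follow from the token mix and the two throughputs. Per-token times add, so combined throughput is a token-weighted harmonic mean of the two rates, not an arithmetic one. The sketch below uses 1,000 input and 300 output tokens per session. That shape is inferred from the annual totals on this page rather than read from an input field, and the function name is illustrative:

```python
def mix_stats(in_tok: float, out_tok: float,
              prefill_tps: float, decode_tps: float) -> dict:
    """Time share, token share, and combined throughput for one session shape."""
    t_prefill = in_tok / prefill_tps     # seconds of prefill per session
    t_decode = out_tok / decode_tps      # seconds of decode per session
    t_total = t_prefill + t_decode
    return {
        "prefill_time_share": t_prefill / t_total,
        "output_token_share": out_tok / (in_tok + out_tok),
        "combined_tps": (in_tok + out_tok) / t_total,   # harmonic-style mean
    }


stats = mix_stats(1000, 300, 3000, 310)
print(f"prefill time share: {stats['prefill_time_share']:.0%}")   # prefill time share: 26%
print(f"output token share: {stats['output_token_share']:.0%}")   # output token share: 23%
print(f"combined tok/s/GPU: {stats['combined_tps']:.0f}")         # combined tok/s/GPU: 999
```

A token-weighted arithmetic mean of the two rates would give about 2,379 tok/s, overstating capacity by more than 2x.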

Cost per million tokens
$0.91

Blended across input and output; cost is allocated by GPU time (26% prefill / 74% decode).

Input: $0.30/M tokens. Output: $2.93/M tokens.
[Chart: annual cost split into CapEx, facility, electricity, cooling, and OpEx]
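With cost allocated by GPU time, each token's price is the cost of one GPU-second divided by the throughput of its phase. That makes the input and output prices independent of the token mix, while the blended figure moves with it. The sketch below assumes the year's tokens exactly fill the fleet's productive GPU-seconds. It uses the totals reported on this page; the names are illustrative:

```python
ANNUAL_COST = 467_707                  # $/yr, "Total" row of the cost breakdown
N_IN, N_OUT = 395.6e9, 118.7e9         # annual input/output tokens, as reported
PREFILL_TPS, DECODE_TPS = 3000, 310    # per-GPU throughput

# GPU-seconds consumed by the year's tokens; assumed equal to productive capacity.
gpu_seconds = N_IN / PREFILL_TPS + N_OUT / DECODE_TPS
cost_per_gpu_second = ANNUAL_COST / gpu_seconds

price_in = cost_per_gpu_second / PREFILL_TPS * 1e6    # $/M input tokens
price_out = cost_per_gpu_second / DECODE_TPS * 1e6    # $/M output tokens


def blended_price(prefill_time_share: float) -> float:
    """$/M tokens when the same capacity is split between phases by GPU time."""
    tokens = gpu_seconds * (prefill_time_share * PREFILL_TPS
                            + (1 - prefill_time_share) * DECODE_TPS)
    return ANNUAL_COST / tokens * 1e6


share_now = (N_IN / PREFILL_TPS) / gpu_seconds
print(f"in ${price_in:.2f}  out ${price_out:.2f}")          # in $0.30  out $2.93
print(f"blended now: ${blended_price(share_now):.2f}")      # blended now: $0.91
print(f"blended @ 50% prefill: ${blended_price(0.50):.2f}")
```

Shifting GPU time toward prefill lowers the blended $/M, because prefill moves more tokens per second. The input and output prices stay fixed.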

Cost breakdown

Component               $/yr   Share
GPU amortisation    $331,098   70.8%
Facility overhead    $65,190   13.9%
Electricity           $6,728    1.4%
Cooling               $2,691    0.6%
OpEx                 $62,000   13.3%
Total               $467,707    100%
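The Total row is the sum of the five components, and each share is that component divided by the total. The quick check below recomputes the table. It also prints the cooling-to-electricity ratio as an observation about these particular numbers. The page does not state how cooling is derived:

```python
breakdown = {
    "GPU amortisation": 331_098,
    "Facility overhead": 65_190,
    "Electricity": 6_728,
    "Cooling": 2_691,
    "OpEx": 30_000 + 32_000,   # fixed + marginal OpEx
}
total = sum(breakdown.values())

for name, cost in breakdown.items():
    print(f"{name:<18} ${cost:>9,}  {cost / total:6.1%}")
print(f"{'Total':<18} ${total:>9,}")

# Observation only: in these figures cooling comes out at ~40% of electricity.
print(f"cooling / electricity = {breakdown['Cooling'] / breakdown['Electricity']:.2f}")
```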

OpEx: fixed $30,000 + marginal $32,000.

Annual input tokens: 395.6B
Annual output tokens: 118.7B
Cost allocation, input / output: $119,826 / $347,881
Combined throughput per GPU: 999 tok/s
Annual sessions: 395.6M
Cost per session: $0.0012
Utilisation, peak / average: 60% / 60%
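The input/output cost allocation splits the annual total by each phase's share of GPU time. Cost per session is the annual total divided by annual sessions. A sketch follows. The session shape of 1,000 input and 300 output tokens is inferred from the ratios of the reported totals rather than read from an input field:

```python
ANNUAL_COST = 467_707          # $/yr
SESSIONS = 395.6e6             # annual sessions, as reported
IN_TOK, OUT_TOK = 1000, 300    # per session; inferred as 395.6B / 395.6M and 118.7B / 395.6M
PREFILL_TPS, DECODE_TPS = 3000, 310

t_in = IN_TOK / PREFILL_TPS    # prefill GPU-seconds per session
t_out = OUT_TOK / DECODE_TPS   # decode GPU-seconds per session

alloc_in = ANNUAL_COST * t_in / (t_in + t_out)   # cost attributed to input tokens
alloc_out = ANNUAL_COST - alloc_in               # remainder attributed to output tokens

print(f"${alloc_in:,.0f} / ${alloc_out:,.0f}")         # $119,826 / $347,881
print(f"${ANNUAL_COST / SESSIONS:.4f} per session")    # $0.0012 per session
```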

Related resources: read the main insights from this calculator, and see the assumptions for sources, dates, and methodology caveats.
