v0.2

LCOI calculator

Levelized cost of inference per million tokens. Pick a hardware, model, and region preset to see a plausible $/M-tokens figure split into GPU amortisation, electricity, and cooling. Every numeric field is overridable — presets only fill defaults.

Configuration

Hardware

Market price is $35–40k for SXM units; PCIe variants trade at $25–30k. The default of $35k sits at the low end of the SXM range.

Model

3,000 tok/s prefill / 310 tok/s decode on NVIDIA H100 SXM.

Region

EIA Electricity Monthly Update (industrial forecast, 2025)

GPU price (USD)

Electricity ($/kWh)

Cluster size (GPUs)

~15% inter-node overhead.

Utilisation — 60%

Share of wall-clock time the GPU is producing tokens.

Avg prompt tokens

26% of GPU time on prefill. Affects sessions/yr and cost/session — not the $/M token price.

Avg output tokens

23% of tokens by count. Changing token counts shifts cost/session, not $/M. See Advanced to move the per-token price.
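The two percentages above measure different things: one is a share of GPU time, the other a share of token count. A minimal sketch of how they could be derived from the Model preset's throughput follows. The session shape of 1,000 prompt tokens and 300 output tokens is an assumption; the page does not display it, and it is inferred here from the annual token and session totals in the results panel.

```python
# Throughput from the Model preset (NVIDIA H100 SXM).
PREFILL_TOK_PER_S = 3_000
DECODE_TOK_PER_S = 310

# ASSUMPTION: session shape is not shown on the page; these values are
# inferred from annual tokens divided by annual sessions.
prompt_tokens = 1_000
output_tokens = 300

# GPU time one session spends in each phase.
prefill_s = prompt_tokens / PREFILL_TOK_PER_S
decode_s = output_tokens / DECODE_TOK_PER_S
session_s = prefill_s + decode_s

prefill_time_share = prefill_s / session_s
output_token_share = output_tokens / (prompt_tokens + output_tokens)
combined_tok_per_s = (prompt_tokens + output_tokens) / session_s

print(f"prefill share of GPU time: {prefill_time_share:.0%}")
print(f"output share of tokens:    {output_token_share:.0%}")
print(f"combined tok/s per GPU:    {combined_tok_per_s:.0f}")
```

Under these assumptions the results round to the 26%, 23%, and 999 tok/s shown on the page. Decode dominates GPU time even though output is the smaller share of tokens, because decode throughput is roughly a tenth of prefill throughput.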

Cost per million tokens

$0.91

Blended — cost allocated by GPU time (26% prefill / 74% decode).

in $0.30 / out $2.93
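As a rough check, the blended and per-direction prices can be recomputed from figures displayed elsewhere on the page. The sketch below assumes that "allocated by GPU time" means splitting the total annual cost in proportion to the GPU-seconds spent on prefill versus decode. The variable names are illustrative and not taken from the calculator.

```python
# Figures as displayed on the page.
PREFILL_TOK_PER_S = 3_000
DECODE_TOK_PER_S = 310
total_cost_usd = 467_707
annual_input_tokens = 395.6e9
annual_output_tokens = 118.7e9

# GPU-seconds needed to process each token stream at preset throughput.
prefill_gpu_s = annual_input_tokens / PREFILL_TOK_PER_S
decode_gpu_s = annual_output_tokens / DECODE_TOK_PER_S
prefill_share = prefill_gpu_s / (prefill_gpu_s + decode_gpu_s)

# ASSUMPTION: cost is split in proportion to GPU time per phase.
input_cost_usd = total_cost_usd * prefill_share
output_cost_usd = total_cost_usd - input_cost_usd


def usd_per_million(cost_usd: float, tokens: float) -> float:
    """Convert an annual cost and annual token count to $/M tokens."""
    return cost_usd / tokens * 1e6


blended = usd_per_million(total_cost_usd,
                          annual_input_tokens + annual_output_tokens)
price_in = usd_per_million(input_cost_usd, annual_input_tokens)
price_out = usd_per_million(output_cost_usd, annual_output_tokens)

print(f"blended ${blended:.2f}  in ${price_in:.2f}  out ${price_out:.2f}")
```

Because the token totals on the page are rounded, the recomputed allocation differs from the displayed one by a few dollars, but the $/M figures agree to the cent. Under this allocation each direction's price reduces to cost per productive GPU-second divided by that phase's throughput, which is why changing the token counts moves cost per session but not the in/out prices.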

Chart legend: CapEx, Facility, Electricity, Cooling, OpEx

Cost breakdown

Component	$/yr	Share
GPU amortisation	$331,098	70.8%
Facility overhead	$65,190	13.9%
Electricity	$6,728	1.4%
Cooling	$2,691	0.6%
OpEx	$62,000	13.3%
Total	$467,707	100%

OpEx: fixed $30,000 + marginal $32,000.
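The Share column is each component divided by the total. A short sketch that recomputes it from the $/yr column, using only figures from the table and the OpEx note:

```python
# Annual cost components in USD, copied from the table.
components = {
    "GPU amortisation": 331_098,
    "Facility overhead": 65_190,
    "Electricity": 6_728,
    "Cooling": 2_691,
    "OpEx": 30_000 + 32_000,  # fixed + marginal, per the note
}

total = sum(components.values())
shares = {name: cost / total for name, cost in components.items()}

for name, cost in components.items():
    print(f"{name:<18} ${cost:>9,}  {shares[name]:6.1%}")
print(f"{'Total':<18} ${total:>9,}")
```

The shares add up to 100% before rounding; after rounding to one decimal place, the displayed column can be off by a tenth of a point in total.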

Annual input tokens: 395.6B
Annual output tokens: 118.7B
Cost allocation in / out: $119,826 / $347,881
Combined tok/s / GPU: 999
Annual sessions: 395.6M
Cost per session: $0.0012
Utilisation peak / avg: 60% / 60%
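The annual volumes above can be approximated from the configuration inputs. The sketch below rests on several assumptions that the page does not state: a 32-GPU cluster (the field's value is not shown), the ~15% inter-node overhead applied as a flat 0.85 multiplier, a 365-day year, and a session of 1,000 prompt plus 300 output tokens. These values were chosen because they reproduce the displayed totals, not because the calculator is known to work this way.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ASSUMPTION: 365-day year

# From the page.
PREFILL_TOK_PER_S = 3_000
DECODE_TOK_PER_S = 310
utilisation = 0.60
total_cost_usd = 467_707

# ASSUMPTIONS: none of these values are displayed on the page.
cluster_gpus = 32
inter_node_efficiency = 1 - 0.15  # ~15% overhead as a flat multiplier
prompt_tokens = 1_000
output_tokens = 300

# GPU-seconds per year actually spent producing tokens.
productive_gpu_s = (cluster_gpus * inter_node_efficiency
                    * utilisation * SECONDS_PER_YEAR)

# GPU time consumed by one session (prefill plus decode).
session_gpu_s = (prompt_tokens / PREFILL_TOK_PER_S
                 + output_tokens / DECODE_TOK_PER_S)

annual_sessions = productive_gpu_s / session_gpu_s
annual_input_tokens = annual_sessions * prompt_tokens
annual_output_tokens = annual_sessions * output_tokens
cost_per_session = total_cost_usd / annual_sessions

print(f"sessions/yr:   {annual_sessions / 1e6:.1f}M")
print(f"input tokens:  {annual_input_tokens / 1e9:.1f}B")
print(f"output tokens: {annual_output_tokens / 1e9:.1f}B")
print(f"cost/session:  ${cost_per_session:.4f}")
```

With these inputs the results round to the 395.6M sessions, 395.6B input tokens, 118.7B output tokens, and $0.0012 per session shown above. A different cluster size or overhead model would scale the volumes proportionally.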


Related Resources: Read the main insights of this calculator and see assumptions for sources, dates, and methodology caveats.
