Methodology

How the rubric works

The Transparent AI Index grades labs on the transparency of their per-query energy and water disclosures. It does not grade environmental performance.

What this scorecard measures

A lab can publish figures showing high consumption and still earn an A. A lab can run an extremely efficient model and still earn an F if it publishes nothing. Disclosure is the precondition for accountability, comparison, and improvement. Until labs publish measured, methodology-backed per-query figures, users have no way to make informed choices about the environmental cost of the queries they send, and policymakers have no defensible basis for regulation.

The framing is borrowed from nutrition labeling. A calorie count on a menu does not tell you whether a meal is healthy; it puts the relevant information in front of the diner, in a comparable form, so the diner can decide. That is the bar this scorecard applies to AI inference.

The rubric

Criterion | Weight | What earns full marks
--- | --- | ---
Per-query figures published | 25% | Specific energy (Wh) and water (mL) numbers per inference query, tied to a named, current model.
Methodology disclosed | 25% | A technical document explains what is measured, what is excluded, the measurement boundary, and how figures were derived.
External audit | 20% | Figures are independently audited or peer-reviewed by a credentialed third party.
Recency | 15% | Disclosure covers the lab's current flagship model and was updated within the last 12 months.
Scope coverage | 15% | Covers text, image, and video generation as well as reasoning vs. standard modes, and acknowledges training, networking, and water consumed in power generation.
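
Read mechanically, the weights combine per-criterion sub-scores into a 0–100 total. A minimal sketch in Python, assuming a grader assigns each criterion a 0–100 sub-score (the key and function names are illustrative, not a published schema):

```python
# Rubric weights as published above. Sub-scores are hypothetical 0-100
# judgments a grader assigns for each criterion.
WEIGHTS = {
    "per_query_figures": 0.25,
    "methodology": 0.25,
    "external_audit": 0.20,
    "recency": 0.15,
    "scope_coverage": 0.15,
}

def weighted_total(sub_scores: dict[str, float]) -> float:
    """Combine per-criterion sub-scores (each 0-100) into a 0-100 total."""
    assert set(sub_scores) == set(WEIGHTS), "score every criterion exactly once"
    return sum(WEIGHTS[c] * s for c, s in sub_scores.items())
```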

Grade thresholds

Grade | Threshold
--- | ---
A | 85–100. Measured per-query figures, full methodology, recent, broad scope.
B | 70–84. Measured or rigorously modeled figures, but with material gaps in scope, recency, or audit.
C | 55–69. Some quantitative disclosure, but methodology incomplete or scope narrow.
D | 40–54. Marketing-grade claims only: a single number with no methodology, no model attribution, or no audit.
F | Below 40. No per-query figures of any kind. Corporate ESG reporting does not earn credit.
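
The bands translate directly into a lookup. Continuing the sketch above (the function name and the example lab are illustrative):

```python
def letter_grade(total: float) -> str:
    """Map a 0-100 weighted total onto the grade bands above."""
    if total >= 85:
        return "A"
    if total >= 70:
        return "B"
    if total >= 55:
        return "C"
    if total >= 40:
        return "D"
    return "F"

# Hypothetical lab: measured figures (90) and solid methodology (85),
# no audit (0), current disclosure (100), narrow scope (40):
# 0.25*90 + 0.25*85 + 0.20*0 + 0.15*100 + 0.15*40 = 64.75 -> "C"
print(letter_grade(64.75))  # C
```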

What counts as a disclosure

A disclosure must come from the lab itself and meet all of the following: it names a specific model; states quantitative figures in standard units (Wh, mL/L, gCO2e); defines the measurement boundary; and is dated and tied to a specific time window.
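
As a sketch of how those four conditions could be checked mechanically, assuming a hypothetical record type (none of these field names come from any lab's published schema):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Disclosure:
    model_name: str               # names a specific model
    energy_wh: Optional[float]    # per-query energy, Wh
    water_ml: Optional[float]     # per-query water, mL
    boundary: str                 # defines the measurement boundary
    published: Optional[date]     # date of the disclosure
    window_start: Optional[date]  # start of the covered time window
    window_end: Optional[date]    # end of the covered time window

def qualifies(d: Disclosure) -> bool:
    """True only if all four conditions above hold."""
    has_figures = d.energy_wh is not None or d.water_ml is not None
    dated = (d.published is not None
             and d.window_start is not None
             and d.window_end is not None)
    return bool(d.model_name) and has_figures and bool(d.boundary) and dated

# A quantitative figure with no stated measurement boundary does not qualify.
print(qualifies(Disclosure("ExampleModel-1", 0.3, 250.0, "", date(2025, 8, 1),
                           date(2025, 5, 1), date(2025, 7, 31))))  # False
```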

What this scorecard excludes

- Training energy and water. Inference is what users directly drive; training is a separate question.
- Corporate-level ESG reports. Per-query specificity is the bar.
- Hardware embodied carbon. Important, but not currently disclosed by any lab in a per-query form.
- Downstream device and network energy. Excluded for consistency with how labs themselves report.

Limitations

Methodologies across labs are not directly comparable. Google's median text prompt and Mistral's full-page generation measure different functional units. The grades reward the existence and quality of disclosure, not the headline number. Modeled estimates carry their own uncertainty and should be read as ranges. This is v1; the rubric will evolve as more labs disclose and as standards bodies converge.