Whitepaper: Quality

An Operational Framework for AI Data Quality

Quality is not a final inspection; it is a system. This is the framework we use to make data quality measurable, auditable, and steadily improving across every program.

By Data Team

AI data quality is the degree to which a dataset reliably produces the intended model behavior — and it can only be managed if it is measured at every stage of production, not inspected at the end.

The five layers

1. People

Expert qualification before production: domain screening, calibration tasks, and probationary review. The cheapest quality intervention is to avoid hiring the wrong annotator in the first place.

2. Definition

Taxonomies, rubrics, and guidelines with versioned change control. Most "annotator errors" are actually guideline ambiguities; treat guideline revisions as first-class deliverables.

3. Measurement

  • Gold tasks seeded into production at a known rate
  • Inter-annotator agreement tracked per rubric dimension, not as one blended number
  • Drift monitoring — agreement and gold performance over time, per cohort
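The three measurements above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production tooling: Cohen's kappa is used here as one common agreement statistic for two annotators (the framework does not name a specific one), and the function names, the record layout (one dict per item mapping rubric dimension to label), and the drift rule (latest period compared against the mean of the preceding periods) are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def kappa_per_dimension(annotations_a, annotations_b):
    """One kappa per rubric dimension instead of a single blended number.

    Each argument is a list of dicts: rubric dimension -> label (assumed layout).
    """
    return {
        dim: cohen_kappa([row[dim] for row in annotations_a],
                         [row[dim] for row in annotations_b])
        for dim in annotations_a[0]
    }


def gold_accuracy(submitted, gold):
    """Share of seeded gold tasks answered correctly (dicts: task_id -> label)."""
    scored = [task_id for task_id in gold if task_id in submitted]
    if not scored:
        return None
    return sum(submitted[t] == gold[t] for t in scored) / len(scored)


def drifted(series, baseline_periods=4, tolerance=0.05):
    """True when the latest value falls more than `tolerance` below the
    mean of the preceding `baseline_periods` values (hypothetical rule)."""
    if len(series) <= baseline_periods:
        return False
    baseline = series[-baseline_periods - 1:-1]
    return series[-1] < sum(baseline) / len(baseline) - tolerance
```

Running `drifted` on one series per cohort, for both gold accuracy and per-dimension kappa, gives the per-cohort view described in the last bullet. The threshold and window would need tuning per program.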

4. Review

Multi-layer review with explicit escalation paths and documented disagreement resolution. Consensus without documentation hides systematic error; the disagreement log is where rubric improvements come from.
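One way to make the disagreement log concrete is sketched below. It assumes a simple policy that is not stated in the framework itself: unanimous labels are accepted, a sufficiently large majority resolves the item, and anything else is escalated. The function name `review_item`, the `min_share` threshold, and the log entry fields are hypothetical. Every non-unanimous item is written to the log, including the ones the majority resolves.

```python
from collections import Counter


def review_item(item_id, labels, log, min_share=2 / 3):
    """Resolve one item from several annotators' labels.

    labels: dict of annotator_id -> label (assumed layout).
    log:    list that collects every disagreement, resolved or not.
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels.values())
    top_label, votes = counts.most_common(1)[0]

    if len(counts) == 1:
        # Unanimous: nothing to document.
        return {"item_id": item_id, "label": top_label, "status": "accepted"}

    share = votes / len(labels)
    status = "resolved_by_majority" if share >= min_share else "escalated"
    # Record the disagreement either way so systematic patterns stay visible.
    log.append({"item_id": item_id, "labels": dict(labels), "status": status})
    final_label = top_label if status == "resolved_by_majority" else None
    return {"item_id": item_id, "label": final_label, "status": status}
```

Grouping the log by rubric dimension or guideline section afterwards is what would surface the ambiguities worth fixing in the next guideline revision.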

5. Feedback

Client-in-the-loop calibration cycles and delivery QA reports that state measured quality, not asserted quality. Every delivery should answer: what was the acceptance rate, what failed, and what changed because of it.
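As an illustration, a delivery report that answers those three questions can be computed directly from review outcomes. The sketch below assumes a hypothetical `ReviewedItem` record and a free-text list of changes; neither is taken from an actual reporting format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedItem:
    """Outcome of delivery QA for one item (hypothetical record shape)."""
    item_id: str
    accepted: bool
    failure_reason: Optional[str] = None


def delivery_report(items, changes):
    """Summarise measured quality for one delivery.

    items:   iterable of ReviewedItem
    changes: descriptions of what changed in response to failures
    """
    items = list(items)
    total = len(items)
    accepted = sum(item.accepted for item in items)
    failures = Counter(
        item.failure_reason or "unspecified"
        for item in items
        if not item.accepted
    )
    return {
        "items_reviewed": total,
        "acceptance_rate": accepted / total if total else None,
        "failures_by_reason": dict(failures.most_common()),
        "changes_made": list(changes),
    }
```

Because the report is derived from the same records the review layer already produces, it states what was measured and needs no separate write-up.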

Making it auditable

Quality claims need lineage: which guideline version, which annotator cohort, which review path produced each record. Versioned datasets plus annotation history make a quality audit a query, not an archaeology project.
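A small sketch of what "a query" could look like, using Python's built-in `sqlite3` module. The table name `annotation_history`, its columns, and the sample rows are invented for illustration; a real program would have its own schema and storage.

```python
import sqlite3

# In-memory database with one row per annotated record (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotation_history (
        record_id         TEXT,
        dataset_version   TEXT,
        guideline_version TEXT,
        annotator_cohort  TEXT,
        review_path       TEXT,
        accepted          INTEGER
    )
""")

# Made-up sample rows, only to make the query runnable.
sample_rows = [
    ("r1", "v3", "g2.0", "cohort-a", "single-review", 1),
    ("r2", "v3", "g2.0", "cohort-a", "single-review", 0),
    ("r3", "v3", "g2.1", "cohort-b", "double-review", 1),
    ("r4", "v2", "g1.9", "cohort-a", "single-review", 1),
]
conn.executemany(
    "INSERT INTO annotation_history VALUES (?, ?, ?, ?, ?, ?)", sample_rows
)


def acceptance_by_lineage(conn, dataset_version):
    """Acceptance rate for one dataset version, broken down by the
    guideline version, annotator cohort, and review path behind each record."""
    return conn.execute(
        """
        SELECT guideline_version, annotator_cohort, review_path,
               COUNT(*) AS records, AVG(accepted) AS acceptance_rate
        FROM annotation_history
        WHERE dataset_version = ?
        GROUP BY guideline_version, annotator_cohort, review_path
        ORDER BY guideline_version, annotator_cohort, review_path
        """,
        (dataset_version,),
    ).fetchall()


audit = acceptance_by_lineage(conn, "v3")
```

The point of the design is that lineage fields are stored on every record at write time, so an audit is a `GROUP BY` over existing data and does not require reconstructing who labelled what under which guideline.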

The test of a real quality system

Ask for last month's QA report on any active program. A functioning system produces one as a by-product. A theatrical one needs to write it.