An Operational Framework for AI Data Quality
Quality is not a final inspection; it is a system. This is the framework we use to make data quality measurable, auditable, and steadily improving across every program.
By Data Team
AI data quality is the degree to which a dataset reliably produces the intended model behavior — and it can only be managed if it is measured at every stage of production, not inspected at the end.
The five layers
1. People
Expert qualification before production: domain screening, calibration tasks, and probationary review. The cheapest quality intervention is to avoid hiring the wrong annotator in the first place.
2. Definition
Taxonomies, rubrics, and guidelines with versioned change control. Most "annotator errors" are actually guideline ambiguities; treat guideline revisions as first-class deliverables.
3. Measurement
- Gold tasks seeded into production at a known rate
- Inter-annotator agreement tracked per rubric dimension, not as one blended number
- Drift monitoring — agreement and gold performance over time, per cohort
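The measurements above can be sketched in a few lines. The example below is a minimal illustration, not a production implementation: it assumes two annotators per item, categorical labels, and Cohen's kappa as the agreement statistic (one common choice; the framework does not prescribe a specific metric). The function names, the rubric dimensions "accuracy" and "tone", and the (cohort, week, is_correct) tuple layout are all hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def agreement_by_dimension(ratings_a, ratings_b):
    """ratings_*: one dict per item, mapping rubric dimension -> label."""
    return {
        dim: cohen_kappa([r[dim] for r in ratings_a],
                         [r[dim] for r in ratings_b])
        for dim in ratings_a[0]
    }


def gold_accuracy_by_cohort_week(gold_results):
    """gold_results: iterable of (cohort, week, is_correct) tuples."""
    totals = defaultdict(lambda: [0, 0])  # (cohort, week) -> [correct, seen]
    for cohort, week, is_correct in gold_results:
        totals[(cohort, week)][0] += int(is_correct)
        totals[(cohort, week)][1] += 1
    return {key: correct / seen for key, (correct, seen) in totals.items()}


# Hypothetical ratings: full agreement on "accuracy", chance-level on "tone".
annotator_a = [
    {"accuracy": "pass", "tone": "good"},
    {"accuracy": "pass", "tone": "good"},
    {"accuracy": "fail", "tone": "bad"},
    {"accuracy": "fail", "tone": "bad"},
]
annotator_b = [
    {"accuracy": "pass", "tone": "good"},
    {"accuracy": "pass", "tone": "bad"},
    {"accuracy": "fail", "tone": "good"},
    {"accuracy": "fail", "tone": "bad"},
]
per_dimension = agreement_by_dimension(annotator_a, annotator_b)
# per_dimension == {"accuracy": 1.0, "tone": 0.0}
```

In this toy data the raw agreement blended across both dimensions is 75%, which looks acceptable, while the per-dimension view shows that "tone" agreement is no better than chance. Tracking the output of gold_accuracy_by_cohort_week over successive weeks is one simple way to surface drift within a cohort.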
4. Review
Multi-layer review with explicit escalation paths and documented disagreement resolution. Consensus without documentation hides systematic error; the disagreement log is where rubric improvements come from.
5. Feedback
Client-in-the-loop calibration cycles and delivery QA reports that state measured quality, not asserted quality. Every delivery should answer: what was the acceptance rate, what failed, and what changed because of it.
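As a rough sketch of what "measured, not asserted" could look like in a delivery report, the example below computes the acceptance rate and a breakdown of failure reasons from review outcomes. The record fields ("accepted", "failure_reason") and the function name are assumptions made for illustration; a real report would also link each failure category to the guideline or process change it triggered.

```python
from collections import Counter


def delivery_qa_summary(records):
    """Summarise review outcomes for one delivery.

    records: list of dicts with 'accepted' (bool) and, for rejected
    items, 'failure_reason' (str). Field names are hypothetical.
    """
    if not records:
        raise ValueError("cannot summarise an empty delivery")
    accepted = sum(1 for r in records if r["accepted"])
    failures = Counter(
        r.get("failure_reason", "unspecified")
        for r in records
        if not r["accepted"]
    )
    return {
        "total": len(records),
        "acceptance_rate": accepted / len(records),
        "failures": dict(failures),
    }


# Hypothetical delivery of four reviewed items.
summary = delivery_qa_summary([
    {"accepted": True},
    {"accepted": True},
    {"accepted": True},
    {"accepted": False, "failure_reason": "wrong label"},
])
# summary["acceptance_rate"] == 0.75
```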
Making it auditable
Quality claims need lineage: which guideline version, which annotator cohort, which review path produced each record. Versioned datasets plus annotation history make a quality audit a query, not an archaeology project.
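To make "an audit is a query" concrete, here is a small sketch using an in-memory SQLite database. The table name, column names, and sample rows are hypothetical; the only assumption carried over from the text is that each record stores its dataset version, guideline version, annotator cohort, and review path alongside the review outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotation_lineage (
        record_id         TEXT PRIMARY KEY,
        dataset_version   TEXT NOT NULL,
        guideline_version TEXT NOT NULL,
        annotator_cohort  TEXT NOT NULL,
        review_path       TEXT NOT NULL,
        accepted          INTEGER NOT NULL  -- 1 = passed review, 0 = failed
    )
""")
# Hypothetical sample records.
conn.executemany(
    "INSERT INTO annotation_lineage VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("r1", "ds-1", "g-1.0", "cohort-a", "single-review", 1),
        ("r2", "ds-1", "g-1.0", "cohort-a", "single-review", 0),
        ("r3", "ds-1", "g-1.1", "cohort-a", "escalated", 1),
        ("r4", "ds-1", "g-1.1", "cohort-b", "single-review", 1),
    ],
)


def acceptance_by_lineage(conn, dataset_version):
    """Acceptance rate per (guideline version, cohort, review path)."""
    query = """
        SELECT guideline_version, annotator_cohort, review_path,
               COUNT(*) AS n, AVG(accepted) AS acceptance_rate
        FROM annotation_lineage
        WHERE dataset_version = ?
        GROUP BY guideline_version, annotator_cohort, review_path
        ORDER BY guideline_version, annotator_cohort, review_path
    """
    return conn.execute(query, (dataset_version,)).fetchall()


audit = acceptance_by_lineage(conn, "ds-1")
# Each row: (guideline_version, annotator_cohort, review_path, n, rate)
```

With lineage stored per record, a question such as "did acceptance improve after the guideline revision?" becomes a single grouped query rather than a reconstruction from emails and spreadsheets.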
The test of a real quality system
Ask for last month's QA report on any active program. A functioning system produces one as a by-product. A theatrical one needs to write it.