Guides

Practical explainers for designing AI data programs and evaluation loops.

7 resources

Guide to Agentic AI Data

A research-backed guide to agent tasks, golden trajectories, tool-use logs, verifiers, artifacts, safety, and system-level evaluation.

Guide to AI Data Quality

A practical guide to fit-for-purpose AI data quality, lifecycle controls, ISO/IEC 5259, documentation, metrics, lineage, and monitoring.

Guide to Frontier Alignment Data

A research-backed guide to SFT, preference, critique, verifier-backed reasoning, and safety data for frontier model post-training and evaluation.

Guide to Human-in-the-Loop Evaluation

A research-backed guide to human evaluation roles, rubrics, calibration, disagreement, adjudication, LLM judges, sampling, and governance.

Guide to Model Evaluation

A research-backed guide to AI evaluation scope, private benchmark design, scoring, contamination controls, and continuous release testing.

Guide to Multimodal Data Pipelines

A research-backed guide to sourcing, schema, grounding, annotation, QA, long-context evaluation, rights, and delivery for multimodal AI data.

Guide to Physical AI and Robotics Data

A research-backed guide to robotics task design, sensor synchronization, calibration, demonstrations, teleoperation, episode QA, formats, and evaluation.
