Dataset loaders¶
--dataset-loader-fn points to a Python function that normalizes raw dataset
rows before the trainer sees them. The function shape is the same across
examples:
def load_training_dataset(dataset_path: str, *, default_loader, **_: object):
    dataset = default_loader(dataset_path)
    ...
    return normalized_rows
default_loader understands the same --dataset-path values as the CLI:
JSON/JSONL, Parquet, CSV/TSV, Arrow, datasets.save_to_disk(...) directories,
and Hugging Face dataset references. Loaders should keep tokenization out of the
dataset layer; trainers own tokenizer-specific rendering and token limits.
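Because default_loader arrives as a keyword argument, a loader can be exercised without the trainer by passing a stand-in. The following is a minimal sketch under stated assumptions: fake_default_loader, the text column, the extra_option keyword, and the missing-key check are all hypothetical and are not part of the CLI.

```python
def load_training_dataset(dataset_path: str, *, default_loader, **_: object) -> list[dict]:
    # Normalize each raw row into a plain dict with a "prompt" key.
    # The "text" column is a hypothetical raw field name.
    return [{"prompt": str(dict(row)["text"])} for row in default_loader(dataset_path)]


def fake_default_loader(dataset_path: str) -> list[dict]:
    # Hypothetical stand-in for the real default_loader; it ignores the path.
    return [{"text": "hello"}, {"text": "world"}]


rows = load_training_dataset(
    "unused-path",
    default_loader=fake_default_loader,
    extra_option=None,  # hypothetical; extra keyword arguments are absorbed by **_
)

# Indices of rows that lack the key the trainer is expected to read.
missing = [i for i, row in enumerate(rows) if "prompt" not in row]
```

A check like this runs in a plain Python session, so a loader can be verified before a training job is launched.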
SFT¶
SFT always requires --dataset-loader-fn. The loader must return rows with
prompt and response keys:
def load_training_dataset(dataset_path: str, *, default_loader, **_: object) -> list[dict]:
    rows = default_loader(dataset_path)
    records = []
    for row in rows:
        record = dict(row)
        records.append(
            {
                "prompt": f"Instruction: {record['instruction']}\nAnswer:",
                "response": str(record["answer"]),
            }
        )
    return records
For a concrete example, use --dataset-path yahma/alpaca-cleaned with
examples/sft/alpaca/dataset_loader.py.
DPO¶
DPO loaders should return rows with prompt, chosen, and rejected keys.
prompt is the shared context, chosen is the preferred answer, and rejected
is the lower-ranked answer.
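A DPO loader can follow the same shape as the SFT loader. This is a sketch only: the raw column names question, preferred, and dispreferred are hypothetical and should be replaced with whatever the dataset actually provides.

```python
def load_training_dataset(dataset_path: str, *, default_loader, **_: object) -> list[dict]:
    records = []
    for row in default_loader(dataset_path):
        record = dict(row)
        records.append(
            {
                # Shared context for both completions (hypothetical "question" column).
                "prompt": f"Question: {record['question']}\nAnswer:",
                # Preferred answer (hypothetical "preferred" column).
                "chosen": str(record["preferred"]),
                # Lower-ranked answer (hypothetical "dispreferred" column).
                "rejected": str(record["dispreferred"]),
            }
        )
    return records
```

As with the other loaders, the rows stay plain text; no tokenization happens here.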
Prompt-based RL¶
Loaders for GSPO, GRPO, and PPO prompt datasets should return rows with a
prompt key. Reward functions may require additional fields such as solutions or
task metadata, so loaders usually preserve those fields while adding the
canonical prompt. The math loader in examples/math/dataset_loader.py follows
this pattern.
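The preserve-and-add pattern can be sketched as follows. The problem and solution column names and the prompt template are assumptions made for illustration; they are not taken from examples/math/dataset_loader.py.

```python
def load_training_dataset(dataset_path: str, *, default_loader, **_: object) -> list[dict]:
    records = []
    for row in default_loader(dataset_path):
        # Copy the row so every original field (for example a hypothetical
        # "solution" column read by the reward function) is preserved.
        record = dict(row)
        # Add the canonical prompt built from a hypothetical "problem" column.
        record["prompt"] = f"Solve the problem.\n\n{record['problem']}\nAnswer:"
        records.append(record)
    return records
```

Copying the row rather than building a new dict means fields added to the dataset later reach the reward function without a loader change.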
Agentic RL¶
Agentic loaders also return prompt plus task metadata consumed by
run_agent.py and reward.py. Examples include
examples/agentic/coding/dataset_loader.py and
examples/agentic/shopping/dataset_loader.py.