Reward function issues
Reward issues often look like training instability. Debug the reward function before changing algorithms.
Check:
The file passed to --reward-fn-path exports reward_fn.
The reward list length matches the completions list length.
Parsing handles empty, malformed, or unexpected completions.
Scores match a hand-checked example.
Logged examples include enough context to explain wrong scores.
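The checklist above can be sketched as a small defensive reward function plus a sanity check. The signature reward_fn(completions, answers), the "Answer: <number>" format, and the helper names parse_answer and check_reward_fn are assumptions made for illustration; adapt them to the interface described in the Reward function API page.

```python
import re
from typing import Callable, List, Optional, Sequence

# Hypothetical answer format: the final answer appears as "Answer: <number>".
_ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def parse_answer(completion: object) -> Optional[float]:
    """Return the parsed number, or None for empty, malformed, or non-string input."""
    if not isinstance(completion, str) or not completion.strip():
        return None
    matches = _ANSWER_RE.findall(completion)
    if not matches:
        return None
    return float(matches[-1])  # use the last answer the completion states


def reward_fn(completions: Sequence[object], answers: Sequence[float]) -> List[float]:
    """Assumed signature: returns exactly one float reward per completion."""
    rewards = []
    for completion, expected in zip(completions, answers):
        parsed = parse_answer(completion)
        correct = parsed is not None and abs(parsed - expected) < 1e-6
        rewards.append(1.0 if correct else 0.0)
    return rewards


def check_reward_fn(
    fn: Callable[[Sequence[object], Sequence[float]], List[float]],
    completions: Sequence[object],
    answers: Sequence[float],
    expected_rewards: List[float],
) -> List[float]:
    """Compare a reward function against hand-checked scores."""
    rewards = fn(completions, answers)
    assert len(rewards) == len(completions), (
        f"got {len(rewards)} rewards for {len(completions)} completions"
    )
    for completion, got, want in zip(completions, rewards, expected_rewards):
        # Include the completion itself so a wrong score can be explained.
        assert got == want, f"completion={completion!r}: got {got}, want {want}"
    return rewards
```

Running check_reward_fn on a handful of hand-scored completions, including an empty string and a malformed answer, before starting training exercises most of the checks in the list.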
If rewards are always zero, inspect answer parsing first. If rewards are always one, verify the checker actually reads the completion.
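One way to probe constant rewards is to score deliberately meaningless completions and see whether the result changes. The sketch below assumes the same hypothetical reward_fn(completions, answers) signature; the helper name diagnose_constant_rewards and the junk string are illustrative, not part of any documented API.

```python
from typing import Callable, List, Sequence


def diagnose_constant_rewards(
    reward_fn: Callable[[Sequence[str], Sequence[object]], Sequence[float]],
    completions: Sequence[str],
    answers: Sequence[object],
    junk: str = "###",
) -> List[str]:
    """Return human-readable findings about suspicious constant rewards."""
    rewards = list(reward_fn(completions, answers))
    findings = []
    if not rewards:
        return findings
    if all(r == 0 for r in rewards):
        findings.append("all rewards are zero: inspect answer parsing")
    if all(r == 1 for r in rewards):
        # Replace every completion with junk text. If the scores are still
        # all one, the checker is probably not reading the completion.
        junk_rewards = list(reward_fn([junk] * len(completions), answers))
        if all(r == 1 for r in junk_rewards):
            findings.append(
                "junk completions also score one: checker may ignore the completion"
            )
    return findings
```

An empty result does not prove the reward function is correct; it only rules out these two failure patterns for the batch that was checked.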
See Reward functions and Reward function API.