TicTacToe agentic RL¶
TicTacToe is the smallest agentic AReno recipe. It exercises an agent function, the local OpenAI-compatible proxy, a task rule loop, and a reward function.
areno train \
--agent-fn examples/agentic/tictactoe/run_agent.py \
--reward-fn-path examples/agentic/tictactoe/reward.py \
--algo gspo
Use it when you want to learn the shape of an agentic task before moving to a larger environment.
Key adaptation points:
Replace the rule loop with your environment.
Keep the agent function responsible for external actions.
Return trajectory turns that AReno can tokenize and score.
Add reward diagnostics before increasing concurrency.
See Agentic rollout API for the agentic rollout API contract.