AReno documentation¶

Local post-training and serving

Train and serve local LLMs with one CUDA-native loop.

AReno keeps rollout, reward scoring, inference, optimizer steps, and checkpoint I/O in one compact engine for SFT, DPO, GSPO, GRPO, PPO, and agentic RL workflows.

Get started Train a model

Start¶

Install

Build against your CUDA PyTorch environment.

pip install psutil
pip install flash-linear-attention
pip install -e . --no-build-isolation

Use ARENO_BUILD_EXT=0 for metadata-only docs or package checks on CPU-only machines.

Check

Verify the local runtime before training.

areno check
areno env --json

areno check reports common CUDA, PyTorch, extension, and platform setup issues with next steps.

flash-attn is optional unless you use the default --attn-backend flash path. Use --attn-backend native when you want to run without FlashAttention or when the local GPU is unsupported by FlashAttention.

Core workflows¶

01 Train

Run SFT, DPO, GSPO, GRPO, or PPO from the CLI with dataset loading, rollout, reward scoring, and checkpoint saving in one loop.

02 Serve

Start an OpenAI-compatible chat-completions server backed by the local AReno inference engine.

03 Customize

Use from areno import Trainer for custom rollout, reward, loss, and checkpoint loops.

04 Load models

Review the checkpoint families currently supported by AReno model adapters.

Training

Run a small GSPO smoke task.

areno train \
  --ckpt Qwen/Qwen3-0.6B \
  --dataset-path gsm8k:main \
  --dataset-loader-fn examples/math/dataset_loader.py \
  --reward-fn-path examples/math/math_verify_reward.py \
  --algo gspo \
  --tp-size 1 \
  --world-size 1 \
  --batch-size 1

Serving

Open a local chat-completions endpoint.

areno serve \
  --model-path /path/to/model \
  --tp-size 1 \
  --world-size 1 \
  --port 8000

Training and serving require a CUDA-capable NVIDIA GPU. CPU-only machines can run docs, packaging checks, and lightweight CPU tests, but cannot run the AReno training or serving engine.

Agentic rollout¶

Agentic RL

Collect trajectories through a local OpenAI-compatible proxy.

Agent functions call the local server, return explicit trajectory turns, and let AReno convert responses into completions, tokens, logprobs, rewards, and loss masks.

areno train \
  --agent-fn examples/agentic/tictactoe/run_agent.py \
  --reward-fn-path examples/agentic/tictactoe/reward.py \
  --algo gspo

DuelGrid is a browser-game demo with multi-action turns. Before GSPO/RLVR post-training, Gemma-E2B-it often moves back and forth without progress. After training, it learns to collect pickups, chase the user, attack when in range, and avoid trap tiles.

Train before	Reward	Train after

See examples/agentic/duelgrid for the rule engine, fixed-path dataset loader, reward function, OpenAI-compatible agent, and browser UI.

What AReno owns¶

Kernels

Fused CUDA paths in areno_accel for runtime hot paths.

Engine

Tensor-parallel workers, KV/cache layout, CUDA graph support, rollout state, scoring, optimizer steps, and checkpoint I/O.

Algorithms

SFT, DPO, GSPO, GRPO, PPO, and agentic rollouts implemented inside the project rather than delegated to a separate trainer framework.

Checkpoints

Hugging Face-compatible load/save adapters for supported model families.