Observability¶
AReno exposes training state through plain logs, the per-step train_stats
dictionary, TensorBoard scalar metrics, and optional agentic trajectory
diagnostics. The implementation is intentionally local and lightweight:
metrics are recorded by areno.api.metrics and the CLI exposes the output
directory through --metrics-log-dir. AReno does not currently provide a
built-in wandb integration.
Console logs¶
The trainer logs a compact lifecycle for every epoch and step. For rollout
algorithms, areno.api.trainers.policy_only.PolicyOnlyTrainer emits:
epoch=<n> stage=epoch_startandepoch=<n> stage=epoch_end.role=policy stage=rollout_startandstage=rollout_endaround sampling or agentic execution.metric=reward_meanafter rewards are computed.metric=rollout_logprob_meanwhen rollout logprobs are available.role=policy stage=train_startandstage=train_endaround the optimizer step.train_stats={...}with the scalar dictionary returned by the backend and loss function.
The local AReno backend also logs step timing from
areno.api.backend.areno.backend:
time rollout=213.929721 train=17.327612 total=231.297132
Those numbers mean the backend measured roughly 214 seconds in rollout work,
17 seconds in the training step, and 231 seconds end-to-end for that step. The
same values are copied into train_stats as step_rollout_time_s,
step_train_time_s, and step_e2e_time_s.
During rollout, areno.engine.inference logs decode progress per data
parallel shard:
rollout decode progress: dp=0/2 active=6 cuda_graph=True tokens_per_second=60.8
active is the number of currently scheduled decode requests on that shard
at the time of the progress log. In agentic runs it can dip to zero while the
external agent code is executing tools, tests, sleeps, background commands, or
other non-model work before asking the OpenAI-compatible proxy for the next
model response.
train_stats¶
train_stats is the easiest place to inspect one completed optimizer step.
It is logged as a Python dictionary and also passed to TensorBoard by
areno.api.metrics.MetricsRecorder when metrics recording is enabled.
Common fields include:
Field |
Meaning |
|---|---|
|
Loss scalars returned by the selected loss function. |
|
Mean policy advantage over trainable response tokens. |
|
Mean response-token count for rows used in the step. |
|
Mean rollout-time logprob over trainable response tokens. |
|
Mean current-policy logprob during the training forward pass. |
|
Difference between rollout and train logprobs, useful for stale-policy or masking debugging. |
|
Policy-ratio diagnostics used by rollout policy losses such as GSPO, GRPO, and PPO. |
|
Global gradient norm after clipping/accounting. |
|
Fraction of parameter-gradient entries that are zero/non-zero. |
|
Current optimizer learning rate. |
|
Per-step wall-clock timing from the local backend. |
For example, if a debugging log shows:
metric=reward_mean value=0.125000
train_stats={'loss': 0.0, 'advantage_mean': 0.0, 'response_len': 912.5625,
'rollout_logprobs_mean': -0.17456013709306717,
'train_logprobs_mean': -0.2500366196036339,
'logp_diff_mean': 0.07547648251056671,
'step_rollout_time_s': 213.92972119152546,
'step_train_time_s': 17.327912725508213}
read it as: the sampled batch achieved low positive reward on average, the
training rows were long, rollout dominated wall time, and the current policy
assigned lower logprob than the rollout policy on average. loss can print
as 0.0 for some policy-gradient batches when normalized advantages cancel
in the scalar value at the current ratio, while gradients can still be non-zero
because the derivative depends on the logprob term.
TensorBoard metrics¶
Pass --metrics-log-dir to control where TensorBoard event files are written.
The default is shown by areno train --help and is also surfaced in the
training config printout.
areno train ... --metrics-log-dir /tmp/areno/tfevent
tensorboard --logdir /tmp/areno/tfevent
The writer lives in areno.api.metrics. It records three namespaces:
rollout/*Sample-side statistics computed from the train batch, including
rollout/rewards_mean,rollout/rewards_std,rollout/rewards_max,rollout/rewards_min,rollout/accuracy,rollout/advantages_mean,rollout/advantages_std,rollout/logprobs_mean,rollout/seq_len_mean,rollout/prompt_len_mean,rollout/response_len_mean,rollout/num_sequences,rollout/skipped_long, androllout/total_skipped_long.train/*Every scalar returned in
train_stats. Typical examples aretrain/loss,train/policy_loss,train/total_loss,train/ratio_mean,train/ratio_std,train/grad_norm,train/lr,train/rollout_logprobs_mean, andtrain/train_logprobs_mean.time/*Stage timings when available:
time/rollout,time/reward,time/advantage, andtime/train.
Agentic diagnostics¶
Agentic rollout adds two diagnostic surfaces.
First, the trainer logs batch-level information before and after agent execution:
agentic rollout batch prompts=2 n_samples=8 expected_requests=16 max_running_prompts=16
agentic train batch built samples=16 tokens=223308 messages=242 tool_calls=133 tool_results=97
These lines are useful for checking whether the configured concurrency, trajectory length, and tool-call volume match expectations.
Second, set ARENO_LOG_COMPLETIONS to a positive integer to log a bounded
number of prompt/sample trajectories. For agentic rollouts this includes the
rendered prompt, message list, final answer, parsed tool calls, sampled tool
results, token row prefix, and loss-mask summary:
ARENO_LOG_COMPLETIONS=2 areno train ...
When a trajectory is dropped for exceeding the model context window,
PolicyOnlyTrainer logs agentic trajectory filtered: ... with token
counts, message counts, assistant turn counts, tool-result counts, and a short
prompt preview. This is the fastest way to debug overlong agentic examples
without dumping every token in every trajectory.