Replay pacing

Content replay (--replay-tokens) and timing are independent. Pick the content mode first, then choose how quickly and in what shape the engine should emit chunks.

| Mode | Invocation |
| --- | --- |
| Timing-modeled | --replay-tokens trace.gz --latency-trace trace.gz plus scheduler args matching the capture (--max-num-seqs, --max-num-batched-tokens, ...): gaps and burst sizes sampled from a model fitted to the trace |
| Timing-verbatim | --replay-tokens trace.gz --replay-steps trace.gz: each request replays its recorded per-chunk sizes and gaps |
| As fast as possible | --replay-tokens trace.gz and nothing else: all timing knobs default to 0, the instant model |
| Compressed but shaped | --replay-tokens trace.gz --latency-trace trace.gz --time-scale 100: same interleavings and relative ordering, 100x faster wall clock |
| Synthetic timing | --replay-tokens trace.gz --time-to-first-token 50 --inter-token-latency 10 |
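
To see why the compressed mode can keep "same interleavings and relative ordering", here is a minimal sketch of the arithmetic, not the engine's implementation. It assumes that --time-scale divides every recorded gap by the same factor; the function names and the sample gaps are hypothetical, and time units are arbitrary.

```python
def scale_gaps(gaps, time_scale):
    """Divide each recorded gap by time_scale (assumed meaning of --time-scale)."""
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return [gap / time_scale for gap in gaps]


def emit_times(gaps):
    """Turn per-chunk gaps into absolute emit times, starting from time 0."""
    now = 0.0
    times = []
    for gap in gaps:
        now += gap
        times.append(now)
    return times


def merged_order(requests):
    """Return request names in the order their chunks are emitted."""
    events = [
        (t, name)
        for name, gaps in requests.items()
        for t in emit_times(gaps)
    ]
    return [name for _, name in sorted(events)]


# Hypothetical recorded gaps for two concurrent requests.
recorded = {"A": [50, 10, 10], "B": [55, 10, 10]}
compressed = {name: scale_gaps(gaps, 100) for name, gaps in recorded.items()}

# The interleaving of chunks is identical; only the wall clock shrinks.
print(merged_order(recorded))
print(merged_order(compressed))
```

Because every gap is divided by the same positive factor, every emit time shrinks by that factor too, so comparisons between any two emit times come out the same way as before.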

On the as-fast-as-possible path, scheduler limits still apply even though every delay is zero. --max-num-seqs and the token budget continue to govern queueing and backpressure, so raise them if you want pass-through replay. --output-token-chunk-size controls output framing.
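
The effect of a concurrency cap at zero delay can be illustrated with a toy model. This is a sketch under strong assumptions, not the engine's scheduler: it ignores the token budget, assumes each running request emits exactly one chunk per scheduler step, and expects positive chunk counts. The function name and inputs are hypothetical.

```python
from collections import deque


def steps_to_drain(chunks_per_request, max_num_seqs):
    """Count scheduler steps needed to finish all requests under a concurrency cap."""
    if max_num_seqs < 1:
        raise ValueError("max_num_seqs must be at least 1")
    waiting = deque(chunks_per_request)  # requests queued behind the cap
    running = []                         # remaining chunks of admitted requests
    steps = 0
    while waiting or running:
        # Admit queued requests until the cap is reached.
        while waiting and len(running) < max_num_seqs:
            running.append(waiting.popleft())
        # Each running request emits one chunk; finished requests leave.
        running = [left - 1 for left in running if left - 1 > 0]
        steps += 1
    return steps


# Four requests of three chunks each.
print(steps_to_drain([3, 3, 3, 3], max_num_seqs=2))  # 6: two waves of two requests
print(steps_to_drain([3, 3, 3, 3], max_num_seqs=4))  # 3: all requests run together
```

Even with no delay between steps, the lower cap doubles the number of steps because half the requests wait in the queue, which is the backpressure that raising the limits removes.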