LoRA simulation

The simulator tracks LoRA adapters loaded by the frontend through the engine utility path (add_lora / remove_lora) and uses that registry while scheduling requests.

--max-loras mirrors vLLM's running-batch diversity cap: it limits how many distinct adapters may be resident in the running batch at once. 0 disables the cap, but LoRA accounting still runs so frontend metrics can report running and waiting adapters.

In the container image, set MOCK_MAX_LORAS; the entrypoint maps it to vllm-vcr play --max-loras.

The vllm:lora_requests_info metric is frontend-derived on current supported vLLM lines. The simulator's job is to keep scheduler stats and adapter state consistent enough for that frontend metric to reflect the request mix.

Keyboard shortcuts

vllm-vcr

LoRA simulation