Deploy LLMs on OpenShift
Copy-paste deployment manifests for running open-weight models with vLLM on OpenShift. Pick a model, choose a variant, and run oc apply -f on its manifest.
Latest recipes
Red Hat AI
Llama 3.2 1B Instruct FP8 Dynamic
FP8-quantized 1B-parameter Llama 3.2 instruction-tuned model. Reduces GPU memory ~50% vs BF16 with minimal accuracy loss.
NVIDIA
NVIDIA Nemotron 3 Nano 30B A3B FP8
FP8-quantized 30B hybrid Mamba-2/Transformer MoE reasoning model (3.5B active). Supports togglable chain-of-thought and fits on a single H100 GPU.
Meta
Llama 3.1 8B Instruct
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models...
Red Hat AI
DiffusionGemma 26B A4B IT FP8
FP8-quantized 26B mixture-of-experts diffusion language model (4B active). Fits on a single H100 GPU.
Google
Gemma 4 12B IT
Encoder-free multimodal 12B model supporting text, image, and audio input. Fits on a single GPU.
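The "~50% vs BF16" and "fits on a single H100 GPU" notes above follow from simple arithmetic on weight size: BF16 stores 2 bytes per parameter and FP8 stores 1. The Python sketch below is a rough, weight-only estimate and not part of any recipe. The function names, the 80 GB H100 capacity, and the 10% headroom factor are illustrative assumptions, and parameter counts are the nominal sizes from the model names. It ignores KV cache, activations, and runtime overhead, so real usage will be higher. For mixture-of-experts models the total parameter count is what must be resident in GPU memory, not the active count.

```python
# Rough weight-only GPU memory estimate for the recipes listed above.
# Assumptions (not from the recipes): 80 GB H100, 10% headroom,
# nominal parameter counts taken from the model names.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # 1e9 params * bytes per param / 1e9 bytes per GB simplifies to this product.
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_gpu(params_billion: float, dtype: str,
                gpu_memory_gb: float = 80.0, headroom: float = 0.9) -> bool:
    """True if the weights alone fit within the usable share of GPU memory."""
    return weight_memory_gb(params_billion, dtype) <= gpu_memory_gb * headroom


if __name__ == "__main__":
    # Total (not active) parameter counts, in billions.
    models = [
        ("Llama 3.2 1B", 1.0),
        ("Nemotron 3 Nano 30B A3B", 30.0),
        ("DiffusionGemma 26B A4B", 26.0),
    ]
    for name, params in models:
        bf16 = weight_memory_gb(params, "bf16")
        fp8 = weight_memory_gb(params, "fp8")
        print(f"{name}: BF16 ~{bf16:.0f} GB, FP8 ~{fp8:.0f} GB, "
              f"FP8 fits on one 80 GB GPU: {fits_on_gpu(params, 'fp8')}")
```

Treat the result as a lower bound when sizing a deployment, since the memory left over after loading weights is what the KV cache has to fit in.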