All Recipes

5 models ready for OpenShift deployment

Google

Encoder-free multimodal 12B model supporting text, image, and audio input. Fits on a single GPU.

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models...

Nvidia

FP8-quantized 30B hybrid Mamba-2/Transformer MoE reasoning model (3.5B active). Supports togglable chain-of-thought and fits on a single H100 GPU.

Red Hat AI

FP8-quantized 1.5B-parameter Llama 3.2 instruction-tuned model. Reduces GPU memory use by ~50% vs. BF16 with minimal accuracy loss.

Red Hat AI

FP8-quantized 26B mixture-of-experts diffusion language model (4B active). Fits on a single H100 GPU.
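
The "~50% vs. BF16" and "fits on a single H100 GPU" claims in the cards above follow from simple weight-size arithmetic: FP8 stores one byte per parameter, BF16 stores two. The sketch below is a rough illustration only. It counts weights and ignores KV cache, activations, and runtime overhead, and it assumes an 80 GB H100 (the constant `H100_MEMORY_GB` and both function names are hypothetical, not part of any recipe).

```python
# Bytes per parameter for each weight format (standard storage widths).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Assumption: the 80 GB H100 variant; other variants have different capacity.
H100_MEMORY_GB = 80


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fp8_saving_vs_bf16(params_billions: float) -> float:
    """Fraction of weight memory saved by storing FP8 instead of BF16."""
    bf16 = weight_memory_gb(params_billions, "bf16")
    fp8 = weight_memory_gb(params_billions, "fp8")
    return 1 - fp8 / bf16


if __name__ == "__main__":
    # Total parameter counts taken from the cards above.
    for size in (1.5, 26, 30):
        fp8 = weight_memory_gb(size, "fp8")
        bf16 = weight_memory_gb(size, "bf16")
        fits = fp8 < H100_MEMORY_GB
        print(f"{size}B: BF16 {bf16} GB, FP8 {fp8} GB, "
              f"FP8 weights under {H100_MEMORY_GB} GB: {fits}")
```

Note that for the mixture-of-experts models, all experts must be resident in memory, so the total parameter count (not the active count) determines the weight footprint.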