All Recipes
5 models ready for OpenShift deployment
Google (1)
Meta (1)
Nvidia (1)
Red Hat AI (2)
Red Hat AI
Llama 3.2 1B Instruct FP8 Dynamic
FP8-quantized, 1.5B-parameter, instruction-tuned Llama 3.2 model. Reduces GPU memory use by about 50% versus BF16 with minimal accuracy loss.
text | fp8 | 1 GPU
Red Hat AI
DiffusionGemma 26B A4B IT FP8
FP8-quantized, 26B-parameter mixture-of-experts diffusion language model (4B active parameters). Fits on a single H100 GPU.
text | fp8 | 1 GPU
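The memory claims in the descriptions above can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a sizing tool: it assumes 2 bytes per parameter for BF16 and 1 byte for FP8, counts weights only (no KV cache, activations, or runtime overhead), and assumes an 80 GB H100. The function name `weight_memory_gb` and the constant `H100_MEMORY_GB` are illustrative, and real FP8 checkpoints often keep a few layers in higher precision, so actual savings can be somewhat below 50%.

```python
# Rough weight-memory estimate. Assumptions: weights only, decimal GB,
# every parameter stored at the given precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
H100_MEMORY_GB = 80  # assumed 80 GB H100 variant


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as stated in the listing.
models = {
    "Llama 3.2 1B Instruct FP8 Dynamic": 1.5e9,
    "DiffusionGemma 26B A4B IT FP8": 26e9,
}

for name, params in models.items():
    bf16 = weight_memory_gb(params, "bf16")
    fp8 = weight_memory_gb(params, "fp8")
    saving = 1 - fp8 / bf16
    fits = fp8 < H100_MEMORY_GB
    print(f"{name}: BF16 {bf16:.1f} GB, FP8 {fp8:.1f} GB, "
          f"saving {saving:.0%}, weights fit on one H100: {fits}")
```

Under these assumptions the 26B model needs roughly 26 GB for weights in FP8 versus 52 GB in BF16, which leaves headroom on a single 80 GB GPU for the KV cache and activations.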