NVIDIA/Model-Optimizer

A unified library of state-of-the-art (SOTA) model optimization techniques, such as quantization, distillation, pruning, neural architecture search, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.

Stars: 3.0k
Forks: 455
Claude commits: 1
Language: Python
First Claude commit: Mar 17, 2026
Last Claude commit: 3mo ago
Discovered: Mar 18, 2026

Recent Claude Commits

Add Python 3.13 support (#1048)
cb1ff32 · 3mo ago