NVIDIA/Model-Optimizer

A unified library of state-of-the-art model optimization techniques, including quantization, distillation, pruning, neural architecture search, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to speed up inference.

Stars: 3.0k

Forks: 455

Claude Commits: 1

Language: Python

First Claude commit: Mar 17, 2026

Last Claude commit: 3mo ago

Discovered: Mar 18, 2026

Recent Claude Commits

Add Python 3.13 support (#1048)

cb1ff32 · 3mo ago · message_footer