NVIDIA/Model-Optimizer
A unified library of state-of-the-art model optimization techniques, including quantization, distillation, pruning, neural architecture search, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.
First Claude commit: Mar 17, 2026
Last Claude commit: 3mo ago
Discovered: Mar 18, 2026
Recent Claude Commits
Add Python 3.13 support (#1048)
cb1ff32 3mo ago message_footer