OneComp: One-Click Generative Model Compression

arXiv cs.LG · 2026-04-01

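OneComp is an open-source compression framework. Given a model and the available hardware, it automatically inspects the model, plans mixed-precision assignments, and quantizes in stages (layer-wise → block-wise → global refinement). It treats the first quantized checkpoint as a deployable pivot, so that each later refinement improves the quality of that same model. This resource-adaptive, hardware-aware pipeline turns state-of-the-art quantization algorithms into reproducible, production-grade deployments, helping to reduce memory footprint, latency, and hardware cost.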
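
To make "planning mixed-precision assignments" more concrete, here is a minimal sketch of one way such a plan could be built: symmetric round-to-nearest quantization per layer, plus a greedy search that spends a bit budget where it reduces error the most. This is not OneComp's API or its actual algorithm. The function names (`quantize_dequantize`, `layer_error`, `plan_mixed_precision`), the greedy heuristic, the candidate bit widths, and the toy weights are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def quantize_dequantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    qmax = 2 ** (bits - 1) - 1  # largest signed integer level, e.g. 7 for 4 bits
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def layer_error(weights: List[float], bits: int) -> float:
    """Mean squared error introduced by quantizing one layer to `bits` bits."""
    deq = quantize_dequantize(weights, bits)
    return sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)


def plan_mixed_precision(
    layers: Dict[str, List[float]],
    budget_bits: int,
    choices: Sequence[int] = (2, 4, 8),
) -> Dict[str, int]:
    """Hypothetical greedy planner (an illustrative heuristic, not OneComp's).

    Every layer starts at the smallest bit width. The planner then repeatedly
    upgrades the layer with the best error reduction per extra bit, as long as
    the total stays within `budget_bits`. If even the smallest width exceeds
    the budget, the all-minimum plan is returned unchanged.
    """
    widths = sorted(choices)
    plan = {name: widths[0] for name in layers}
    used = sum(len(w) * plan[name] for name, w in layers.items())
    while True:
        best = None  # (gain per extra bit, layer name, new width, extra cost)
        for name, w in layers.items():
            idx = widths.index(plan[name])
            if idx + 1 == len(widths):
                continue  # already at the highest candidate width
            nxt = widths[idx + 1]
            extra = len(w) * (nxt - plan[name])
            if used + extra > budget_bits:
                continue  # upgrade would exceed the budget
            gain = layer_error(w, plan[name]) - layer_error(w, nxt)
            score = gain / extra
            if best is None or score > best[0]:
                best = (score, name, nxt, extra)
        if best is None or best[0] <= 0:
            return plan
        _, name, nxt, extra = best
        plan[name] = nxt
        used += extra


# Toy "model": two layers with four weights each (made-up values).
toy_layers = {
    "attn": [0.9, -0.5, 0.1, 0.3],
    "mlp": [0.01, -0.02, 0.015, 0.0],
}
toy_plan = plan_mixed_precision(toy_layers, budget_bits=48)
print(toy_plan)
```

A real pipeline would measure error on calibration data rather than on raw weights, and would account for hardware-supported formats. The sketch only shows the shape of the budgeted trade-off.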

Original text

arXiv:2603.28845v1 Announce Type: new
Abstract: Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision budgets, data-driven calibration strategies, and hardware-dependent execution regimes. We present OneComp, an open-source compression framework that transforms this expert workflow into a reproducible, resource-adaptive pipeline. Given a model identifier and available hardware, OneComp automatically inspects the model, plans mixed-precision assignments, and executes progressive quantization stages, ranging from layer-wise compression to block-wise refinement and global refinement. A key architectural choice is treating the first quantized checkpo