OneComp: One-Line Compression for Generative Models

arXiv cs.LG · 2026-04-01
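OneComp is an open-source post-training compression framework. It automatically inspects the model and the available hardware, plans mixed-precision assignments, and runs progressive quantization stages (layer-wise, block-wise, and global). The first quantized checkpoint serves as a deployable pivot, and model quality keeps improving as more compute is invested. The goal is to turn quantization research into a reproducible, resource-adaptive, production-grade pipeline.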
Original text
arXiv:2603.28845v1 Announce Type: new
Abstract: Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision budgets, data-driven calibration strategies, and hardware-dependent execution regimes. We present OneComp, an open-source compression framework that transforms this expert workflow into a reproducible, resource-adaptive pipeline. Given a model identifier and available hardware, OneComp automatically inspects the model, plans mixed-precision assignments, and executes progressive quantization stages, ranging from layer-wise compression to block-wise refinement and global refinement. A key architectural choice is treating the first quantized checkpoint as a deployable pivot, ensuring that each subsequent stage improves the same model and that quality increases as more compute is invested. By converting state-of-the-art compression research into an extensible, open-source, hardware-aware pipeline, OneComp bridges the gap between algorithmic innovation and production-grade model deployment.
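The plan-then-refine flow described in the abstract can be illustrated with a toy sketch. This is not OneComp's API or its algorithms: every name here (`plan_bits`, `refine`, `total_error`, the layer names) is hypothetical, the "model" is three lists of random weights, the planner is a simple greedy heuristic, and the refinement stages are a plain scale search standing in for block-wise and global refinement. The only point it aims to show is the pivot idea: the first quantized checkpoint is already usable, and each later stage updates that same checkpoint only when doing so lowers the error.

```python
import random


def quantize(weights, bits, scale):
    """Symmetric uniform round-to-nearest quantization, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def default_scale(weights, bits):
    qmax = 2 ** (bits - 1) - 1
    return (max(abs(w) for w in weights) / qmax) or 1.0  # avoid a zero scale


def plan_bits(layers, avg_budget, choices=(2, 3, 4, 8)):
    """Greedy mixed-precision plan (assumed heuristic): start every layer at the
    lowest width, then keep upgrading the layer with the largest current error
    while the parameter-weighted mean bit-width stays within avg_budget."""
    bits = {name: choices[0] for name in layers}
    total_params = sum(len(w) for w in layers.values())
    while True:
        used = sum(bits[n] * len(layers[n]) for n in layers)
        candidates = []
        for name, w in layers.items():
            idx = choices.index(bits[name])
            if idx + 1 == len(choices):
                continue  # already at the widest choice
            nxt = choices[idx + 1]
            if used + (nxt - bits[name]) * len(w) <= avg_budget * total_params:
                err = mse(w, quantize(w, bits[name], default_scale(w, bits[name])))
                candidates.append((err, name, nxt))
        if not candidates:
            return bits
        _, name, nxt = max(candidates)
        bits[name] = nxt


def refine(layers, bits, scales, factors):
    """Toy refinement stage: rescale each layer's quantization grid and keep a
    candidate only if it strictly lowers that layer's error."""
    new_scales = dict(scales)
    for name, w in layers.items():
        best = mse(w, quantize(w, bits[name], scales[name]))
        for f in factors:
            candidate = scales[name] * f
            err = mse(w, quantize(w, bits[name], candidate))
            if err < best:
                best, new_scales[name] = err, candidate
    return new_scales


def total_error(layers, bits, scales):
    return sum(mse(w, quantize(w, bits[n], scales[n])) for n, w in layers.items())


rng = random.Random(0)
layers = {  # stand-in "model": three layers with different weight spreads
    "attn": [rng.gauss(0, 1.0) for _ in range(256)],
    "mlp": [rng.gauss(0, 0.1) for _ in range(256)],
    "head": [rng.gauss(0, 3.0) for _ in range(256)],
}

bits = plan_bits(layers, avg_budget=4)
# Stage 1: layer-wise round-to-nearest gives the deployable pivot checkpoint.
scales = {n: default_scale(w, bits[n]) for n, w in layers.items()}
errors = [total_error(layers, bits, scales)]
# Later stages refine the same checkpoint; spending more compute never hurts.
for factors in ((0.9, 0.8, 0.7, 0.6, 0.5), (1.05, 0.95)):
    scales = refine(layers, bits, scales, factors)
    errors.append(total_error(layers, bits, scales))

print("bit plan:", bits)
print("error after each stage:", errors)
```

Because `refine` accepts a new scale only when it reduces a layer's error, the entries of `errors` cannot increase from one stage to the next. That mirrors, in a much simplified form, the abstract's claim that quality increases as more compute is invested.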