Model Compression Comparison Hub
7 papers - avg viability 5.3
Recent work in model compression focuses on making large language models (LLMs) cheaper to run while preserving their performance. Techniques such as adaptive pruning and family-aware quantization are gaining traction, cutting computational cost significantly without extensive retraining. Adaptive pruning uses agent-guided methods to select which layers to prune, improving retention of factual knowledge and overall accuracy. Family-aware quantization addresses the limitations of generic calibration data by generating high-fidelity samples from related models, reducing accuracy loss at deployment. New frameworks such as Hessian Robust Quantization reshape the loss landscape to improve robustness to quantization noise, and quantization-aware unlearning methods are emerging to reliably remove sensitive information from compressed models. Together, these advances make LLMs more practical to deploy on resource-constrained devices while also addressing knowledge retention and data privacy, positioning the field for broader commercial application.
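The role calibration data plays in quantization error, which family-aware calibration targets, can be illustrated with a minimal sketch: symmetric per-tensor int8 quantization where the scale is set from calibration samples. All names and the setup below are my own illustration, not code from any of the papers; the point is that calibration data mismatched to a tensor's true range yields a scale that clips large values and inflates error.

```python
import numpy as np

def quantize_dequantize(x, scale, bits=8):
    """Symmetric uniform quantization: snap to an integer grid, then map back."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def calibrate_scale(calib, bits=8):
    """Choose a per-tensor scale from the largest magnitude seen in calibration."""
    return np.abs(calib).max() / (2 ** (bits - 1) - 1)

rng = np.random.default_rng(0)
tensor = rng.normal(0.0, 1.0, size=1000)   # values we want to quantize

# Matched calibration data covers the tensor's true range; mismatched data
# (drawn from too narrow a distribution) does not.
calib_sets = {
    "matched": rng.normal(0.0, 1.0, size=1000),
    "mismatched": rng.normal(0.0, 0.1, size=1000),
}

errs = {}
for name, calib in calib_sets.items():
    scale = calibrate_scale(calib)
    errs[name] = float(np.mean((tensor - quantize_dequantize(tensor, scale)) ** 2))
    print(f"{name} calibration: MSE {errs[name]:.6f}")
```

Regenerating calibration data from related "family" models, as FAQ proposes, is one way to get the matched case when the original training data is unavailable.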
Top Papers
- LLMs can Compress LLMs: Adaptive Pruning by Agents (8.0)
AI agent-guided pruning enhances LLM compression without retraining, reducing costs while retaining performance.
- FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization (7.0)
Develop a tool for enhancing model quantization accuracy by regenerating calibration data using Family-Aware Quantization.
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning (5.0)
HeRo-Q offers a robust, easy-to-integrate algorithm that enhances the stability of low-bit quantization in large language models.
- QUAIL: Quantization Aware Unlearning for Mitigating Misinformation in LLMs (5.0)
A quantization-aware unlearning tool to enhance misinformation mitigation in low-bit quantized machine learning models.
- Sparsity Induction for Accurate Post-Training Pruning of Large Language Models (5.0)
Build a tool for enhancing sparsity in large language models to improve post-training pruning performance.
- Elimination-compensation pruning for fully-connected neural networks (5.0)
Introduce a novel pruning method for neural networks that compensates for weight removal by adjusting adjacent biases, enhancing model efficiency.
- Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression (2.0)
Develop an efficient sub-bit model compression technique to minimize storage costs without significant loss of accuracy.
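The bias-compensation idea behind the elimination-compensation pruning entry above can be sketched in a few lines. This is a toy illustration under my own assumption that "adjusting adjacent biases" means folding each pruned weight's mean contribution into the layer bias; the paper's actual rule may differ.

```python
import numpy as np

def prune_with_compensation(W, b, x_mean, frac=0.5):
    """Zero the smallest-magnitude weights, folding each removed weight's
    expected contribution (w * E[x]) into the bias so the layer's mean
    pre-activation is preserved."""
    W, b = W.copy(), b.copy()
    thresh = np.quantile(np.abs(W), frac)
    mask = np.abs(W) < thresh            # weights to eliminate
    b += (W * mask) @ x_mean             # compensation: absorb mean contribution
    W[mask] = 0.0
    return W, b

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))              # fully-connected layer: 8 inputs, 4 outputs
b = rng.normal(size=4)
x = rng.normal(loc=0.5, size=(256, 8))   # inputs with a nonzero mean
x_mean = x.mean(axis=0)

Wp, bp = prune_with_compensation(W, b, x_mean)

dense = x @ W.T + b
compensated = x @ Wp.T + bp
uncompensated = x @ Wp.T + b             # same pruning, no bias adjustment

err_comp = float(np.mean((dense - compensated) ** 2))
err_naive = float(np.mean((dense - uncompensated) ** 2))
print(f"pruned with compensation: MSE {err_comp:.4f}")
print(f"pruned without compensation: MSE {err_naive:.4f}")
```

With the empirical input mean, the compensated layer's error reduces to the variance of the pruned weights' contribution, while the uncompensated layer pays an additional systematic bias error on top of it.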