Papers
Research Paper·Mar 8, 2026
Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression
Chain-of-thought (CoT) improves reasoning reliability but increases token cost, motivating post-training compression of explicit reasoning traces. However, the shortest sufficient reasoning is not uni...
7.0 viability
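The abstract above motivates trading reasoning length against difficulty. A minimal sketch of one way such a reward could be shaped, assuming a linear difficulty-scaled per-token penalty (the function name, the `[0, 1]` difficulty scale, and the linear scaling are illustrative assumptions, not the paper's method):

```python
# Hypothetical reward for RL-based CoT compression: harder problems are
# penalized less per reasoning token, so the policy keeps long chains of
# thought only where difficulty warrants them.
def compressed_cot_reward(correct: bool, n_tokens: int,
                          difficulty: float, lam: float = 0.01) -> float:
    """difficulty in [0, 1]; per-token cost shrinks as difficulty rises."""
    per_token_cost = lam * (1.0 - difficulty)
    return (1.0 if correct else 0.0) - per_token_cost * n_tokens

# An easy problem answered correctly in 100 tokens earns less than the
# same answer on a hard problem, pushing toward shorter easy-case traces.
easy = compressed_cot_reward(True, 100, difficulty=0.0)
hard = compressed_cot_reward(True, 100, difficulty=1.0)
```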
Research Paper·Mar 11, 2026
Leech Lattice Vector Quantization for Efficient LLM Compression
Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters joi...
3.0 viability
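The abstract above contrasts scalar quantization, which rounds each weight independently, with vector quantization, which encodes blocks of parameters jointly against a codebook. A minimal NumPy sketch of block-wise VQ, assuming a random toy codebook (the paper uses the Leech lattice; the codebook, block size, and variable names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1024).astype(np.float32)

d = 4                                   # block (vector) dimension
blocks = weights.reshape(-1, d)         # 256 blocks of 4 weights each

# Toy random codebook of k centroids; a structured lattice codebook
# would replace this in practice.
k = 16
codebook = rng.normal(size=(k, d)).astype(np.float32)

# Encode: assign each block to its nearest codebook vector (L2).
dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1)            # one small index per 4 weights

# Decode: look up the centroid for each block.
dequant = codebook[codes].reshape(-1)

bits_per_weight = np.log2(k) / d        # log2(16) / 4 = 1 bit per weight
```

Storing one 4-bit index per 4-dimensional block yields 1 bit/weight here, which is the kind of rate a scalar quantizer cannot reach without large per-weight error.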
Research Paper·Mar 18, 2026
Only relative ranks matter in weight-clustered large language models
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights: whether one connection is stronger ...
3.0 viability
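The abstract above argues that relative rank, not exact value, is what weight clustering needs to preserve. A minimal sketch of rank-based clustering, assuming equal-size rank buckets replaced by their centroids (the bucketing scheme and names are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=1000)

n_clusters = 8
ranks = w.argsort().argsort()              # 0 = smallest weight
cluster = ranks * n_clusters // len(w)     # equal-size rank buckets

# Each bucket's centroid becomes the shared reconstructed value.
centroids = np.array([w[cluster == c].mean() for c in range(n_clusters)])
w_hat = centroids[cluster]

# Relative order across buckets survives: every weight in a higher rank
# bucket reconstructs to a value >= any weight in a lower bucket, so
# "which connection is stronger" is preserved at bucket granularity.
assert all(centroids[i] < centroids[i + 1] for i in range(n_clusters - 1))
```

Only `n_clusters` distinct values remain after clustering, yet cross-bucket ordering of weights is never inverted.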