Papers
Research Paper·Mar 8, 2026
Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression
Chain-of-thought (CoT) improves reasoning reliability but increases token cost, motivating post-training compression of explicit reasoning traces. However, the shortest sufficient reasoning is not uni...
7.0 viability
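The abstract above motivates trading reasoning length against difficulty. A minimal sketch of one way such a reward could be shaped, assuming a linear difficulty-scaled per-token penalty (the function name, the `[0, 1]` difficulty scale, and the linear scaling are illustrative assumptions, not the paper's method):

```python
# Hypothetical reward for RL-based CoT compression: harder problems are
# penalized less per reasoning token, so the policy keeps long chains of
# thought only where difficulty warrants them.
def compressed_cot_reward(correct: bool, n_tokens: int,
                          difficulty: float, lam: float = 0.01) -> float:
    """difficulty in [0, 1]; per-token cost shrinks as difficulty rises."""
    per_token_cost = lam * (1.0 - difficulty)
    return (1.0 if correct else 0.0) - per_token_cost * n_tokens

# An easy problem answered correctly in 100 tokens earns less than the
# same answer on a hard problem, pushing toward shorter easy-case traces.
easy = compressed_cot_reward(True, 100, difficulty=0.0)
hard = compressed_cot_reward(True, 100, difficulty=1.0)
```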
Research Paper·Mar 11, 2026
Leech Lattice Vector Quantization for Efficient LLM Compression
Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters joi...
3.0 viability
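The abstract above contrasts scalar quantization, which rounds each weight independently, with vector quantization, which encodes blocks of parameters jointly against a codebook. A minimal NumPy sketch of block-wise VQ, assuming a random toy codebook (the paper uses the Leech lattice; the codebook, block size, and variable names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1024).astype(np.float32)

d = 4                                   # block (vector) dimension
blocks = weights.reshape(-1, d)         # 256 blocks of 4 weights each

# Toy random codebook of k centroids; a structured lattice codebook
# would replace this in practice.
k = 16
codebook = rng.normal(size=(k, d)).astype(np.float32)

# Encode: assign each block to its nearest codebook vector (L2).
dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1)            # one small index per 4 weights

# Decode: look up the centroid for each block.
dequant = codebook[codes].reshape(-1)

bits_per_weight = np.log2(k) / d        # log2(16) / 4 = 1 bit per weight
```

Storing one 4-bit index per 4-dimensional block yields 1 bit/weight here, which is the kind of rate a scalar quantizer cannot reach without large per-weight error.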
Research Paper·Mar 18, 2026
Only relative ranks matter in weight-clustered large language models
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights: whether one connection is stronger ...
3.0 viability
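The abstract above argues that relative rank, not exact value, is what weight clustering needs to preserve. A minimal sketch of rank-based clustering, assuming equal-size rank buckets replaced by their centroids (the bucketing scheme and names are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=1000)

n_clusters = 8
ranks = w.argsort().argsort()              # 0 = smallest weight
cluster = ranks * n_clusters // len(w)     # equal-size rank buckets

# Each bucket's centroid becomes the shared reconstructed value.
centroids = np.array([w[cluster == c].mean() for c in range(n_clusters)])
w_hat = centroids[cluster]

# Relative order across buckets survives: every weight in a higher rank
# bucket reconstructs to a value >= any weight in a lower bucket, so
# "which connection is stronger" is preserved at bucket granularity.
assert all(centroids[i] < centroids[i + 1] for i in range(n_clusters - 1))
```

Only `n_clusters` distinct values remain after clustering, yet cross-bucket ordering of weights is never inverted.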