LLM Efficiency Comparison Hub
11 papers - avg viability 5.8
Recent advances in large language model (LLM) efficiency focus on optimizing computational resources while maintaining high performance on reasoning tasks. Techniques such as confidence-guided selection and adaptive model cascades are emerging as effective ways to balance accuracy and cost, with some methods cutting inference costs by over 37% without significant accuracy loss. Innovations such as the Collaborative Memory Transformer and hybrid attention mechanisms address the challenges of long-context processing, achieving linear time complexity and constant memory usage, both crucial for real-world applications. New training frameworks are also enabling smaller models to compete with larger counterparts, enhancing their utility in cost-sensitive environments. Together, these developments point toward more efficient, scalable LLMs that can be deployed commercially, potentially transforming industries that rely on AI-driven decision-making and natural language understanding.
Top Papers
- AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection (8.0)
AdaptEvolve optimizes AI agent efficiency by dynamically selecting the best-suited LLM for each decision point, cutting inference costs by 37.9%.
- CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute (8.0)
CoRefine reduces compute costs for LLMs by leveraging confidence-guided self-refinement to achieve competitive accuracy.
- CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling (8.0)
CoMeT enables efficient long-context processing in existing Transformers with constant memory usage.
- Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning (7.0)
COREA optimizes cost-efficient reasoning by cascading small and large language models using a confidence-calibrated system.
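The small-to-large cascade pattern behind COREA and similar systems can be sketched as follows. This is a minimal illustration, not the paper's method: the model callables, the mean-logprob confidence proxy, and the threshold value are all assumptions for the sake of the example.

```python
import math

def cascade_answer(prompt, small_model, large_model, threshold=0.8):
    """Route a query through a cheap small model first; escalate to the
    expensive large model only when the small model is not confident.

    `small_model` and `large_model` are hypothetical callables returning
    (answer, token_logprobs). The geometric-mean token probability is a
    simple, uncalibrated stand-in for a real confidence estimate.
    """
    answer, logprobs = small_model(prompt)
    confidence = math.exp(sum(logprobs) / len(logprobs))
    if confidence >= threshold:
        return answer, "small"   # confident: keep the cheap answer
    answer, _ = large_model(prompt)
    return answer, "large"       # uncertain: pay for the large model
```

In a calibrated system the threshold would be tuned on held-out data so that escalation happens only when it is likely to change the answer.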
- MAR: Efficient Large Language Models via Module-aware Architecture Refinement (7.0)
MAR builds efficient, practical Large Language Models through Module-aware Architecture Refinement, lowering energy costs without sacrificing performance.
- Learning Generative Selection for Best-of-N (7.0)
Trains small models via reinforcement learning to perform scalable generative selection over candidate answers, improving performance on reasoning tasks.
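The best-of-N selection loop itself is simple; the paper's contribution is training the selector. Below is a minimal sketch in which the hypothetical `generate` and `score` callables stand in for a sampling LLM and a trained generative selector, respectively:

```python
def best_of_n(prompt, generate, score, n=4):
    """Best-of-N selection: sample n candidate answers for a prompt and
    return the one the scorer prefers.

    `generate(prompt, seed)` and `score(prompt, candidate)` are
    illustrative placeholders; in the paper's setting the scorer is
    itself a small model trained with reinforcement learning.
    """
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

Note that compute scales linearly with `n`, which is why pairing best-of-N with a cheap, well-trained selector matters for efficiency.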
- Residual Context Diffusion Language Models (6.0)
Residual Context Diffusion allows converting standard diffusion LLMs to more efficient models with improved accuracy by using discarded token contexts.
- MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling (5.0)
MiniCPM-SALA combines sparse and linear attention into a cost-efficient hybrid framework that reduces the memory and compute demands of long-context modeling.
- Do LLMs Benefit From Their Own Words? (3.0)
Investigates context filtering in multi-turn LLM interactions to reduce memory consumption and improve response quality.
- RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs (3.0)
RaBiT offers a new quantization framework for LLMs, emphasizing residual-aware training without hardware-intensive setups.