LLM Efficiency Comparison Hub
11 papers - avg viability 5.8
Recent advances in large language model (LLM) efficiency focus on optimizing computational resources while maintaining high performance on reasoning tasks. Techniques such as confidence-guided selection and adaptive model cascades are emerging as effective ways to balance accuracy and cost, with some methods cutting inference costs by over 37% without significant accuracy loss. Innovations such as the Collaborative Memory Transformer and hybrid attention mechanisms address the challenges of long-context processing, achieving linear time complexity and constant memory usage, both crucial for real-world applications. New training frameworks are also enabling smaller models to compete with larger counterparts, enhancing their utility in cost-sensitive environments. Together, these developments point toward more efficient, scalable LLMs that can be deployed commercially, potentially transforming industries that rely on AI-driven decision-making and natural language understanding.
Top Papers
- AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection (8.0)
AdaptEvolve optimizes AI agent efficiency by dynamically selecting the best-suited LLM for each decision point, cutting inference costs by 37.9%.
- CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute (8.0)
CoRefine reduces compute costs for LLMs by leveraging confidence-guided self-refinement to achieve competitive accuracy.
- CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling (8.0)
CoMeT enables efficient long-context processing in existing Transformers with constant memory usage.
- Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning (7.0)
COREA optimizes cost-efficient reasoning by cascading small and large language models using a confidence-calibrated system.
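The small-to-large cascade pattern behind COREA and similar systems can be sketched as follows. This is a minimal illustration, not the paper's method: the model callables, the mean-logprob confidence proxy, and the threshold value are all assumptions for the sake of the example.

```python
import math

def cascade_answer(prompt, small_model, large_model, threshold=0.8):
    """Route a query through a cheap small model first; escalate to the
    expensive large model only when the small model is not confident.

    `small_model` and `large_model` are hypothetical callables returning
    (answer, token_logprobs). The geometric-mean token probability is a
    simple, uncalibrated stand-in for a real confidence estimate.
    """
    answer, logprobs = small_model(prompt)
    confidence = math.exp(sum(logprobs) / len(logprobs))
    if confidence >= threshold:
        return answer, "small"   # confident: keep the cheap answer
    answer, _ = large_model(prompt)
    return answer, "large"       # uncertain: pay for the large model
```

In a calibrated system the threshold would be tuned on held-out data so that escalation happens only when it is likely to change the answer.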
- MAR: Efficient Large Language Models via Module-aware Architecture Refinement (7.0)
MAR builds efficient, practical Large Language Models through Module-aware Architecture Refinement, lowering energy costs without sacrificing performance.
- Learning Generative Selection for Best-of-N (7.0)
Trains small models via reinforcement learning to perform scalable generative selection over candidate answers, improving performance on reasoning tasks.
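The best-of-N selection loop itself is simple; the paper's contribution is training the selector. Below is a minimal sketch in which the hypothetical `generate` and `score` callables stand in for a sampling LLM and a trained generative selector, respectively:

```python
def best_of_n(prompt, generate, score, n=4):
    """Best-of-N selection: sample n candidate answers for a prompt and
    return the one the scorer prefers.

    `generate(prompt, seed)` and `score(prompt, candidate)` are
    illustrative placeholders; in the paper's setting the scorer is
    itself a small model trained with reinforcement learning.
    """
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

Note that compute scales linearly with `n`, which is why pairing best-of-N with a cheap, well-trained selector matters for efficiency.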
- Residual Context Diffusion Language Models (6.0)
Residual Context Diffusion allows converting standard diffusion LLMs to more efficient models with improved accuracy by using discarded token contexts.
- MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling (5.0)
MiniCPM-SALA combines sparse and linear attention into a cost-efficient hybrid framework that reduces the memory and compute demands of long-context modeling.
- Do LLMs Benefit From Their Own Words? (3.0)
Investigates context filtering in multi-turn LLM interactions to reduce memory consumption and improve response quality.
- RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs (3.0)
RaBiT offers a new quantization framework for LLMs, emphasizing residual-aware training without hardware-intensive setups.