LLM Inference Comparison Hub
4 papers, average viability 5.8
Top Papers
- ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs (7.0)
ArcLight is a CPU-optimized LLM inference architecture that maximizes throughput on many-core CPUs by minimizing cross-NUMA memory access overhead.
- Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference (6.0)
DRIFT offers an efficient dual-model framework to enhance LLMs' long-context reasoning by decoupling knowledge and inference processes.
- Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt (5.0)
A differentially private and communication-efficient LLM split-inference framework for resource-constrained devices, combining stochastic quantization with soft prompts.
- Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference (5.0)
Improves the accuracy-cost tradeoff of LLM inference by applying particle filtering to parallel reasoning chains.
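The split-inference entry above names stochastic quantization, a standard trick for cutting communication cost without introducing bias. The papers' internals aren't described here; as a generic illustration of the technique (function name and parameters are my own, not from the paper), a value is snapped to a coarse grid by rounding up with probability equal to the fractional remainder, so the quantized value equals the original in expectation:

```python
import math
import random

def stochastic_round(x, step=1.0, rng=random):
    """Unbiased stochastic rounding: snap x to a multiple of `step`,
    rounding up with probability equal to the fractional remainder,
    so that E[stochastic_round(x)] == x."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    if rng.random() < frac:
        lower += 1
    return lower * step

# Example: 0.3 quantized to the integer grid is 0.0 about 70% of the
# time and 1.0 about 30% of the time, averaging back to ~0.3.
rng = random.Random(0)
avg = sum(stochastic_round(0.3, 1.0, rng) for _ in range(50_000)) / 50_000
```

Because each quantized value needs fewer bits than the original float, this reduces the payload exchanged between device and server while keeping the estimator unbiased.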
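The "Reject, Resample, Repeat" entry frames parallel reasoning as particle filtering. The paper's exact algorithm isn't described here; as a generic sketch of the core resampling step (names and the scoring setup are my assumptions), candidate reasoning chains are redrawn with probability proportional to a score, so promising chains are duplicated and weak ones are dropped:

```python
import random

def resample(candidates, weights, k, rng=random):
    """Multinomial resampling: draw k candidates with replacement,
    with probability proportional to their weights (e.g. scores a
    verifier assigns to partial reasoning chains)."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(candidates, weights=probs, k=k)

# Example: three partial chains with verifier scores; the highest-scored
# chain dominates the resampled population.
chains = ["chain-a", "chain-b", "chain-c"]
scores = [0.1, 0.7, 0.2]
survivors = resample(chains, scores, k=8, rng=random.Random(1))
```

In an inference loop, this step would alternate with extending each surviving chain by a few tokens, concentrating compute on the chains most likely to reach a correct answer.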