Top papers
- ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs (7.0)
- Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference (6.0)
- Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt (5.0)
- Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference (5.0)