Top papers
- ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs (7.0)
- Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference (6.0)
- Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt (5.0)
- Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference (5.0)