LLM Inference

4 papers

5.8 viability

Papers

Research Paper·Mar 8, 2026

ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

Although existing frameworks for large language model (LLM) inference on CPUs are mature, they fail to fully exploit the computation potential of many-core CPU platforms. Many-core CPUs are widely dep...

7.0 viability

Research Paper·Feb 10, 2026

Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference

The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solu...

6.0 viability

Research Paper·Feb 12, 2026

Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt

Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder the local deployment on devices with l...

5.0 viability

Research Paper·Mar 9, 2026

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost...