Inference Optimization

4 papers

4.3 viability

Papers

Research Paper·Feb 5, 2026·B2B·Fintech

Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering

Large language models (LLMs) exhibit persistent miscalibration, especially after instruction tuning and preference alignment. Modified training objectives can improve calibration, but retraining is ex...

6.0 viability

Research Paper·Feb 1, 2026

Learning to Guide Local Search for MPE Inference in Probabilistic Graphical Models

Most Probable Explanation (MPE) inference in Probabilistic Graphical Models (PGMs) is a fundamental yet computationally challenging problem arising in domains such as diagnosis, planning, and structur...

5.0 viability

Research Paper·Jan 22, 2026

Why Inference in Large Models Becomes Decomposable After Training

Inference in large-scale AI models is typically performed on dense parameter matrices, leading to inference cost and system complexity that scale unsustainably with model size. This limitation does no...

3.0 viability

Research Paper·Feb 11, 2026

HiFloat4 Format for Language Model Inference

This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits...
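
The 4.5-bit average quoted in the abstract can be checked from the stated block layout: 64 elements of 4 bits each, plus 32 bits of shared metadata. The following is a minimal sketch of that arithmetic only. The function and parameter names are hypothetical, and nothing here reflects how HiF4 actually encodes or lays out its scaling metadata, which the excerpt does not describe.

```python
def bits_per_element(num_elements: int, element_bits: int, metadata_bits: int) -> float:
    """Average storage cost per element for a block format with shared metadata.

    Hypothetical helper for illustration; not part of any HiF4 implementation.
    """
    total_bits = num_elements * element_bits + metadata_bits
    return total_bits / num_elements


# Figures taken from the abstract: 64 4-bit elements plus 32 bits of shared metadata.
hif4_bits = bits_per_element(num_elements=64, element_bits=4, metadata_bits=32)
print(hif4_bits)  # 4.5
```

Under these numbers the shared metadata adds 32 / 64 = 0.5 bits per element on top of the 4-bit payload.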