Transformer Optimization

Trending

4 papers

4.5 viability

+200% (30d)

Papers


Research Paper · Jan 29, 2026

FBS: Modeling Native Parallel Reading inside a Transformer

Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss cor...

6.0 viability

Research Paper · Mar 4, 2026

Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs

Post-training quantization (PTQ) of transformers is known to suffer from severe accuracy degradation due to structured activation outliers, as originally analyzed by Bondarenko et al. (EMNLP 2021) in ...

6.0 viability
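The activation-outlier problem in the abstract above can be made concrete with a minimal pure-Python sketch of symmetric per-tensor INT8 fake quantization. This is not the paper's reproduction or method. The function names and activation values are made up for illustration. The sketch shows one well-known mechanism: a single large value sets the scale shared by the whole tensor, so every ordinary value lands on a much coarser grid.

```python
# Hypothetical illustration (not from the paper): symmetric per-tensor INT8
# fake quantization of one activation vector, with and without an outlier.

def quantize_dequantize(xs, num_bits=8):
    """Round to a symmetric integer grid with one shared scale, then map back."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for INT8
    scale = max(abs(x) for x in xs) / qmax   # the largest |x| sets the scale
    if scale == 0:
        return list(xs), scale
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs], scale

def mean_abs_error(xs, ys):
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Made-up activations in a narrow range: -0.5 .. 0.49.
base = [0.01 * i - 0.5 for i in range(100)]
# Same vector, but the last entry is replaced by a single large outlier.
with_outlier = base[:-1] + [60.0]

q_base, s_base = quantize_dequantize(base)
q_out, s_out = quantize_dequantize(with_outlier)

# Compare error only on the 99 ordinary entries shared by both vectors.
err_base = mean_abs_error(base[:-1], q_base[:-1])
err_out = mean_abs_error(with_outlier[:-1], q_out[:-1])

print(f"scale without outlier: {s_base:.5f}, error on ordinary values: {err_base:.5f}")
print(f"scale with outlier:    {s_out:.5f}, error on ordinary values: {err_out:.5f}")
```

With the outlier present, the scale grows by a factor of 120 and most ordinary values round to 0 or ±1 grid steps. Finer-grained scales and outlier-aware schemes are common responses to this effect.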

Research Paper · Feb 9, 2026

QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill

We present QUOKA: Query-oriented KV selection for efficient attention, a training-free and hardware-agnostic sparse attention algorithm for accelerating transformer inference under chunked prefill. Wh...

3.0 viability
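The QUOKA snippet is truncated before it describes its selection rule. The following is therefore only a generic sketch of query-oriented KV selection and not QUOKA's algorithm. In this sketch, each cached key is scored against a chunk of queries by its maximum dot product, which is an assumed rule. Only the top-k keys and their values are kept, and attention runs over that subset. All names (`select_kv`, `attend`) and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_kv(queries, keys, k):
    """Hypothetical rule: score each key by its max dot product over the
    chunk's queries; return the top-k key indices in original order."""
    scores = [max(dot(q, key) for q in queries) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def attend(query, keys, values, idx):
    """Scaled dot-product attention restricted to the KV indices in idx."""
    d = len(query)
    weights = softmax([dot(query, keys[i]) / math.sqrt(d) for i in idx])
    dim_v = len(values[0])
    return [sum(w * values[i][j] for w, i in zip(weights, idx)) for j in range(dim_v)]

# Toy KV cache (6 entries, 2-d keys, 1-d values) and a prefill chunk of 2 queries.
keys = [[1, 0], [0, 1], [-1, 0], [0, -1], [0.9, 0.1], [0.1, 0.9]]
values = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
queries = [[1, 0], [0.8, 0.2]]

selected = select_kv(queries, keys, k=3)
sparse_out = attend(queries[0], keys, values, selected)
full_out = attend(queries[0], keys, values, list(range(len(keys))))
print("selected KV indices:", selected)
print("sparse vs full attention output:", sparse_out, full_out)
```

The selection step trades exactness for less attention work per query. Its quality depends on how well the scoring rule predicts which keys would receive large attention weights.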