Benchmark

Weekly scoreboard of top papers, ranked by Signal Fusion. Our pipeline refreshes the data every week.

Week of 2026-03-02 · v1

Rank | Paper | Viability | Composite
1 | RoboPocket: Improve Robot Policies Instantly with Your Phone | 7.0 | 70.0
2 | POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation | 3.0 | 30.0
3 | The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks | 2.0 | 20.0
4 | Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation | 4.0 | 40.0
5 | Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought | 2.0 | 20.0
6 | Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation | 5.0 | 50.0
7 | SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis | 4.0 | 40.0
8 | Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval | 4.0 | 40.0
9 | Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry | 4.0 | 40.0
10 | RealWonder: Real-Time Physical Action-Conditioned Video Generation | 7.0 | 70.0
11 | Residual RL-MPC for Robust Microrobotic Cell Pushing Under Time-Varying Flow | 5.0 | 50.0
12 | Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | 6.0 | 60.0
13 | SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning | 6.0 | 60.0
14 | Ensembling Language Models with Sequential Monte Carlo | 3.0 | 30.0
15 | RelaxFlow: Text-Driven Amodal 3D Generation | 7.0 | 70.0
16 | MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis | 6.0 | 60.0
17 | Dissociating Direct Access from Inference in AI Introspection | 2.0 | 20.0
18 | Judge Reliability Harness: Stress Testing the Reliability of LLM Judges | 5.0 | 50.0
19 | Legal interpretation and AI: from expert systems to argumentation and LLMs | 2.0 | 20.0
20 | Learning Causal Structure of Time Series using Best Order Score Search | 3.0 | 30.0
21 | PACE: A Personalized Adaptive Curriculum Engine for 9-1-1 Call-taker Training | 7.0 | 70.0
22 | Ailed: A Psyche-Driven Chess Engine with Dynamic Emotional Modulation | 5.0 | 50.0
23 | Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned | 4.0 | 40.0
24 | GALACTIC: Global and Local Agnostic Counterfactuals for Time-series Clustering | 6.0 | 60.0
25 | PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration | 7.0 | 70.0
26 | Latent-Mark: An Audio Watermark Robust to Neural Resynthesis | 5.0 | 50.0
27 | Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution | 7.0 | 70.0
28 | UniSTOK: Uniform Inductive Spatio-Temporal Kriging | 6.0 | 60.0
29 | WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation | 7.0 | 70.0
30 | WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces | 5.0 | 50.0
31 | STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks | 6.0 | 60.0
32 | X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes | 5.0 | 50.0
33 | Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts | 4.0 | 40.0
34 | GCAgent: Enhancing Group Chat Communication through Dialogue Agents System | 4.0 | 40.0
35 | Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning | 6.0 | 60.0
36 | Recursive Inference Machines for Neural Reasoning | 5.0 | 50.0
37 | Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards | 6.0 | 60.0
38 | Not All Trust is the Same: Effects of Decision Workflow and Explanations in Human-AI Decision Making | 2.0 | 20.0
39 | The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology | 5.0 | 50.0
40 | AI+HW 2035: Shaping the Next Decade | 3.0 | 30.0
41 | SPyCer: Semi-Supervised Physics-Guided Contextual Attention for Near-Surface Air Temperature Estimation from Satellite Imagery | 7.0 | 70.0
42 | KARL: Knowledge Agents via Reinforcement Learning | 7.0 | 70.0
43 | Early Warning of Intraoperative Adverse Events via Transformer-Driven Multi-Label Learning | 6.0 | 60.0
44 | Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding | 3.0 | 30.0
45 | Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation | 7.0 | 70.0
46 | Logi-PAR: Logic-Infused Patient Activity Recognition via Differentiable Rule | 7.0 | 70.0
47 | Guidelines for the Annotation and Visualization of Legal Argumentation Structures in Chinese Judicial Decisions | 3.0 | 30.0
48 | C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning | 5.0 | 50.0
49 | Lifelong Language-Conditioned Robotic Manipulation Learning | 4.0 | 40.0
50 | SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction | 7.0 | 70.0

Methodology

Rankings use the Signal Fusion composite score: (viability_score × 10) + (unicorn_probability × 30) + (votes × 0.1) + (star_velocity × 5). See our FAQ for the full scoring definition.
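
The formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above, not the pipeline's actual code: the `PaperSignals` class, the `signal_fusion` function name, and the zero defaults for the last three inputs are assumptions made for the example, and the input ranges are not specified here (see the FAQ). Every row in this week's table has a composite equal to viability × 10, which is what the formula gives when the other three terms contribute nothing.

```python
from dataclasses import dataclass


@dataclass
class PaperSignals:
    """Inputs to the Signal Fusion formula (hypothetical container)."""
    viability_score: float
    unicorn_probability: float = 0.0  # assumed default, not from the source
    votes: int = 0                    # assumed default, not from the source
    star_velocity: float = 0.0        # assumed default, not from the source


def signal_fusion(s: PaperSignals) -> float:
    """Weighted sum using the weights given in the Methodology formula."""
    return (
        s.viability_score * 10
        + s.unicorn_probability * 30
        + s.votes * 0.1
        + s.star_velocity * 5
    )


# Rank 1 this week: viability 7.0 and no other signals.
robopocket = PaperSignals(viability_score=7.0)
print(signal_fusion(robopocket))  # 70.0
```

With only a viability score supplied, the result matches the Composite column (7.0 gives 70.0); any nonzero unicorn probability, votes, or star velocity would raise the composite above viability × 10.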