Rank	Paper	Note	Score	Move
01	System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5	Solid commercial fit; worth a closer look this week.	110.5	—
02	FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning	Solid commercial fit; worth a closer look this week.	102.4	—
03	Redesign Mixture-of-Experts Routers with Manifold Power Iteration	Quiet paper, loud community.	87.6	—
04	Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models	Quiet paper, loud community.	58.6	—
05	DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?	Quiet paper, loud community.	55.1	—
06	TAHOE: Text-to-SQL with Automated Hint Optimization from Experience	On the watchlist this week.	0.0	—
07	ATLAS: Active Theory Learning for Automated Science	On the watchlist this week.	0.0	—
08	APPO: Agentic Procedural Policy Optimization	On the watchlist this week.	0.0	—
09	SPEA2$^+$: Improved Density Estimation in SPEA2 with Provable Runtime Guarantees	On the watchlist this week.	0.0	—
10	Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots	On the watchlist this week.	0.0	—
11	Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics	On the watchlist this week.	0.0	—
12	Latent World Recovery for Multimodal Learning with Missing Modalities	On the watchlist this week.	0.0	—
13	CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy	On the watchlist this week.	0.0	—
14	Nonslop: A Gamified Experiment in Human-AI Collaborative Writing	On the watchlist this week.	0.0	—
15	Atlas H&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy	On the watchlist this week.	0.0	—
16	ALIGNBEAM : Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing	On the watchlist this week.	0.0	—
17	PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents	On the watchlist this week.	0.0	—
18	A Five-Plane Reference Architecture for Runtime Governance of Production AI Agents	On the watchlist this week.	0.0	—
19	Harness In-Context Operator Learning with Chain of Operators	On the watchlist this week.	0.0	—
20	Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition	On the watchlist this week.	0.0	—
21	The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics	On the watchlist this week.	0.0	—
22	SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks	On the watchlist this week.	0.0	—
23	CCKS: Consensus-based Communication and Knowledge Sharing	On the watchlist this week.	0.0	—
24	Mathematical perspective on genetic algorithms with optimization guided operators	On the watchlist this week.	0.0	—
25	The Impossibility of Eliciting Latent Knowledge	On the watchlist this week.	0.0	—
26	Market Design for AI: Beyond the Copyright Binary	On the watchlist this week.	0.0	—
27	Using Explainability as a Training-Time Reliability Signal for Efficient ECG Classification	On the watchlist this week.	0.0	—
28	Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization	On the watchlist this week.	0.0	—
29	DiffCold: A Diffusion-based Generative Model for Cold-Start Item Recommendation	On the watchlist this week.	0.0	—
30	VIA-SD: Verification via Intra-Model Routing for Speculative Decoding	On the watchlist this week.	0.0	—
31	Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training	On the watchlist this week.	0.0	—
32	Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study	On the watchlist this week.	0.0	—
33	Adapting Prithvi-EO for Fallow Detection for Food-Water Nexus: ViT-Adapter Necks and Parameter-Efficient Backbone tuning of Geospatial Foundation Model	On the watchlist this week.	0.0	—
34	Making Foresight Actionable: Repurposing Representation Alignment in World Action Models	On the watchlist this week.	0.0	—
35	Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends	On the watchlist this week.	0.0	—
36	Implicit Neural Representations of Individual Behavior	On the watchlist this week.	0.0	—
37	Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application	On the watchlist this week.	0.0	—
38	OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models	On the watchlist this week.	0.0	—
39	Towards Responsibly Non-Compliant Machines	On the watchlist this week.	0.0	—
40	nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding	On the watchlist this week.	0.0	—
41	Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders	On the watchlist this week.	0.0	—
42	Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation	On the watchlist this week.	0.0	—
43	Augmenting Molecular Language Models with Local $n$-gram Memory	On the watchlist this week.	0.0	—
44	Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning	On the watchlist this week.	0.0	—
45	MSUE: Multi-Modal Soccer Understanding Expert	On the watchlist this week.	0.0	—
46	IntElicit: Eliciting and Assessing Contextualized Creativity via Dialogue Policy Optimization	On the watchlist this week.	0.0	—
47	Non-frontal face recognition using GANs and memristor-based classifiers	On the watchlist this week.	0.0	—
48	"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments	On the watchlist this week.	0.0	—
49	On the Limits of LLM-as-Judge for Scientific Novelty Assessment	On the watchlist this week.	0.0	—
50	Automating Geometry-Intensive Compliance Checking in BIM: Graph-Based Semantic Reasoning Framework	On the watchlist this week.	0.0	—

8-week rank trajectory · top 8

Who held #1 across all weeks

Paper	Weeks at #1	Weeks in top 3	Best rank
System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5	1	1	#1
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution	1	1	#1
Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation	1	1	#1
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset	1	1	#1
MeMo: Memory as a Model	1	1	#1
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models	1	1	#1
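
The three columns above can be read as simple aggregates over weekly snapshots. The sketch below shows one way such a summary might be computed; the input shape (a week label mapped to titles ordered by rank), the function name leader_stats, and the sample titles are all assumptions for illustration, not the site's actual pipeline or data.

```python
from collections import defaultdict


def leader_stats(snapshots):
    """Summarise rank history per paper.

    `snapshots` maps a week label to a list of titles ordered by rank
    (index 0 is rank #1). This input shape is an assumption.
    """
    stats = defaultdict(
        lambda: {"weeks_at_1": 0, "weeks_in_top3": 0, "best_rank": None}
    )
    for titles in snapshots.values():
        for rank, title in enumerate(titles, start=1):
            row = stats[title]
            if rank == 1:
                row["weeks_at_1"] += 1
            if rank <= 3:
                row["weeks_in_top3"] += 1
            if row["best_rank"] is None or rank < row["best_rank"]:
                row["best_rank"] = rank
    # Keep only papers that held #1 at least once, as in the table above.
    return {t: r for t, r in stats.items() if r["weeks_at_1"] > 0}


# Placeholder data, not real rankings.
demo = {
    "week-1": ["Paper A", "Paper B", "Paper C", "Paper D"],
    "week-2": ["Paper B", "Paper A", "Paper D", "Paper C"],
}
result = leader_stats(demo)
```

With the placeholder data, both "Paper A" and "Paper B" appear with one week at #1 and two weeks in the top 3, while papers that never led are filtered out.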

Download

Formats: PDF, JSON, CSV

Provenance

Coverage window: Week of 2026-06-08
Method version: v2
Fresh until: 2026-06-15T20:34:15.679Z
Artifact ID: live-benchmark:2026-06-08:8ee9cadc4e14af44
Receipt: https://sciencetostartup.com/api/v1/resources/benchmark?artifact_id=live-benchmark%3A2026-06-08%3A8ee9cadc4e14af44
SHA-256 (json): ae124e0e6d332bd6…
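
The provenance fields lend themselves to a quick client-side check. The following is a minimal sketch, assuming the receipt URL is the benchmark endpoint plus a percent-encoded artifact_id (which matches the receipt shown above) and assuming the SHA-256 is taken over the raw bytes of the JSON export (the page does not say so). Because the published digest is truncated, the comparison is against the visible prefix only. The function names are hypothetical.

```python
import hashlib
from urllib.parse import quote

BASE = "https://sciencetostartup.com/api/v1/resources/benchmark"


def receipt_url(artifact_id: str) -> str:
    # Percent-encode the whole ID, including ':' (which becomes %3A).
    return f"{BASE}?artifact_id={quote(artifact_id, safe='')}"


def digest_matches(payload: bytes, published_prefix: str) -> bool:
    # Assumption: the digest covers the raw bytes of the JSON export.
    # Only the visible prefix is compared, since the page truncates it.
    digest = hashlib.sha256(payload).hexdigest()
    return digest.startswith(published_prefix.lower())


url = receipt_url("live-benchmark:2026-06-08:8ee9cadc4e14af44")
```

A caller would pass the downloaded export bytes and the prefix shown under Provenance to digest_matches; a False result could mean either a changed file or a wrong guess about what was hashed.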

API

Method	Path	Content type	Description
GET	/api/v1/resources/benchmark	application/json	Current + historical scoreboard metadata.
GET	/api/v1/resources/benchmark/export?format=json	application/json	Full snapshot JSON.
GET	/api/v1/resources/benchmark/export?format=csv	text/csv	Flat CSV of every paper, every week.
GET	/api/v1/resources/benchmark/export?format=pdf	application/pdf	Print-ready PDF of the latest week.
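
As a rough illustration of calling the export endpoints listed above, here is a small standard-library sketch. The host, path, format values, and content types come from the listing; the helper names, the Accept header, and the timeout are assumptions, and no authentication is shown because the page does not mention any.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HOST = "https://sciencetostartup.com"
EXPORT_PATH = "/api/v1/resources/benchmark/export"
# Content types as listed for each export format.
CONTENT_TYPES = {
    "json": "application/json",
    "csv": "text/csv",
    "pdf": "application/pdf",
}


def export_url(fmt: str) -> str:
    """Build the export URL for one of the listed formats."""
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{HOST}{EXPORT_PATH}?{urlencode({'format': fmt})}"


def fetch_export(fmt: str, timeout: float = 10.0) -> bytes:
    """Download one export. Performs a network call when invoked."""
    req = Request(export_url(fmt), headers={"Accept": CONTENT_TYPES[fmt]})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, fetch_export("csv") would return the raw CSV bytes, which can then be decoded and passed to csv.reader.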

Preview response

{
  "meta": {
    "count": 12,
    "source": "benchmark_snapshots",
    "artifact_id": "live-benchmark:2026-06-08:8ee9cadc4e14af44",
    "last_updated_at": "2026-06-08T20:34:15.679Z",
    "fresh_until": "2026-06-15T20:34:15.679Z",
    "status": "ready",
    "reason_code": "surface_ready",
    "method_version": "v2",
    "coverage_window": "Week of 2026-06-08"
  },
  "data": [
    {
      "week_start": "2026-06-08",
      "rankings": [
        {
          "rank": 1,
          "arxiv_id": "2606.12392v1",
          "title": "System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5",
          "viability_score": 8,
          "composite": 110.5,
          "unicorn_probability": 0.8,
          "total_votes": 65,
          "star_velocity": 0,
          "rank_delta": null
        }
      ]
    }
  ]
}
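
A consumer of this response would typically check the status and freshness in meta before reading the rankings. The sketch below works on a trimmed copy of the preview above. Treating "ready" as the only usable status and picking the latest week by week_start are assumptions; the function names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the preview response shown above.
SAMPLE = """
{
  "meta": {"status": "ready", "fresh_until": "2026-06-15T20:34:15.679Z"},
  "data": [
    {"week_start": "2026-06-08",
     "rankings": [
       {"rank": 1, "arxiv_id": "2606.12392v1", "composite": 110.5,
        "rank_delta": null}
     ]}
  ]
}
"""


def is_fresh(response: dict, now: datetime) -> bool:
    # Replace the trailing 'Z' so fromisoformat works on older Pythons too.
    raw = response["meta"]["fresh_until"].replace("Z", "+00:00")
    return now < datetime.fromisoformat(raw)


def latest_rankings(response: dict) -> list:
    # Assumption: any status other than "ready" means "do not use".
    if response["meta"]["status"] != "ready":
        raise RuntimeError("benchmark snapshot is not ready")
    weeks = response["data"]
    if not weeks:
        return []
    # ISO dates sort correctly as plain strings.
    latest = max(weeks, key=lambda w: w["week_start"])
    return sorted(latest["rankings"], key=lambda r: r["rank"])


response = json.loads(SAMPLE)
top = latest_rankings(response)
```

Note that rank_delta arrives as null in the preview, so callers should handle None before doing arithmetic on it.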

https://sciencetostartup.com/api/v1/resources/benchmark

Use This Via API or MCP

Use the benchmark as a ranking and proof layer

The weekly scoreboard is a stable surface for agents that need ranked papers, comparison logic, and a public proof artifact they can cite.

Handoff

Agent Handoff

Weekly Benchmark Scoreboard

Canonical ID: benchmark | Route: /resources/benchmark

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/benchmark/benchmark

MCP example

{
  "tool": "get_signal_fusion_rankings",
  "arguments": {
    "limit": 10
  }
}
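
MCP clients send tool invocations as JSON-RPC 2.0 requests with the method tools/call. The sketch below wraps the tool name and arguments from the example above in that envelope. Mapping the "tool" field to params.name is an assumption about how this payload is meant to be used, and the helper name is hypothetical; the server address and transport are not given on this page and are left out.

```python
import json
from itertools import count

# Simple incrementing request IDs for the JSON-RPC envelope.
_ids = count(1)


def mcp_tool_call(tool: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = mcp_tool_call("get_signal_fusion_rankings", {"limit": 10})
wire = json.dumps(request)  # string to send over the chosen transport
```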

source_context

{
  "surface": "benchmark",
  "mode": "ranking",
  "query": "weekly benchmark scoreboard",
  "normalized_query": "benchmark",
  "route": "/resources/benchmark",
  "paper_ref": null,
  "topic_slug": null,
  "benchmark_ref": "benchmark",
  "dataset_ref": null
}

Drop the weekly benchmark into any page with a single iframe. The embed updates automatically every Monday.

<iframe
  src="https://sciencetostartup.com/resources/embed/trending?week=2026-06-08"
  width="640"
  height="480"
  loading="lazy"
  title="ScienceToStartup Weekly Benchmark"
></iframe>
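
If a page needs the snippet for a particular week, it could be generated rather than hand-edited. This is a sketch under the assumption that the week parameter takes the ISO date of that week's Monday, as in the example above (2026-06-08 is a Monday); the helper names are hypothetical.

```python
from datetime import date, timedelta
from html import escape

EMBED_BASE = "https://sciencetostartup.com/resources/embed/trending"


def week_start(day: date) -> date:
    # date.weekday() returns 0 for Monday, so this rolls back to Monday.
    return day - timedelta(days=day.weekday())


def embed_snippet(day: date, width: int = 640, height: int = 480) -> str:
    # Assumption: `week` expects the Monday of the target week.
    src = f"{EMBED_BASE}?week={week_start(day).isoformat()}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{width}" height="{height}" loading="lazy" '
        'title="ScienceToStartup Weekly Benchmark"></iframe>'
    )


snippet = embed_snippet(date(2026, 6, 10))
```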

