Benchmark Proof Canvas. API-backed export. Use the benchmark as a ranking and proof layer.
Reference Asset
Each Monday we score every new arXiv paper on commercial viability, code maturity, and community signal. This is the week of 2026-06-08.
Use This Via API or MCP
Use the benchmark as a ranking and proof layer in REST, MCP, launch-pack, and workspace flows without changing the underlying weekly receipt.
Solid commercial fit; worth a closer look this week.
Quiet paper, loud community.
On the watchlist this week.
Every Monday we pull every new arXiv AI paper from the prior 7 days, score four signals, and rank the top 50 by their weighted sum. The ranking is immutable: the artifact published on Monday is the artifact forever.
Viability: hand-graded 0–10 score for commercial fit. Is there a real wedge, a real buyer, a real moat?
Unicorn odds: calibrated probability that the paper turns into a billion-dollar outcome. This is the long-tail bet.
Community signal: discussion volume across HN, Reddit, and Bluesky in the week of publication.
Code velocity: weekly change in GitHub stars on the canonical repo, a proxy for real-world traction.
Updated weekly. Open source. Receipts on every row.
System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5
Score = Viability + Unicorn odds + Community signal + Code velocity
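The weighted sum described above can be sketched in a few lines of Python. This is an illustration only: the source does not publish the weights or any normalization, so `PLACEHOLDER_WEIGHTS` is hypothetical, and the mapping of the four signals onto the field names `viability_score`, `unicorn_probability`, `total_votes`, and `star_velocity` is an assumption based on the sample payload.

```python
# Hypothetical weights: the page says the ranking is a weighted sum but does
# not publish the weights, so unit weights are used purely as placeholders.
PLACEHOLDER_WEIGHTS = {
    "viability_score": 1.0,      # Viability (assumed field mapping)
    "unicorn_probability": 1.0,  # Unicorn odds (assumed field mapping)
    "total_votes": 1.0,          # Community signal (assumed field mapping)
    "star_velocity": 1.0,        # Code velocity (assumed field mapping)
}


def composite(row, weights=PLACEHOLDER_WEIGHTS):
    """Return the weighted sum of the four signals for one paper."""
    return sum(weight * float(row[name]) for name, weight in weights.items())


def rank(rows, weights=PLACEHOLDER_WEIGHTS, top_n=50):
    """Sort papers by composite score, highest first, and keep the top N."""
    scored = sorted(rows, key=lambda r: composite(r, weights), reverse=True)
    return [dict(r, rank=i + 1) for i, r in enumerate(scored[:top_n])]
```

With these unit weights the sample row (8, 0.8, 65, 0) sums to about 73.8 rather than the published composite of 110.5, so the real weights or scaling evidently differ from this placeholder.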
/api/v1/resources/benchmark
Current + historical scoreboard metadata.
/api/v1/resources/benchmark/export?format=json
Full snapshot JSON.
/api/v1/resources/benchmark/export?format=csv
Flat CSV of every paper, every week.
/api/v1/resources/benchmark/export?format=pdf
Print-ready PDF of the latest week.
{
  "meta": {
    "count": 12,
    "source": "benchmark_snapshots",
    "artifact_id": "live-benchmark:2026-06-08:8ee9cadc4e14af44",
    "last_updated_at": "2026-06-08T20:34:15.679Z",
    "fresh_until": "2026-06-15T20:34:15.679Z",
    "status": "ready",
    "reason_code": "surface_ready",
    "method_version": "v2",
    "coverage_window": "Week of 2026-06-08"
  },
  "data": [
    {
      "week_start": "2026-06-08",
      "rankings": [
        {
          "rank": 1,
          "arxiv_id": "2606.12392v1",
          "title": "System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5",
          "viability_score": 8,
          "composite": 110.5,
          "unicorn_probability": 0.8,
          "total_votes": 65,
          "star_velocity": 0,
          "rank_delta": null
        }
      ]
    }
  ]
}
https://sciencetostartup.com/api/v1/resources/benchmark
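A client consuming this payload might want to build the export URLs, check that the snapshot is still fresh, and pull the top rows. The following is a minimal sketch, assuming the response always has the shape of the sample above. The helper names `export_url`, `is_fresh`, and `top_papers` are hypothetical and not part of the API, and treating `fresh_until` as an expiry time is an interpretation of the field name.

```python
from datetime import datetime, timezone

BASE_URL = "https://sciencetostartup.com/api/v1/resources/benchmark"


def export_url(fmt):
    """Build an export URL for one of the three formats listed above."""
    if fmt not in ("json", "csv", "pdf"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE_URL}/export?format={fmt}"


def parse_timestamp(value):
    # The sample uses a trailing "Z"; replace it with an explicit UTC offset
    # so datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_fresh(payload, now=None):
    """True if status is "ready" and `now` is before meta.fresh_until (assumed meaning)."""
    now = now or datetime.now(timezone.utc)
    meta = payload["meta"]
    return meta["status"] == "ready" and now < parse_timestamp(meta["fresh_until"])


def top_papers(payload, limit=10):
    """Return (rank, arxiv_id, title) tuples for the most recent week."""
    # ISO dates sort lexicographically, so max() picks the latest week_start.
    latest = max(payload["data"], key=lambda week: week["week_start"])
    rows = sorted(latest["rankings"], key=lambda r: r["rank"])
    return [(r["rank"], r["arxiv_id"], r["title"]) for r in rows[:limit]]
```

The payload itself can be obtained with any HTTP client and `json.loads`; the parsing is kept separate from the network call so it can be exercised against a saved snapshot.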
Use This Via API or MCP
The weekly scoreboard is a stable surface for agents that need ranked papers, comparison logic, and a public proof artifact they can cite.
Agent Handoff
Canonical ID benchmark | Route /resources/benchmark
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/benchmark/benchmark
MCP example
{
  "tool": "get_signal_fusion_rankings",
  "arguments": {
    "limit": 10
  }
}
source_context
{
  "surface": "benchmark",
  "mode": "ranking",
  "query": "weekly benchmark scoreboard",
  "normalized_query": "benchmark",
  "route": "/resources/benchmark",
  "paper_ref": null,
  "topic_slug": null,
  "benchmark_ref": "benchmark",
  "dataset_ref": null
}
Drop the weekly benchmark into any page with a single iframe. Updates automatically every Monday.
<iframe
src="https://sciencetostartup.com/resources/embed/trending?week=2026-06-08"
width="640"
height="480"
loading="lazy"
title="ScienceToStartup Weekly Benchmark"
></iframe>
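The `week` value in the embed URL above is a Monday (2026-06-08). If a page generates the snippet itself, the Monday for any given date can be computed as below. This is a sketch under assumptions: that the `week` parameter always expects the ISO date of the week's Monday, and that the other attributes stay as shown. The helpers `week_start` and `embed_snippet` are hypothetical names, not part of the site.

```python
from datetime import date, timedelta
from html import escape

EMBED_BASE = "https://sciencetostartup.com/resources/embed/trending"


def week_start(day):
    """Return the Monday of the week containing `day` (Monday is weekday 0)."""
    return day - timedelta(days=day.weekday())


def embed_snippet(day, width=640, height=480):
    """Build the iframe markup for the week containing `day`."""
    # Assumption: the `week` query parameter takes the Monday in YYYY-MM-DD form.
    src = f"{EMBED_BASE}?week={week_start(day).isoformat()}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{width}" height="{height}" loading="lazy" '
        'title="ScienceToStartup Weekly Benchmark"></iframe>'
    )
```

For example, `embed_snippet(date(2026, 6, 10))` yields markup whose `src` ends in `week=2026-06-08`, matching the snippet above.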