Dataset Export Console. JSONL. Use the public dataset as a machine-readable proof surface.
Reference Asset
Open artifact of 1,000 production rows with 12 fields, immutable JSON/CSV/schema receipts, and API parity. Freshness window ended 2026-06-07T04:25:49.511Z. CC BY 4.0.
PREVIEW · 1,000 ROWS
| # | arxiv_id | Title | One-liner | Score | Cluster | Code | Tags |
|---|---|---|---|---|---|---|---|
| 1 | 2606.06493v1 | HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complement… | Unified robotic control system for complex humanoid tasks using distilled expert networks. | 3 | Robotics and Control Systems | | high_potential |
| 2 | 2606.06492v1 | Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Softwa… | Code2LoRA adapts coding language models for software evolution using hypernetwork-generated adapters. | 8 | AI Tools for Software Development | | series_a_plus, high_potential |
| 3 | 2606.06491v1 | TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies | TempoVLA enables robots to dynamically adjust execution speed for efficient and precise task performance. | 7 | AI and Robotics | | quick_build, high_potential |
| 4 | 2606.06486v1 | Regret Minimization with Adaptive Opponents in Repeated Games | Develop a tool for strategic decision optimization in repeated games using adaptive algorithms. | 6 | AI Decision Systems | | quick_build, high_potential |
| 5 | 2606.06481v1 | Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi… | Develop a benchmark tool for detecting progressive human-AI text transformations to enhance AI-authorship transparency. | 8 | AI Text Detection | | quick_build, series_a_plus, +1 |
Showing 5 of 1,000 rows. Full export via API or download.
Use This Via API or MCP
Use the public dataset as a machine-readable proof surface with the same immutable JSON, CSV, schema, and manifest receipts.
| Column | Type | Example | Description |
|---|---|---|---|
| arxiv_id | string | 2606.06493v1 | Canonical arXiv identifier; primary key. |
| title | string | HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via … | Paper title as published. |
| abstract | string | Training modern neural networks… | Original abstract text. |
| published_date | string (ISO 8601) | 2026-04-21T17:59:02+00:00 | Original publication date on arXiv. |
| viability_score | number \| null | 3 | Composite commercial viability rank. |
| cluster_label | string | Robotics and Control Systems | Research field assigned during clustering. |
| has_code | boolean | false | True when an external repository URL is attached. |
| repo_url | string \| null | https://github.com/owner/repo | URL of the linked code repository, if any. |
| commercial_flags | string[] | ["has_code","high_potential"] | Signal flags such as has_code or high_potential. |
| one_liner | string | Unified robotic control system for complex humanoid tasks us… | Short, human-readable summary of the paper. |
| time_to_mvp | string | 6+ months | Coarse estimate of time required to ship an MVP. |
| tags | string[] | ["high_potential"] | Topic tags applied during enrichment. |
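As a minimal sketch of how a consumer might check rows against the schema table above, the following Python maps each column to the types the table lists and reports mismatches. The names `FIELD_TYPES`, `NULLABLE`, and `schema_problems` are hypothetical helpers, not part of the published API, and the mapping of "number" to `int`/`float` is an assumption about how the JSON is decoded.

```python
from typing import Any

# Column name -> accepted Python types, transcribed from the schema table.
# Assumption: JSON "number" decodes to int or float, "string[]" to a list of str.
FIELD_TYPES: dict[str, tuple[type, ...]] = {
    "arxiv_id": (str,),
    "title": (str,),
    "abstract": (str,),
    "published_date": (str,),
    "viability_score": (int, float),
    "cluster_label": (str,),
    "has_code": (bool,),
    "repo_url": (str,),
    "commercial_flags": (list,),
    "one_liner": (str,),
    "time_to_mvp": (str,),
    "tags": (list,),
}

# Only these two columns are typed "... | null" in the schema table.
NULLABLE = {"viability_score", "repo_url"}


def schema_problems(row: dict[str, Any]) -> list[str]:
    """Return human-readable schema violations for one row (empty if none)."""
    problems: list[str] = []
    for name, types in FIELD_TYPES.items():
        if name not in row:
            problems.append(f"missing field: {name}")
            continue
        value = row[name]
        if value is None:
            if name not in NULLABLE:
                problems.append(f"{name} must not be null")
            continue
        # bool is a subclass of int in Python, so reject it for numeric columns.
        wrong_bool = isinstance(value, bool) and bool not in types
        if wrong_bool or not isinstance(value, types):
            problems.append(f"{name} has unexpected type {type(value).__name__}")
        elif types == (list,) and not all(isinstance(v, str) for v in value):
            problems.append(f"{name} must contain only strings")
    return problems
```

A row passes when `schema_problems(row)` returns an empty list. Rows that carry only a subset of the 12 columns are reported as having missing fields.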
arXiv ingest: daily
Dedupe: near-duplicate authors + abstract
Score: viability composite
Snapshot: immutable artifact
dataset-public-v3 · dataset_export_v3
Licensed under CC BY 4.0. Attribute as:
ScienceToStartup — AI Research Dataset, artifact public-dataset-2026-06-06T04-25-49-511Z. https://sciencetostartup.com/resources/dataset
Agent Handoff
Canonical ID dataset | Route /resources/dataset
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/dataset/dataset
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "dataset export"
  }
}
source_context
{
  "surface": "dataset",
  "mode": "resource",
  "query": "public dataset",
  "normalized_query": "dataset",
  "route": "/resources/dataset",
  "paper_ref": null,
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": "dataset"
}

| Endpoint | Description |
|---|---|
| /api/v1/resources/dataset | Returns the artifact manifest: schema, freshness, immutable URLs. |
| /api/v1/resources/dataset/export?format=json | Streams every row as JSON. Array fields stay arrays. |
| /api/v1/resources/dataset/export?format=csv | Flat CSV export. Array fields flatten to semicolon-delimited strings. |
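Because the CSV export flattens array fields to semicolon-delimited strings, a consumer that wants lists back has to split them. The sketch below shows one way to do that in Python. It assumes the CSV has a header row whose column names match the schema (`commercial_flags` and `tags` being the two `string[]` columns) and that individual items never contain a semicolon. `parse_csv_export` is a hypothetical helper, not part of the API.

```python
import csv
import io

# The two string[] columns from the schema; assumed to appear under these
# names in the CSV header row.
ARRAY_COLUMNS = ("commercial_flags", "tags")


def parse_csv_export(text: str) -> list[dict]:
    """Parse CSV export text and restore semicolon-delimited array columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in ARRAY_COLUMNS:
            if column in record:
                raw = record[column] or ""
                # An empty cell becomes an empty list rather than [""].
                record[column] = [
                    item.strip() for item in raw.split(";") if item.strip()
                ]
        rows.append(record)
    return rows
```

All other columns stay as strings, since CSV carries no type information. Converting `has_code` or `viability_score` would need a further pass.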
curl -s "https://sciencetostartup.com/api/v1/resources/dataset/export?format=json" | jq '.data[0]'
{
  "data": [
    {
      "arxiv_id": "2606.06493v1",
      "title": "HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers",
      "viability_score": 3,
      "cluster_label": "Robotics and Control Systems",
      "has_code": false,
      "one_liner": "Unified robotic control system for complex humanoid tasks using distilled expert networks.",
      "tags": [
        "high_potential"
      ]
    }
  ],
  "meta": {
    "count": 1,
    "source_count": 1000,
    "artifact_id": "public-dataset:2026-06-06T04-25-49-511Z",
    "schema_version": "dataset-public-v3",
    "exported_at": "2026-06-06T04:25:49.511Z"
  }
}
Use This Via API or MCP
Pull the dataset through REST, reference it from llms.txt, or use it as the stable evidence layer behind agent workflows that need paper metadata, scores, and exports.
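An agent workflow that relies on the export as an evidence layer may want to check the `meta` block before trusting the rows. The Python sketch below reads the envelope shape from the sample response and reports whether the payload is partial and still fresh. The 24-hour default is an assumption inferred from the gap between the sample `exported_at` value and the stated end of the freshness window. `summarize_export` and `parse_timestamp` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-06-06T04:25:49.511Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_export(
    payload: dict,
    now: datetime,
    freshness: timedelta = timedelta(hours=24),  # assumed window length
) -> dict:
    """Summarize the meta block of a JSON export response."""
    meta = payload["meta"]
    exported_at = parse_timestamp(meta["exported_at"])
    return {
        "artifact_id": meta["artifact_id"],
        "schema_version": meta["schema_version"],
        "rows_returned": len(payload["data"]),
        # True when the declared count matches the rows actually present.
        "count_matches": meta["count"] == len(payload["data"]),
        # True when the response holds fewer rows than the source artifact.
        "is_partial": meta["count"] < meta["source_count"],
        "is_fresh": now <= exported_at + freshness,
    }
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`, so the comparison with the parsed timestamp is valid.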