METHODOLOGY · v1

How we score papers

Brier score, calibration plot, reliability diagram, and per-sub-area small multiples, scored against Opus-4.7-graded ground truth with a founder spot-check.

Ground-truth disclosure: outcomes are graded by Opus-4.7 and spot-checked by the founder; they are not founder-graded.

BRIER SCORE · 30-DAY

Aggregate Brier score — last 30 days

Lower is better. The Brier score is the mean squared error between the model’s predicted probability and the eventual binary outcome (0 or 1), so it ranges from 0 (perfect) to 1. Our public methodology gate is ≤ 0.30; the current value is 0.224, with a 95% Wilson CI of [0.176, 0.281].
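As a minimal sketch of the arithmetic behind this figure, the snippet below computes a Brier score, a Wilson score interval, and the ≤ 0.30 gate check. The Wilson interval is defined for a binomial proportion; the page does not say how it is applied to a Brier score, so this sketch assumes the score is treated as a proportion p̂ over n scored predictions. The function names are illustrative and are not taken from apps/web/lib/methodology/brier.ts.

```python
import math
from typing import Sequence

GATE = 0.30  # public methodology gate stated on this page: Brier <= 0.30


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need non-empty inputs of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n trials.

    Assumption: the score being bounded is treated as a proportion over n items.
    """
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


def passes_gate(brier: float, gate: float = GATE) -> bool:
    """True when the point estimate meets the public gate (lower is better)."""
    return brier <= gate


# Usage with toy data (not the production holdout):
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 0, 1]
score = brier_score(probs, outcomes)  # (0.01 + 0.04 + 0.49 + 0.36) / 4 = 0.225
print(round(score, 3), passes_gate(score))
```

The Wilson form is asymmetric around p̂ and pulls toward 0.5, which is why the published interval is wider above 0.224 than below it.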

CALIBRATION

Calibration plot

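Predictions are grouped into ten predicted-probability deciles, and each dot is one decile. The dashed grey line is perfect calibration (observed rate equals predicted probability); the solid line is the OLS fit through the deciles (slope 0.82, intercept 0.09). A slope below 1 with a positive intercept means predictions are too extreme: the fit crosses the diagonal at 0.09 / (1 − 0.82) = 0.5, so deciles below 0.5 resolve positive more often than predicted and deciles above 0.5 less often.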

PER-SUB-AREA · 7 CS.AI AREAS

Per-sub-area Brier — 7 cs.AI sub-areas

An aggregate score can hide a regression in a single sub-area. We track each of the seven cs.AI sub-areas separately so that a regression in one cannot be diluted by gains elsewhere. In each small multiple, the bar marks the 95% Wilson interval and the dot marks the point estimate.
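To see how pooling hides a regression, here is a minimal sketch that scores records both per sub-area and in aggregate. The sub-area labels `area_a` and `area_b` are hypothetical placeholders, not the seven cs.AI sub-areas, and the data are toy values.

```python
from collections import defaultdict
from typing import Iterable, Sequence

Record = tuple[str, float, int]  # (sub_area, predicted probability, 0/1 outcome)


def brier_by_area(records: Iterable[Record]) -> dict[str, float]:
    """Brier score computed separately for each sub-area."""
    sq_err = defaultdict(float)
    count = defaultdict(int)
    for area, p, y in records:
        sq_err[area] += (p - y) ** 2
        count[area] += 1
    return {area: sq_err[area] / count[area] for area in sq_err}


def aggregate_brier(records: Sequence[Record]) -> float:
    """Single pooled Brier score over every record."""
    return sum((p - y) ** 2 for _, p, y in records) / len(records)


# Hypothetical labels: nine confident hits in one area, one confident miss in another.
records = [("area_a", 0.9, 1)] * 9 + [("area_b", 0.9, 0)]
print(round(aggregate_brier(records), 2))
print({a: round(b, 2) for a, b in brier_by_area(records).items()})
```

Here the pooled score is 0.09, well under the 0.30 gate, while `area_b` alone sits at 0.81. Only the per-area view surfaces the failure.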

CITATION VELOCITY

CitationVelocity weight

The Top-10 Daily Dashboard composes its rank from a base Signal-Fusion score plus a small per-paper velocity term:

signal_fusion_score = base_signal_fusion + velocity_weight * normalized_velocity_30d

The current velocity_weight is 0.30. The value is versioned in config/signal_fusion_weights.yaml, and any change is routed through ShadowPromoter (kind=weight_promotion); if calibration drops, BrierGate automatically reverts the change and starts the 7-day cooldown. Setting velocity_weight=0 reproduces the prior P3 ranking byte-for-byte (the zero-regression invariant).
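A minimal sketch of the rank composition and the zero-regression invariant follows. The min-max normalization of raw 30-day velocity, the tie-break by paper id, and the paper ids themselves are all assumptions for illustration; the page does not specify how normalized_velocity_30d is computed or how ties are ordered.

```python
from typing import Mapping


def min_max_normalize(raw: Mapping[str, float]) -> dict[str, float]:
    """Hypothetical normalization: rescale raw 30-day velocities to [0, 1]."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {pid: (v - lo) / span if span else 0.0 for pid, v in raw.items()}


def signal_fusion_score(base: float, velocity: float, velocity_weight: float) -> float:
    # signal_fusion_score = base_signal_fusion + velocity_weight * normalized_velocity_30d
    return base + velocity_weight * velocity


def top_k(
    base: Mapping[str, float],
    velocity: Mapping[str, float],
    velocity_weight: float,
    k: int = 10,
) -> list[str]:
    """Rank by fused score, highest first; ties broken by paper id (an assumption)."""
    return sorted(
        base,
        key=lambda pid: (-signal_fusion_score(base[pid], velocity[pid], velocity_weight), pid),
    )[:k]


# Toy inputs with hypothetical paper ids.
base = {"p1": 0.80, "p2": 0.75, "p3": 0.70}
velocity = min_max_normalize({"p1": 2.0, "p2": 40.0, "p3": 21.0})
print(top_k(base, velocity, 0.0))   # weight 0: order by base score only
print(top_k(base, velocity, 0.30))  # current weight
```

With velocity_weight=0.0 the velocity term contributes exactly zero for any finite velocity, so the sort key reduces to the base score and the ordering matches a base-only ranking. Byte-for-byte equality with the prior ranking also depends on using the same tie-break the prior ranking used.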

AUDIT TRAIL

How to verify

Read the JSON twin at /api/methodology.json, which carries the same numbers in an agent-readable format.
Pull the open-source Brier helpers from apps/web/lib/methodology/brier.ts and re-run the math against your own holdout.
Inspect the recompute cron at apps/api/cron/methodology_recompute.py (06:30 UTC daily, after the 06:00 scorecard sweep).
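One way to sanity-check the JSON twin is sketched below. The field names `brier_30d`, `ci_95`, and `gate` are hypothetical, because the payload schema is not documented here, so check the real response before relying on them. The sample values are the ones published on this page, and fetching the document is left out because the host is not stated.

```python
import json
from typing import Optional

# Hypothetical payload: the real field names in /api/methodology.json may differ.
SAMPLE = '{"brier_30d": 0.224, "ci_95": [0.176, 0.281], "gate": 0.30}'


def check_methodology(doc: str, holdout_brier: Optional[float] = None) -> list[str]:
    """Return a list of inconsistencies found in a methodology JSON document."""
    data = json.loads(doc)
    brier = data["brier_30d"]
    lo, hi = data["ci_95"]
    problems = []
    if not lo <= brier <= hi:
        problems.append("point estimate falls outside its interval")
    if brier > data["gate"]:
        problems.append("point estimate exceeds the public gate")
    # Rough check only: your holdout is a different sample, so a miss is a
    # prompt to investigate, not proof of an error in the published numbers.
    if holdout_brier is not None and not lo <= holdout_brier <= hi:
        problems.append("your holdout Brier falls outside the published interval")
    return problems


print(check_methodology(SAMPLE))
print(check_methodology(SAMPLE, holdout_brier=0.31))
```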
