AI Model Evaluation

Trending

3 papers

5.0 viability

+100% (30d)

Papers


Research Paper·Jan 30, 2026

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Large language models (LLMs) are widely used as reference-free evaluators via prompting, but this "LLM-as-a-Judge" paradigm is costly, opaque, and sensitive to prompt design. In this work, we investig...

5.0 viability

Research Paper·Feb 12, 2026

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction

As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle wit...

5.0 viability
