LLM Calibration Comparison Hub

A calibration protocol for LLMs to improve their reliability as automated judges in low-label settings.

A study revealing how LLMs compute verbal confidence, enhancing our understanding of model uncertainty.

A pipeline for efficiently inferring calibrated uncertainty estimates from LLMs in high-stakes domains.

Reference Surfaces