Papers
Research Paper·Jan 30, 2026
Automating Forecasting Question Generation and Resolution for AI Evaluation
Forecasting future events is highly valuable in decision-making and is a robust measure of general intelligence. As forecasting is probabilistic, developing and evaluating AI forecasters requires gene...
7.0 viability
Research Paper·Jan 14, 2026
What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks, yet their ability to generalize across varying environments remains an under-e...
5.0 viability