AI Evaluation Tools Comparison Hub
3 papers - avg viability 6.3
Top Papers
- Automating Forecasting Question Generation and Resolution for AI Evaluation (7.0)
Automated AI system for scalable forecasting question generation and resolution with high accuracy.
- What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding (5.0)
Develop a benchmarking tool to assess LLM agents' environment understanding using the Task-to-Quiz paradigm.