Claude

Claude is a family of large language models developed by Anthropic, listed as a model in our research taxonomy.

Related papers

Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek
Learning to Inject: Automated Prompt Injection via Reinforcement Learning
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems
From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences
Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
ResearchGym: Evaluating Language Model Agents on Real-World AI Research
General Agent Evaluation
AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems
Who's in Charge? Disempowerment Patterns in Real-World LLM Usage
AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction
Who Should Have Surgery? A Comparative Study of GenAI vs Supervised ML for CRS Surgical Outcome Prediction
How does information access affect LLM monitors' ability to detect sabotage?
When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems
ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered