Claude
Claude is a family of large language models developed by Anthropic, listed as a model entry in our research taxonomy.
Related papers
- Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek
- Learning to Inject: Automated Prompt Injection via Reinforcement Learning
- When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
- Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems
- From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences
- Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies
- Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
- ResearchGym: Evaluating Language Model Agents on Real-World AI Research
- General Agent Evaluation
- AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems
- Who's in Charge? Disempowerment Patterns in Real-World LLM Usage
- AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
- ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction
- Who Should Have Surgery? A Comparative Study of GenAI vs Supervised ML for CRS Surgical Outcome Prediction
- How does information access affect LLM monitors' ability to detect sabotage?
- When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems
- ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
- GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered