GPT-4

GPT-4 is a large language model developed by OpenAI, listed as a model in our research taxonomy.

Related papers

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Chinese Labor Law Large Language Model Benchmark
Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents
RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures
Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity
VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding
HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems
LLM-Guided Quantified SMT Solving over Uninterpreted Functions
PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
Affective Flow Language Model for Emotional Support Conversation
Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback
SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks
Position on LLM-Assisted Peer Review: Addressing Reviewer Gap through Mentoring and Feedback
A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection