GPT-4
GPT-4 is a large language model from OpenAI, listed as a model entry in our research taxonomy.
Related papers
- VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
- Chinese Labor Law Large Language Model Benchmark
- Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
- MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents
- When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
- Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
- Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents
- RIFT: Reordered Instruction Following Testbed To Evaluate Instruction Following in Singular Multistep Prompt Structures
- Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity
- VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding
- HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems
- LLM-Guided Quantified SMT Solving over Uninterpreted Functions
- PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
- Affective Flow Language Model for Emotional Support Conversation
- Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback
- SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
- Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks
- Position on LLM-Assisted Peer Review: Addressing Reviewer Gap through Mentoring and Feedback
- A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
- EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection