Group Relative Policy Optimization
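Group Relative Policy Optimization (GRPO) is a research field in our research taxonomy. GRPO is a reinforcement-learning method for fine-tuning language models that scores each sampled response relative to the other responses generated for the same prompt, replacing the learned value (critic) model used in PPO.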
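The core step of Group Relative Policy Optimization (GRPO) works as follows. For each prompt, a group of responses is sampled and scored. Each response's advantage is its reward minus the group mean, divided by the group standard deviation, and that advantage enters a PPO-style clipped objective. The sketch below is a minimal, framework-free illustration built on several assumptions: one scalar reward per response, the population standard deviation, a small `eps`, and a clip range of 0.2. The function names are hypothetical, and the KL penalty against a reference model is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the other rewards in its group.

    `rewards` holds one scalar reward per response sampled for the SAME prompt.
    Assumption: population standard deviation (pstdev); some implementations
    use the sample standard deviation instead.
    """
    if not rewards:
        raise ValueError("a group must contain at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response got the same reward;
    # in that case all advantages are 0 and the group gives no learning signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one token (to be maximized).

    `ratio` is pi_new(token) / pi_old(token); clip_eps=0.2 is an assumed default.
    """
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Hypothetical group of 4 responses to one prompt, scored 1 (correct) or 0 (wrong).
    group = [1.0, 0.0, 0.0, 1.0]
    advantages = group_relative_advantages(group)
    print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
    # Every token of a response shares that response's advantage in this sketch.
    print(clipped_surrogate(1.5, advantages[0]))
```

Because the baseline is the group's own mean reward, no separate value network has to be trained. The trade-off is that several responses must be sampled per prompt.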
Related papers
- The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
- V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation
- Beyond Model Scaling: Test-Time Intervention for Efficient Deep Reasoning
- SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
- On the Hidden Objective Biases of Group-based Reinforcement Learning
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization