Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback



Summary: Robustifies preference-based reinforcement learning against unreliable annotators through trust parameter optimization over multi-expert feedback.
