Estimated build cost: $9K to $13K over 6 to 10 weeks.

Founder's Pitch

"Optimistic DRPO offers a robust approach to enhance sequential user interactions by overcoming data quality issues in policy-based RL systems."

Recommender Systems · Score: 6

Commercial Viability Breakdown (0-10 scale)

High Potential: 5 (2/4 signals)
Quick Build: 10 (4/4 signals)
Series A Potential: 5 (2/4 signals)

Sources used for this analysis

arXiv Paper: Full-text PDF analysis of the research paper
GitHub Repository: Code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 2/11/2026
