
BUILDER'S SANDBOX

Build This Paper

Use an AI coding agent to implement this research.

OpenAI Codex (AI Agent)

Lightweight coding agent in your terminal.

Claude Code (AI Agent)

Agentic coding tool for terminal workflows.

AntiGravity IDE (Scaffolding)

AI agent mindset installer and workflow scaffolder.

Cursor (IDE)

AI-first code editor built on VS Code.

VS Code (IDE)

Free, open-source editor by Microsoft.

Estimated build cost: $9K-$13K over 6-10 weeks.

See exactly what it costs to build this, benchmarked against 3 comparable funded startups.

7-day free trial. Cancel anytime.

Discover the researchers behind this paper and find similar experts.


References (29)

[1] A Survey on Large Language Models for Mathematical Reasoning. Pengyuan Wang, Tian-Shuo Liu et al., 2025.
[2] Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale. Bowen Jiang, Zhuoqun Hao et al., 2025.
[3] Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs. Siyan Zhao, Mingyi Hong et al., 2025.
[4] Personalization of Large Language Models: A Survey. Zhehao Zhang, Ryan A. Rossi et al., 2024.
[5] LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. Di Wu, Hongwei Wang et al., 2024.
[6] Needle in the Haystack for Memory Based Large Language Models. Elliot Nelson, Georgios Kollias et al., 2024.
[7] Scaling Synthetic Data Creation with 1,000,000,000 Personas. Xin Chan, Xiaoyang Wang et al., 2024.
[8] LongLaMP: A Benchmark for Personalized Long-form Text Generation. Ishita Kumar, Snigdha Viswanathan et al., 2024.
[9] Evaluating Very Long-Term Conversational Memory of LLM Agents. Adyasha Maharana, Dong-Ho Lee et al., 2024.
[10] Personalized Language Modeling from Personalized Human Feedback. Xinyu Li, Z. Lipton et al., 2024.
[11] Retrieval-Augmented Generation for Large Language Models: A Survey. Yunfan Gao, Yun Xiong et al., 2023.
[12] User Modeling in the Era of Large Language Models: Current Research and Future Directions. Zhaoxuan Tan, Meng Jiang, 2023.
[13] Instruction-Following Evaluation for Large Language Models. Jeffrey Zhou, Tianjian Lu et al., 2023.
[14] Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging. Joel Jang, Seungone Kim et al., 2023.
[15] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding. Yushi Bai, Xin Lv et al., 2023.
[16] LLM-Rec: Personalized Recommendation via Prompting Large Language Models. Hanjia Lyu, Song Jiang et al., 2023.
[17] AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback. Yann Dubois, Xuechen Li et al., 2023.
[18] LaMP: When Large Language Models Meet Personalization. Alireza Salemi, Sheshera Mysore et al., 2023.
[19] Large Language Model Instruction Following: A Survey of Progresses and Challenges. Renze Lou, Kai Zhang et al., 2023.
[20] GPT-4 Technical Report. OpenAI (Josh Achiam, Steven Adler et al.), 2023.

Showing 20 of 29 references

Founder's Pitch

"Develop RealPref benchmark for evaluating LLMs in personalized preference-following tasks."

Category: Benchmark Development · Score: 5

Commercial Viability Breakdown (0-10 scale)

High Potential: 2.5 (1/4 signals)
Quick Build: 5 (2/4 signals)
Series A Potential: 0 (0/4 signals)

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/4/2026

Explore the full citation network and related research.


Understand the commercial significance and market impact.


Get detailed profiles of the research team.
