Research Field
Training agents to maximize cumulative reward through trial-and-error interaction with an environment. Used in games, robotics, and LLM alignment (RLHF).