3 papers - avg viability 3.0
Empirically analyzes LLM responses to moral dilemmas to determine whether the models exhibit genuine moral reasoning or merely mimic it through alignment training, revealing a 'moral ventriloquism' phenomenon.
Develops a method to analyze and steer high-level behavioral traits in LLMs within strategic game environments.
Analyzes the fragility of moral judgments made by large language models, using a framework that applies perturbations to moral dilemmas.