LLM Behavior Analysis Comparison Hub

3 papers - avg viability 3.0

Reference Surfaces

Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models (3.0)
This research empirically analyzes LLM responses to moral dilemmas to determine whether they exhibit genuine moral reasoning or merely mimic it as a result of alignment training, and it reports a 'moral ventriloquism' phenomenon.
Persona Vectors in Games: Measuring and Steering Strategies via Activation Vectors (3.0)
Develops a method to analyze and steer high-level behavioral traits in LLMs within strategic game environments.