AI Ethics Comparison Hub
13 papers - avg viability 2.9
Current research in AI ethics increasingly focuses on understanding and mitigating biases in large language models (LLMs) and their implications for moral decision-making. Recent work highlights the need for multidimensional evaluation frameworks such as Social Harm Analysis via Risk Profiles (SHARP), which reveal that models with similar average risks can exhibit vastly different worst-case behaviors. This shift toward tail-sensitive risk profiling matters most in high-stakes applications, where even minor biases can cause significant harm. Other studies examine how contextual influences alter the moral preferences of LLMs, underscoring the importance of controlled evaluations of model behavior under real-world conditions. The field is also addressing the implications of AI for human expertise and agency, advocating frameworks that promote dignified human-AI interaction. As AI systems become more integrated into decision-making processes, establishing trust through transparent governance and continuous evaluation is essential for ethical deployment and harm mitigation.
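The distinction between average and worst-case risk that tail-sensitive profiling surfaces can be sketched in a few lines. The per-group harm scores below are invented for illustration and are not taken from any of the listed papers; `risk_profile` is a hypothetical helper, not an API from the SHARP framework:

```python
import statistics

# Hypothetical per-demographic-group harm scores for two models.
# Both have the same average risk, but very different tails.
model_a = [0.10, 0.12, 0.11, 0.09, 0.13]   # uniform, low-variance risk
model_b = [0.02, 0.03, 0.02, 0.03, 0.45]   # same mean, one severe outlier group

def risk_profile(scores):
    """Return (average risk, worst-case risk) across groups."""
    return statistics.mean(scores), max(scores)

mean_a, worst_a = risk_profile(model_a)
mean_b, worst_b = risk_profile(model_b)

# Both means are 0.11, but Model B's worst-case risk (0.45) is far higher
# than Model A's (0.13) -- invisible to an average-only evaluation.
print(f"Model A: mean={mean_a:.2f}, worst-case={worst_a:.2f}")
print(f"Model B: mean={mean_b:.2f}, worst-case={worst_b:.2f}")
```

An average-only leaderboard would rank these two models identically; a tail-sensitive profile flags Model B as unsuitable for high-stakes use.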
Top Papers
- Building Interpretable Models for Moral Decision-Making (5.0)
Develop an interpretable model to analyze moral decision-making in AI systems.
- SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models (5.0)
Assess and mitigate social harm in large language models using SHARP's multidimensional risk profiling framework.
- Moral Preferences of LLMs Under Directed Contextual Influence (4.0)
Develop a tool to analyze and visualize how contextual influences affect moral decision-making of language models.
- When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models (3.0)
Develop a Negation Sensitivity Index to improve AI model compliance with negated instructions in high-stakes contexts.
- From Future of Work to Future of Workers: Addressing Asymptomatic AI Harms for Dignified Human-AI Interaction (3.0)
A framework for preserving human expertise in AI-augmented work environments across high-stakes industries.
- Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild (3.0)
JudgeGPT and RogueGPT offer a framework to analyze human susceptibility to AI-generated disinformation, guiding interventions for enhancing trustworthy web intelligence.
- QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks (2.0)
Analyze and address biases in LLMs related to gender and sexuality.
- AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations (2.0)
Toolkit for evaluating LLMs' ability to align with human values in conversations.
- Beyond Abstract Compliance: Operationalising Trust in AI as a Moral Relationship (2.0)
Explore operationalizing trust in AI through African relational ethics to improve community engagement and equity.
- Trust via Reputation of Conviction (2.0)
A theoretical framework for trust based on conviction and reputation in AI agents.
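As a toy illustration of the kind of metric the negation-sensitivity audit above points toward, one could compare a model's compliance rate on affirmative instructions against negated ones and take the gap as a sensitivity score. The actual Negation Sensitivity Index is not defined in this summary; the construction and all trial data below are hypothetical:

```python
# Each record: (instruction was negated?, model complied with the literal instruction?)
# Invented outcomes for illustration only.
trials = [
    (False, True), (False, True), (False, True), (False, False),  # affirmative
    (True, True), (True, False), (True, False), (True, False),    # negated
]

def compliance_rate(records, negated):
    """Fraction of trials of the given kind where the model complied."""
    subset = [ok for neg, ok in records if neg == negated]
    return sum(subset) / len(subset)

affirmative = compliance_rate(trials, negated=False)  # 3/4 = 0.75
negated = compliance_rate(trials, negated=True)       # 1/4 = 0.25
sensitivity = affirmative - negated                   # large gap = high negation sensitivity

print(f"negation sensitivity = {sensitivity:.2f}")
```

Here the model follows affirmative instructions far more reliably than negated ones, which is exactly the "prohibitions become permissions" failure mode such an audit is designed to catch.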