Responsible AI Paper Summaries

Stay updated with the latest in Responsible AI. Subscribe to the Responsible AI newsletter for weekly updates on new papers and more.

Overview

Welcome to the Responsible AI Paper Summaries repository! Here, you'll find concise summaries of key papers in various areas of responsible AI.

This repository provides brief summaries of AI/ML papers in the following areas:

Explainability and Interpretability
Fairness and Biases
Privacy
Security
Safety
Accountability
Human Control and Interaction
Legal and Ethical Guidelines

Recent Summaries

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions - arXiv, 2024. This paper from OpenAI introduces an instruction hierarchy to train LLMs to prioritize privileged instructions(system messages) over lower-level ones (user messages and third-party inputs), enhancing their robustness against adversaries.
Taxonomy of Risks Posed by Language Models - FAccT ’22. This paper develops a comprehensive taxonomy of ethical and social risks associated with large-scale language models (LMs). It identifies twenty-one risks and categorizes them into six risk areas to guide responsible innovation and mitigation strategies.
Why Should I Trust You? Explaining the Predictions of Any Classifier - KDD 2016. This paper introduces LIME (Local Interpretable Model-agnostic Explanations), a technique to explain the predictions of any classifier in a faithful and interpretable manner by learning an interpretable model locally around the prediction.

Summaries by Topic

Explainability and Interpretability

Why Should I Trust You? Explaining the Predictions of Any Classifier - KDD 2016. This paper introduces LIME (Local Interpretable Model-agnostic Explanations), a technique to explain the predictions of any classifier in a faithful and interpretable manner by learning an interpretable model locally around the prediction.
A Nutritional Label for Rankings - SIGMOD’18. Provides a web-based application called Ranking Facts that generates a "nutritional label" for rankings to enhance transparency, fairness, and stability.
A Unified Approach to Interpreting Model Predictions - NIPS 2017. Introduces SHAP (SHapley Additive exPlanations), a unified framework for interpreting model predictions by assigning each feature an importance value for a particular prediction, integrating six existing methods into a single, cohesive approach.

Fairness and Biases

Taxonomy of Risks Posed by Language Models - FAccT ’22. This paper develops a comprehensive taxonomy of ethical and social risks associated with large-scale language models (LMs). It identifies twenty-one risks and categorizes them into six risk areas to guide responsible innovation and mitigation strategies.

Privacy

Security

Safety

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions - arXiv, 2024. This paper from OpenAI introduces an instruction hierarchy to train LLMs to prioritize privileged instructions(system messages) over lower-level ones (user messages and third-party inputs), enhancing their robustness against adversaries.
Taxonomy of Risks Posed by Language Models - FAccT ’22. This paper develops a comprehensive taxonomy of ethical and social risks associated with large-scale language models (LMs). It identifies twenty-one risks and categorizes them into six risk areas to guide responsible innovation and mitigation strategies.
To Believe or Not to Believe Your LLM - arXiv, 2024. This paper explores uncertainty quantification in LLMs to detect hallucinations by distinguishing epistemic from aleatoric uncertainties using an information-theoretic metric.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models - arXiv, 2024. CARES evaluates the trustworthiness of medical vision language models (Med-LVLMs) across five dimensions: trustfulness, fairness, safety, privacy, and robustness.
Air Gap: Protecting Privacy-Conscious Conversational Agents - arXiv 2024. A paper from Google proposes AirGapAgent to prevent data leakage from LLMs, ensuring privacy in adversarial contexts.

Accountability

Human Control and Interaction

Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies - FAccT '23. This survey reviews over 100 empirical studies to understand and improve human-AI decision-making, emphasizing the need for unified research frameworks.

Legal and Ethical Guidelines

How to Use

Each summary is stored in the relevant subfolder within the summaries/ directory. You can browse through the summaries to quickly understand the main points of various papers.

Contributing

We welcome contributions! Please read our CONTRIBUTING.md file for more details on how to contribute.

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.github/workflows		.github/workflows
summaries		summaries
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
update_readme.py		update_readme.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Responsible AI Paper Summaries

Overview

Recent Summaries

Summaries by Topic

How to Use

Contributing

About

Releases

Packages

Contributors 2

Languages

AIResponsibly/PaperSummaries

Folders and files

Latest commit

History

Repository files navigation

Responsible AI Paper Summaries

Overview

Recent Summaries

Summaries by Topic

How to Use

Contributing

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages