This repository contains the dataset, code, and outputs related to the thesis "Interpreting Multi-Word Expressions in Text-to-Image Generation: A Cross-Attention Approach Using Stable Diffusion." The work explores how generative models like Stable Diffusion 2.1 handle Multi-Word Expressions (MWEs) and uses interpretability tools such as DAAM (Diffusion Attention Attribution Maps) to analyze model attention.
## Table of Contents

- Introduction
- Dataset Description
- Code Description
- Outputs
- Setup and Usage
- Dependencies
- Acknowledgments
- License
## Introduction

Multi-Word Expressions (MWEs) encompass idioms, collocations, and phrases that convey meanings beyond their literal components. They are central to natural language but pose challenges for generative models due to their context-dependent and often abstract nature. This repository:
- Explores the semantic and visual representation of MWEs.
- Leverages Stable Diffusion 2.1 for generating images from text prompts.
- Uses DAAM to analyze and visualize attention distribution across the generated images.
## Dataset Description

This dataset is an extended version of the MWE-CWI dataset. It includes:
- MWEs: Multi-word expressions such as "spill the beans" and "hard pressed."
- Context: Sentences providing linguistic and situational context for each MWE.
- Generated Prompts: Text prompts crafted for Stable Diffusion to represent each MWE.
- Annotations:
  - Supersenses: High-level semantic categories for the MWEs.
  - Complexity scores: Binary and probabilistic labels indicating difficulty for native and non-native speakers.
This dataset plays a pivotal role in:
- Generating accurate textual prompts.
- Evaluating semantic alignment between MWEs and their visual representations.
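For orientation, here is a minimal sketch of loading and inspecting the dataset with pandas; the file name and column names are illustrative assumptions rather than the repository's exact schema.

```python
import pandas as pd

# Hypothetical file and column names; the CSV shipped with this repository may differ.
df = pd.read_csv("mwe_cwi_extended.csv")

# Each row pairs an MWE with its sentence context, the engineered prompt, and annotations.
columns = ["expression", "context", "generated_prompt", "supersense", "complexity_prob"]
print(df[columns].head())
```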
## Code Description

The Jupyter notebook contains all the code for the following steps:
**Prompt Engineering:**
- Integrates MWEs naturally into descriptive and contextually rich prompts.
- Ensures prompts stay within Stable Diffusion’s token limit.
- Example: "A close-up of a person nervously spilling beans onto the floor, symbolizing the accidental revealing of a secret. The atmosphere is tense, and the background is a modern office."
**Image Generation:**
- Uses Stable Diffusion 2.1 to generate high-quality images based on the crafted prompts.
- Encodes prompts using CLIP and processes them through latent diffusion steps.
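A minimal sketch of this step with the Hugging Face diffusers pipeline, which performs the CLIP text encoding and the latent diffusion loop internally; the sampler settings shown are common defaults, not necessarily the ones used in the thesis.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion 2.1 in half precision so it fits on a single Colab GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "A close-up of a person nervously spilling beans onto the floor, symbolizing "
    "the accidental revealing of a secret."
)

# The pipeline encodes the prompt with CLIP and denoises latents over 50 steps.
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("spill_the_beans.png")
```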
**DAAM Visualizations:**
- Generates heatmaps to visualize cross-attention scores.
- Maps heatmaps onto the image to highlight the model’s focus areas.
- Example outputs include heatmaps for "spill the beans" and "hard pressed."
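A minimal sketch of the tracing step, assuming the daam package's trace interface and reusing the `pipe` and `prompt` from the generation sketch above; exact calls may vary between daam versions.

```python
from daam import trace
from matplotlib import pyplot as plt

# Record cross-attention while the pipeline runs, then aggregate it into a global heatmap.
with trace(pipe) as tc:
    output = pipe(prompt, num_inference_steps=50)
    image = output.images[0]
    heat_map = tc.compute_global_heat_map()

# Project the heatmap for one component word of the MWE (e.g. "beans") onto the image.
word_map = heat_map.compute_word_heat_map("beans")
word_map.plot_overlay(image)
plt.savefig("daam_beans_overlay.png")
```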
**Evaluation Framework:**
- Metrics:
  - Intersection over Union (IoU): Measures alignment between heatmaps and annotated ground-truth masks.
  - Coverage: Quantifies how much attention is focused on relevant regions.
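One way these metrics might be computed from a word-level heatmap and a binary ground-truth mask; the function names, the 0.5 binarization threshold, and the NumPy-array inputs are illustrative rather than the thesis's exact implementation.

```python
import numpy as np

def iou(heatmap: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """Intersection over Union between a binarized heatmap and a ground-truth mask."""
    pred = heatmap >= threshold * heatmap.max()
    gt = mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 0.0

def coverage(heatmap: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of the total attention mass that falls inside the relevant region."""
    total = heatmap.sum()
    return float(heatmap[mask.astype(bool)].sum() / total) if total else 0.0
```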
## Outputs

**Visual Comparison: "President Barack Obama"**

Prompt details:
- Genre: News
- Complexity score (probabilistic): 0.45
- Expression: "President Barack Obama"
- Context: "Outgoing US President Barack Obama authorised the move in response to Russian intervention in Ukraine in 2014, in which Crimea was annexed."
- Generated Prompt: "Outgoing US President Barack Obama stands in a solemn setting, authorizing a strategic response to Russia's 2014 intervention in Ukraine, where Crimea was annexed. The scene captures Obama's decisive expression amid a backdrop of geopolitical tension."
**Visual Comparison: "Spill the Beans"**

This example captures the idiomatic expression "spill the beans." The original image shows a person nervously spilling beans, symbolizing the revealing of a secret. The DAAM overlay highlights the model's attention to the beans and to the context of the action.
## Dependencies

- torch: PyTorch for Stable Diffusion.
- transformers: Hugging Face library for model loading.
- cv2: OpenCV for image processing.
- matplotlib: Visualization of heatmaps and outputs.
- pandas: Dataset handling.
- Google Colab (recommended): Provides a GPU-powered environment for running the notebook.
## Acknowledgments

This repository relies on the following resources:
- MWE-CWI Dataset: A publicly available dataset for identifying and evaluating multi-word expressions (MWEs).
- Stable Diffusion: A cutting-edge latent diffusion model by Stability AI and Hugging Face.
- DAAM: A visualization tool for attention attribution in diffusion models.
## License

This repository is licensed under the MIT License. See the LICENSE file for details.
## Setup and Usage

Clone the repository and open the notebook (Google Colab with a GPU runtime is recommended):

git clone https://github.com/your-username/MWE-Stable-Diffusion-Interpretation.git
cd MWE-Stable-Diffusion-Interpretation