LLM Attributor: Attribute LLM's Generated Text to Training Data

LLM Attributor helps you visualize training data attribution for text generated by your large language models (LLMs). Interactively select text phrases and visualize the training data points responsible for generating them. You can also edit the model-generated text and see how your changes affect the attribution in a side-by-side comparison.


🎬 Demo YouTube Video	✍️ Technical Report

Feature Highlights

Demo video: llm-attributor.mp4

Getting Started

Installation

LLM Attributor is published on the Python Package Index (PyPI). To install it, use pip:

pip install llm-attributor
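
To confirm from inside a notebook that the install succeeded, a quick check along these lines may help. It is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper, not part of LLM Attributor, and the distribution name is assumed to match the one in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name taken from the pip command; prints None if it is not installed.
print(installed_version("llm-attributor"))
```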

Initialization

Import LLM Attributor into a computational notebook (e.g., Jupyter Notebook/Lab) and initialize it with your model and data configurations.

from LLMAttributor import LLMAttributor
attributor = LLMAttributor(
    llama2_dir=LLAMA2_DIR,
    tokenizer_dir=TOKENIZER_DIR,
    model_save_dir=MODEL_SAVE_DIR,
    train_dataset=TRAIN_DATASET
)

For LLAMA2_DIR and TOKENIZER_DIR, provide the path to the base LLaMA2 model. These are necessary when your model has not been fine-tuned yet. MODEL_SAVE_DIR is the directory where your fine-tuned model is (or will be) saved.
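
The relationship between these directories can be sketched as follows. This is an illustration only: the paths are placeholders, and `has_saved_model` is a hypothetical helper (not part of LLM Attributor) that assumes a non-empty MODEL_SAVE_DIR means a fine-tuned model is already present.

```python
from pathlib import Path

# Placeholder paths (assumptions); replace them with your own locations.
LLAMA2_DIR = Path("models/llama2-base")
TOKENIZER_DIR = Path("models/llama2-base")
MODEL_SAVE_DIR = Path("models/llama2-finetuned")


def has_saved_model(model_save_dir: Path) -> bool:
    """Assumption: any entry in the directory indicates a saved fine-tuned model."""
    return model_save_dir.is_dir() and any(model_save_dir.iterdir())


if has_saved_model(MODEL_SAVE_DIR):
    print(f"Found a saved model in {MODEL_SAVE_DIR}")
else:
    print(f"No saved model yet; the base model at {LLAMA2_DIR} will be needed")
```

A check like this only tells you which case applies before you construct the attributor; how LLM Attributor itself detects a saved model is not specified here.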

Demo

Open disaster-demo.ipynb or finance-demo.ipynb to try LLM Attributor's interactive visualization.

Credits

LLM Attributor was created by Seongmin Lee, Jay Wang, Aishwarya Chakravarthy, Alec Helbling, Anthony Peng, Mansi Phute, Polo Chau, and Minsuk Kahng.

License

The software is available under the MIT License.

Contact

If you have any questions, feel free to open an issue or contact Seongmin Lee.
