ROLEBENCH is a framework for evaluating the performance of role prompting across different datasets and large language models.

Models:
- Llama3-8B Instruct
- Phi-3 mini-4K Instruct
- Mistral-7B Instruct
- Gemma-7B Instruct
Datasets:
- BoolQ (validation split, 3270 samples)
- COMMONSENSEQA (validation split, 1221 samples)
- IWSLT2017 en-fr (validation split, 890 samples)
- SamSum (test split, 819 samples)
Prompt templates:

- BoolQ: `Based on the passage:'{passage}'\nAnswer True/False to the question: '{question}' as an Omniscient person.`
- COMMONSENSEQA: `Choose the answer as a critical thinker.\n{question}\n{opt1}. {text1}\n{opt2}. {text2}\n{opt3}. {text3}\n{opt4}. {text4}\n{opt5}. {text5}`
- IWSLT2017 en-fr: `Translate '{eng_text}' to french as a Translator.`
- SamSum: `Summarise the Dialogue: {dialogue} as a Storyteller.`
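As a concrete illustration, each template above can be filled per sample with plain Python string formatting. This is a hypothetical sketch, not the repository's actual code: the `BOOLQ_TEMPLATE` string is copied from the BoolQ template in this README, while `build_boolq_prompt` and its field names are assumptions.

```python
# Hypothetical sketch: instantiating the BoolQ role-prompt template above.
# The template text mirrors this README; build_boolq_prompt is an assumed
# helper, not necessarily how the ROLEBENCH notebooks construct prompts.

BOOLQ_TEMPLATE = (
    "Based on the passage:'{passage}'\n"
    "Answer True/False to the question: '{question}' as an Omniscient person."
)

def build_boolq_prompt(passage: str, question: str) -> str:
    """Fill the role-prompt template with one BoolQ sample's fields."""
    return BOOLQ_TEMPLATE.format(passage=passage, question=question)

prompt = build_boolq_prompt(
    passage="The Eiffel Tower is located in Paris, France.",
    question="is the eiffel tower in france",
)
print(prompt)
```

The same pattern applies to the other three templates, with the placeholder names replaced by the corresponding dataset fields.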
| Model | BoolQ | COMMONSENSEQA | IWSLT2017 en-fr | SamSum |
|---|---|---|---|---|
| Llama3-8B Instruct | Accuracy = 0.8507<br>F1 score = 0.8793 | Accuracy = 0.7371 | BLEU = 0.2399<br>METEOR = 0.5436 | ROUGE-1 = 0.1725<br>ROUGE-L = 0.1229 |
| Phi-3 mini-4K Instruct | Accuracy = 0.8113<br>F1 score = 0.8344 | Accuracy = 0.7068 | BLEU = 0.1928<br>METEOR = 0.4950 | ROUGE-1 = 0.1383<br>ROUGE-L = 0.0951 |
| Mistral-7B Instruct | Accuracy = 0.8281<br>F1 score = 0.8548 | Accuracy = 0.6490 | BLEU = 0.1507<br>METEOR = 0.4763 | ROUGE-1 = 0.1359<br>ROUGE-L = 0.0991 |
| Gemma-7B Instruct | Accuracy = 0.6288<br>F1 score = 0.5831 | Accuracy = 0.6288 | BLEU = 0.0940<br>METEOR = 0.3611 | ROUGE-1 = 0.1192<br>ROUGE-L = 0.0793 |
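For reference, the accuracy and F1 numbers reported in the BoolQ column can be computed as follows. This is a from-scratch sketch of the standard binary metrics, not the repository's evaluation code, and the sample predictions are made up for illustration.

```python
# Sketch of the accuracy / binary-F1 computation behind the BoolQ column.
# Illustrative only: the repository's own evaluation code may differ.

def accuracy(preds, golds):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def f1_binary(preds, golds, positive=True):
    """F1 score treating `positive` (True) as the positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up predictions for four samples, to show the calculation.
preds = [True, True, False, True]
golds = [True, False, False, True]
print(accuracy(preds, golds))            # -> 0.75
print(round(f1_binary(preds, golds), 4)) # -> 0.8
```

BLEU, METEOR, and ROUGE for the translation and summarisation columns are typically computed with a metrics library rather than by hand.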
```
|_ llama3_role_all.ipynb   -- Role prompting on all datasets using the Llama3-8B Instruct model
|_ phi3_role_all.ipynb     -- Role prompting on all datasets using the Phi-3 mini-4K Instruct model
|_ mistral_role_all.ipynb  -- Role prompting on all datasets using the Mistral-7B Instruct model
|_ Gemma_role_all.ipynb    -- Role prompting on all datasets using the Gemma-7B Instruct model
|_ Role_prompting____quantitaive_analysis.txt
|_ qualitative_analysis.txt
```
This project will always remain open-source; contributions adding new models or datasets, or formulating new roles in the prompt templates, are always welcome.
If you find this work useful, please cite this repository:

```bibtex
@software{Budagam_ROLEBENCH-_A_Role_2024,
  author = {Budagam, Devichand},
  month = may,
  title = {{ROLEBENCH- A Role Prompting Benchmark}},
  url = {https://github.com/devichand579/ROLEBENCH},
  year = {2024}
}
```