Fine-Tuned Llama-2 For Machine Translation

This repository contains the code for fine-tuning Meta's Llama-2 model for neural machine translation (NMT) from Bengali to English.

Task

Chosen Task:
Neural Machine Translation (NMT) from Bengali to English

Why did I choose it?
I have been working with neural machine translation for a while, and for my research I am exploring different machine translation models. Since any LLM can be fine-tuned for a specific task, I wanted to see how well an LLM performs on machine translation. I also have a good dataset for it, so this task was a natural choice for me.

Dataset

Base Dataset: BUET-BanglaNMT Dataset (2.5 million pairs)
Preprocessed Dataset: Preprocessed Dataset (2.1 million pairs)
Small Dataset: Small Dataset (200k pairs)

Why did I choose this dataset?
This is one of the largest Bengali-to-English parallel corpora available. I formatted the dataset for my task according to the model's expected input (a sketch of the prompt format is shown below). I started with the large dataset, but due to limited compute and time I also created a small subset and fine-tuned the model on it.

I used the BUET-BanglaNMT Dataset from Hugging Face, which contains around 2.5 million pairs of Bengali and English sentences.
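To fine-tune Llama-2 on sentence pairs, each Bengali-English pair has to be turned into a single training prompt. The sketch below shows one way this could look; the instruction template, the dataset identifier, and the column names (`bn`, `en`) are assumptions for illustration, not taken from this repository.

```python
# Minimal sketch (assumptions: dataset id, column names, prompt template).
from datasets import load_dataset

def format_example(example):
    # Wrap each pair in a Llama-2 style [INST] ... [/INST] instruction prompt.
    prompt = (
        "[INST] Translate the following Bengali sentence to English:\n"
        f"{example['bn']} [/INST] {example['en']}"
    )
    return {"text": prompt}

# Hypothetical dataset identifier; substitute the actual preprocessed dataset.
dataset = load_dataset("csebuetnlp/BanglaNMT", split="train")
dataset = dataset.map(format_example)
print(dataset[0]["text"])
```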

Model

I used Meta's Llama-2 model. This is my fine-tuned adapter: Fine-Tuned Llama-2
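Since the published artifact is an adapter rather than full model weights, it would typically be loaded on top of the base Llama-2 checkpoint with PEFT. A minimal inference sketch, assuming a LoRA-style adapter and a placeholder adapter id:

```python
# Minimal inference sketch (assumptions: LoRA-style adapter, placeholder repo ids).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"             # gated base model, requires access
adapter_id = "MusfiqDehan/llama2-bn-en-adapter"  # placeholder: use the real adapter id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the fine-tuned translation adapter to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "[INST] Translate the following Bengali sentence to English:\nআমি বাংলায় গান গাই। [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```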
