Align LLMs with DPO

Overview

This repository contains a from-scratch implementation of the Direct Preference Optimization (DPO) loss. The model is fine-tuned on 500 training examples sampled from the UltraFeedback Binarized preference dataset, using a vanilla PyTorch training loop.
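
For orientation, here is a minimal sketch of the DPO loss in PyTorch. It illustrates the generic objective (the beta-scaled difference of policy-vs-reference log-ratios passed through a log-sigmoid); the function and argument names are placeholders, not this repository's exact code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on summed per-sequence log-probabilities.

    Each argument is a (batch,) tensor holding the log-probability of the
    chosen or rejected completion under the policy or the frozen reference model.
    """
    chosen_logratios = policy_chosen_logps - ref_chosen_logps        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_logratios = policy_rejected_logps - ref_rejected_logps  # log pi(y_l|x) - log pi_ref(y_l|x)
    logits = beta * (chosen_logratios - rejected_logratios)
    loss = -F.logsigmoid(logits).mean()
    # Implicit rewards; useful to log for monitoring preference accuracy.
    chosen_rewards = beta * chosen_logratios.detach()
    rejected_rewards = beta * rejected_logratios.detach()
    return loss, chosen_rewards, rejected_rewards
```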

Model Training

  • When tuning hyperparameters, keep in mind that the effectiveness of specific values depends on your setup. For example, in my setup a beta of 0.1 gives better results than 0.2, and a lower weight decay together with a longer maximum sequence length also helps. Treat these values as starting points and tune them for your own setup.

The DPO model is trained with the following config (a sketch wiring these values into a training loop follows the list):

  • Training Parameters:

    • Epochs: 1
    • Batch Size: 1
    • Gradient Accumulation Steps: 2
    • Learning Rate: 1e-7
    • Learning Rate Decay: Cosine
    • Weight Decay: 1e-2
  • DPO Loss Parameter:

    • Beta: 0.1
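
As a rough illustration of how these values could be plugged into a vanilla PyTorch loop, here is a hypothetical sketch. It assumes HuggingFace-style causal LMs (`policy_model` and a frozen `ref_model`) and a `train_loader` yielding tokenized chosen/rejected pairs with prompt tokens masked as -100 in the labels; these objects and the `sequence_logps` helper are placeholders, not code from this repository. It reuses the `dpo_loss` sketch above.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def sequence_logps(model, input_ids, labels):
    """Sum of per-token log-probs of `labels` under a causal LM
    (prompt tokens are assumed to be masked with -100 in `labels`)."""
    logits = model(input_ids).logits[:, :-1, :]
    labels = labels[:, 1:]
    mask = labels != -100
    token_logps = torch.gather(
        logits.log_softmax(-1), 2, labels.masked_fill(~mask, 0).unsqueeze(-1)
    ).squeeze(-1)
    return (token_logps * mask).sum(-1)

# Hypothetical objects: policy_model, a frozen ref_model, and a train_loader
# yielding batches of tokenized chosen/rejected pairs (batch size 1).
accumulation_steps, beta = 2, 0.1
optimizer = AdamW(policy_model.parameters(), lr=1e-7, weight_decay=1e-2)
scheduler = CosineAnnealingLR(optimizer, T_max=len(train_loader) // accumulation_steps)

policy_model.train()
for step, batch in enumerate(train_loader):  # single epoch
    pi_chosen = sequence_logps(policy_model, batch["chosen_ids"], batch["chosen_labels"])
    pi_rejected = sequence_logps(policy_model, batch["rejected_ids"], batch["rejected_labels"])
    with torch.no_grad():  # the reference model is never updated
        ref_chosen = sequence_logps(ref_model, batch["chosen_ids"], batch["chosen_labels"])
        ref_rejected = sequence_logps(ref_model, batch["rejected_ids"], batch["rejected_labels"])
    loss, _, _ = dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=beta)
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```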

Useful Links

| Title | Link | Description |
| --- | --- | --- |
| DPO paper | PDF | The original paper introducing Direct Preference Optimization (DPO), with in-depth details about the algorithm. |
| H2O-Danube-1.8B Technical Report | PDF | Technical report presenting the H2O-Danube-1.8B model, including how the authors align it with DPO. |
| Sebastian Raschka's newsletter on iterative DPO | Newsletter | Discusses iterative DPO and its practical implications, with insights into its effectiveness in real-world scenarios. |
| Implementation of the DPO algorithm | GitHub | A GitHub repository providing an implementation of the DPO algorithm that you can use directly in your projects. |
| Video explaining DPO | YouTube | A video explanation of DPO, helpful for those who prefer a simplified overview of the concept. |
