
Comments in KTO Trainer forward() #17

Open
samuelzxu opened this issue May 5, 2024 · 1 comment
@samuelzxu
Contributor

samuelzxu commented May 5, 2024

Hi there,

I'm reading through the forward() function in the KTO Trainer, and the docstring states that, if the data was read in correctly, the sizes of the chosen and rejected logps should each be batch_size/2. This doesn't make sense to me: that sounds like a constraint for paired preference training, whereas KTO is an unpaired training method.

Here's the comment from lines 875-877 of trainers.py:

chosen_logps: log probabilities of chosen examples (should be batch size / 2 if data was read in correctly)
rejected_logps: log probabilities of rejected examples (should be batch size / 2 if data was read in correctly)
KL_logps: log probabilities of the unmatched y'|x (used to estimate the KL divergence between policy and reference; should be batch size)
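To illustrate the point, here's a minimal sketch (names and shapes are illustrative, not taken from trainers.py): in an unpaired KTO batch, each example is independently labeled desirable or undesirable, so the chosen/rejected split can be arbitrary rather than exactly half of the batch.

```python
import torch

# Toy unpaired KTO batch: each example is independently labeled
# chosen (1 = desirable) or rejected (0 = undesirable), so the
# split need not be 50/50.
batch_size = 8
labels = torch.tensor([1, 1, 1, 0, 1, 0, 1, 1])  # 6 chosen, 2 rejected
all_logps = torch.randn(batch_size)              # per-example log-probs

chosen_logps = all_logps[labels == 1]    # shape (6,), not batch_size/2
rejected_logps = all_logps[labels == 0]  # shape (2,), not batch_size/2
KL_logps = torch.randn(batch_size)       # one unmatched y'|x per example

# The only invariant is that chosen + rejected covers the batch:
assert chosen_logps.shape[0] + rejected_logps.shape[0] == batch_size
```

Only the KL_logps tensor is tied to the full batch size; the chosen/rejected counts vary with the labels.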

Please let me know if this makes sense; I'm happy to open a PR.

@kawine
Collaborator

kawine commented May 6, 2024

You're correct! The comment is from when I was trying to debug the code during development and is outdated. Feel free to open a PR and I'll merge it in. Thanks!
