
Comments in KTO Trainer forward() #17

Open
samuelzxu opened this issue May 5, 2024 · 1 comment
@samuelzxu
Contributor

samuelzxu commented May 5, 2024

Hi there,

I'm reading through the forward() function in the KTO Trainer, and the docstring states that, if the data was read in correctly, the sizes of the chosen and rejected logps should each be batch_size/2. This doesn't make sense to me: that sounds like a constraint for paired preference training, whereas KTO is an unpaired training method.

Here's the comment from lines 875-877 of trainers.py:

chosen_logps: log probabilities of chosen examples (should be batch size / 2 if data was read in correctly)
rejected_logps: log probabilities of rejected examples (should be batch size / 2 if data was read in correctly)
KL_logps: log probabilities of the unmatched y'|x (used to estimate the KL divergence between policy and reference; should be batch size)
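To illustrate the point, here's a minimal sketch (names and shapes are illustrative, not taken from trainers.py): in an unpaired KTO batch, each example is independently labeled desirable or undesirable, so the chosen/rejected split can be arbitrary rather than exactly half of the batch.

```python
import torch

# Toy unpaired KTO batch: each example is independently labeled
# chosen (1 = desirable) or rejected (0 = undesirable), so the
# split need not be 50/50.
batch_size = 8
labels = torch.tensor([1, 1, 1, 0, 1, 0, 1, 1])  # 6 chosen, 2 rejected
all_logps = torch.randn(batch_size)              # per-example log-probs

chosen_logps = all_logps[labels == 1]    # shape (6,), not batch_size/2
rejected_logps = all_logps[labels == 0]  # shape (2,), not batch_size/2
KL_logps = torch.randn(batch_size)       # one unmatched y'|x per example

# The only invariant is that chosen + rejected covers the batch:
assert chosen_logps.shape[0] + rejected_logps.shape[0] == batch_size
```

Only the KL_logps tensor is tied to the full batch size; the chosen/rejected counts vary with the labels.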

Please let me know if this makes sense; I'm happy to open a PR.

@kawine
Collaborator

kawine commented May 6, 2024

You're correct! The comment is from when I was trying to debug the code during development and is outdated. Feel free to open a PR and I'll merge it in. Thanks!
