This issue was moved to a discussion.
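(Continuous) pre-training, (multimodal) supervised fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these does the often-mentioned RLHF refer to? #6580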
Labels: wontfix (This will not be worked on)
Description
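Among (continuous) pre-training, (multimodal) supervised fine-tuning, reward model training, PPO training, DPO training, KTO training, and ORPO training, which training methods does the frequently mentioned "RLHF" refer to?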