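(Incremental) pre-training, (multimodal) instruction supervised fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the frequently mentioned RLHF refer to? #6580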

qkkcoolmax · 2025-01-09T09:19:29Z

Description

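(Incremental) pre-training, (multimodal) instruction supervised fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the frequently mentioned RLHF refer to?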

Pull Request

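(Incremental) pre-training, (multimodal) instruction supervised fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the frequently mentioned RLHF refer to?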

qkkcoolmax added the "enhancement" label Jan 9, 2025

github-actions bot added the "pending" label Jan 9, 2025

Repository owner locked and limited conversation to collaborators Jan 9, 2025

hiyouga converted this issue into discussion #6583 Jan 9, 2025

hiyouga added the "wontfix" label and removed the "enhancement" and "pending" labels Jan 9, 2025

This issue was moved to a discussion.
