This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

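(Incremental) pre-training, (multimodal) supervised instruction fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the commonly mentioned RLHF refer to? #6580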

Closed
qkkcoolmax opened this issue Jan 9, 2025 · 0 comments
Labels
wontfix This will not be worked on

Comments

@qkkcoolmax

Description

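(Incremental) pre-training, (multimodal) supervised instruction fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the commonly mentioned RLHF refer to?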

Pull Request

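(Incremental) pre-training, (multimodal) supervised instruction fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training: which of these training methods does the commonly mentioned RLHF refer to?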

@qkkcoolmax qkkcoolmax added the enhancement New feature or request label Jan 9, 2025
@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 9, 2025
Repository owner locked and limited conversation to collaborators Jan 9, 2025
@hiyouga hiyouga converted this issue into discussion #6583 Jan 9, 2025
@hiyouga hiyouga added wontfix This will not be worked on and removed enhancement New feature or request pending This problem is yet to be addressed labels Jan 9, 2025
