About training the human preference model #47
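Hello, I read in the paper that the final comparison training uses a scoring/ranking model built from a single linear layer. Is that right? Does that mean this step does not use reinforcement learning?

Yes. We have not yet used reinforcement learning to train our model. The human preference model is currently used only to filter the model's responses.

OK, thank you for your answer.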
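The setup discussed in this thread (a linear layer that scores responses, trained on comparisons, then used only to filter outputs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation. The thread does not say what features feed the linear layer, how it is optimized, or whether filtering keeps the single best response or everything above a threshold. Here we assume toy feature vectors standing in for something like a final hidden state, a pairwise (Bradley-Terry style) ranking loss, plain SGD, and best-of-n filtering. All names (`LinearScorer`, `train`, `filter_best`, the toy data) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    # Stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class LinearScorer:
    """Hypothetical linear scoring head: score(h) = w . h + b.

    `h` is an assumed fixed-size feature vector for a (prompt, response)
    pair, e.g. a final hidden state of the language model.
    """

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, h: Vector) -> float:
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b


def pairwise_loss(scorer: LinearScorer, chosen: Vector, rejected: Vector) -> float:
    # Pairwise ranking loss: -log sigmoid(score(chosen) - score(rejected)).
    d = scorer.score(chosen) - scorer.score(rejected)
    return softplus(-d)


def train(scorer: LinearScorer, pairs: List[Tuple[Vector, Vector]],
          lr: float = 0.1, epochs: int = 200) -> LinearScorer:
    # Plain SGD on the pairwise loss. The bias b cancels in the score
    # difference, so only w receives a gradient.
    for _ in range(epochs):
        for chosen, rejected in pairs:
            d = scorer.score(chosen) - scorer.score(rejected)
            coeff = sigmoid(-d)  # equals -dLoss/dd
            for i in range(len(scorer.w)):
                scorer.w[i] += lr * coeff * (chosen[i] - rejected[i])
    return scorer


def filter_best(scorer: LinearScorer, candidates: Sequence[Tuple[str, Vector]]) -> str:
    # Best-of-n filtering: keep only the highest-scoring candidate response.
    return max(candidates, key=lambda c: scorer.score(c[1]))[0]


# Toy comparison data: (features of preferred response, features of rejected one).
pairs = [
    ([1.0, 0.2], [0.0, 0.3]),
    ([0.9, 0.5], [0.1, 0.4]),
    ([0.8, 0.1], [0.2, 0.9]),
]
scorer = train(LinearScorer(dim=2), pairs)

candidates = [("answer A", [0.1, 0.5]), ("answer B", [0.9, 0.4]), ("answer C", [0.5, 0.5])]
best = filter_best(scorer, candidates)
print(best)  # answer B
```

Because the preference model here only ranks finished responses, the generator's own weights are never updated from its scores. That is the distinction the answer draws from reinforcement learning, where the scores would instead serve as a reward signal for optimizing the generating policy (as in RLHF with PPO).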