Hello author, a question: the tokenizer vocabulary size and the model's embedding layer do not match #39

Hello author, and thank you for sharing the model. I asked you earlier about how to pretrain.
After loading the model, I found that the embedding layer size is 31128 while the tokenizer vocabulary size is 32228. The cause is the additional extra_0 through extra_100 tokens, which are required for pretraining. So how can I continue pretraining based on the model you shared with this 32128 embedding?

[screenshot: the tokenizer's vocabulary size]
[screenshot: the model's embedding layer size]
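For reference, a minimal sketch of how to compare the two sizes and align them before pretraining, assuming the checkpoint loads through Hugging Face transformers (the repo id below is a placeholder, not the actual model name):

```python
from transformers import AutoModel, AutoTokenizer

MODEL = "author/model-name"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

vocab_size = len(tokenizer)  # counts added special tokens such as extra_0 ... extra_100
emb_rows = model.get_input_embeddings().weight.shape[0]
print(f"tokenizer vocab: {vocab_size}, embedding rows: {emb_rows}")

if vocab_size != emb_rows:
    # Grow (or shrink) the embedding matrix to match the tokenizer.
    # Newly added rows are freshly initialized and get learned during
    # continued pretraining.
    model.resize_token_embeddings(vocab_size)
```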
Comments
It has been fixed; you can reload the model.

Hello, thank you for your reply.
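If a locally cached copy still shows the old sizes after the fix, a sketch of forcing a fresh download instead of reading from the cache (again assuming a transformers checkpoint and a placeholder repo id):

```python
from transformers import AutoTokenizer

# force_download bypasses the local cache and re-fetches the files.
tokenizer = AutoTokenizer.from_pretrained("author/model-name", force_download=True)
print(len(tokenizer))  # should now match the model's embedding size
```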