
Link to Grossmend's model #1

Open
nikich340 opened this issue Feb 1, 2023 · 3 comments


@nikich340

Hello. It seems that the original Grossmend model was deleted: https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2
Do you still have that model by any chance?

@Kirili4ik (Owner)

Oh, that's unfortunate...
If you're looking for a model in Russian, you can try these:
https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2
https://huggingface.co/tinkoff-ai/ruDialoGPT-medium
Or search here: https://huggingface.co/models.

I don't have a copy of Grossmend's model on me =(

@nikich340 (Author)

I see, thank you!
Do you recommend fine-tuning a rugpt-3 already trained on dialog data (one of the ones you mentioned, for example), or starting from a "vanilla" sber rugpt3-medium/small, given that my dataset is quite small (1500 "context - answer" pairs)?

@Kirili4ik (Owner)

Kirili4ik commented Feb 6, 2023

If you have context-answer pairs, you should look into training models like T5/mT5/flan-T5/ru-T5 or other text-to-text models instead of pure generators like GPT-2/3.
But generally, it's better to go with the model trained on data closest to yours (provided that model is trained well and not overfitted).

Sorry for such a slow reply.
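To make the text-to-text suggestion concrete, here is a minimal sketch of how context-answer pairs could be framed for a T5-style model versus a GPT-style causal LM. The task prefix, separator, and field names below are illustrative assumptions, not taken from any particular model card:

```python
# Sketch: framing "context - answer" pairs two ways.
# T5-style models are trained on (input text -> target text) pairs;
# GPT-style causal LMs are trained on a single concatenated string that
# the model learns to continue. Prefix/separator names are hypothetical.

def to_t5_example(context: str, answer: str, prefix: str = "reply: ") -> dict:
    """Text-to-text framing: the model maps input_text to target_text."""
    return {"input_text": prefix + context, "target_text": answer}

def to_gpt_example(context: str, answer: str, sep: str = "|") -> str:
    """Causal-LM framing: context and answer joined into one sequence."""
    return f"{context} {sep} {answer}"

pairs = [("Привет, как дела?", "Всё хорошо, спасибо!")]

t5_data = [to_t5_example(c, a) for c, a in pairs]
gpt_data = [to_gpt_example(c, a) for c, a in pairs]
```

With only ~1500 pairs, the explicit input/target split of the text-to-text framing tends to make the supervision signal clearer than free-form continuation, which is the gist of the advice above.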
