Training with UbuntuCorpusTrainer fails #2341

Open
AlkisPis opened this issue Dec 5, 2023 · 1 comment

Comments


AlkisPis commented Dec 5, 2023

I can't train the chatbot with the UbuntuCorpusTrainer.
I tried multiple times. On the first run the trainer downloaded the TGZ file and extracted the TSV files. On every run after that, I received only the following output:
INFO:chatterbot.chatterbot:File is already downloaded
INFO:chatterbot.chatterbot:File is already extracted
After that, the script stopped responding.

Questions:

  1. What could the problem be?
  2. The file 'Ubuntu_dialogs.tgz', which I managed to download myself, contains thousands of TSV files. Where have they been extracted to, or converted to and stored as YML? They can't be found under the 'chatterbot_corpus' folder or anywhere else.

AlkisPis commented Dec 6, 2023

I have debugged the training process of UbuntuCorpusTrainer and found that it had extracted the TSV files into the 'C:\Users\user\ubuntu_data\ubuntu_dialogs\dialogs' folder (on Windows). This is totally unacceptable: it uses a folder on the user's main disk instead of the folder in which ChatterBot has been installed, as is done with the other corpus data! It is even more unacceptable if one has installed ChatterBot on a removable disk, like a flash drive, because there is a very obvious reason why someone chooses this kind of installation: to be able to use it stand-alone!
Then I realized that the trainer tries to create a DB from exactly 23251 files stored in that folder. Of course the process gets "stuck" and it looks as if the script has crashed. No one can know why until they debug the training process!
Totally UNACCEPTABLE! Both the method of extracting the files and the never-ending training of the chatbot with this kind of trainer, especially considering that it does not even accept a pathspec, unlike e.g. ChatterBotCorpusTrainer, with which you can use even a single YAML file for training.
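
For anyone who wants to check where the files landed and how many there are without stepping through the trainer in a debugger, here is a minimal sketch using only the Python standard library. The location `<home>/ubuntu_data` is assumed from the path reported above, and the helper names `count_tsv_files` and `sample_tsv_files` are hypothetical, not part of ChatterBot. The trainer may also read an `ubuntu_corpus_data_directory` keyword argument to change the target folder, but that is an assumption that has not been verified against this version.

```python
import os
from pathlib import Path


def count_tsv_files(root):
    """Return the number of .tsv files found anywhere under `root`."""
    root = Path(root)
    if not root.is_dir():
        # A missing folder simply means nothing has been extracted there.
        return 0
    return sum(1 for _ in root.rglob("*.tsv"))


def sample_tsv_files(root, limit):
    """Return up to `limit` .tsv paths under `root`, in sorted order."""
    paths = sorted(Path(root).rglob("*.tsv"))
    return paths[:limit]


if __name__ == "__main__":
    # Assumed location, based on the path reported in this issue.
    data_dir = Path(os.path.expanduser("~")) / "ubuntu_data"
    print(data_dir, count_tsv_files(data_dir))
```

A count in the tens of thousands would at least explain why the run appears to hang rather than crash.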

Any comments are welcome. I will keep this issue open just for a few days.
