Training with UbuntuCorpusTrainer fails #2341

AlkisPis · 2023-12-05T22:23:43Z

It seems I can't train the chatbot with the UbuntuCorpusTrainer.
I tried multiple times. On the first run, the trainer downloaded the TGZ file and extracted the TSV files. On every run after that, I only received the following output:
INFO:chatterbot.chatterbot:File is already downloaded
INFO:chatterbot.chatterbot:File is already extracted
Then the script stopped responding.

Questions:

What could the problem be?
The file 'Ubuntu_dialogs.tgz', which I managed to download myself, contains thousands of TSV files. Where have they been extracted to, or converted and stored as YML? They can't be found under the 'chatterbot_corpus' folder or anywhere else.

AlkisPis · 2023-12-06T13:00:59Z

I have debugged the training process of UbuntuCorpusTrainer and found out that it had extracted the TSV files into the folder 'C:\Users\user\ubuntu_data\ubuntu_dialogs\dialogs' (on Windows). This is totally unacceptable, i.e. using a folder on the user's main disk instead of the folder in which ChatterBot has been installed, as with the other corpus data! And it is even more unacceptable if one has installed ChatterBot on a removable disk, like a flash drive, because there is a very obvious reason why someone would choose this kind of installation: to be able to use it stand-alone!
Then I realized that the trainer tries to create a DB from exactly 23251 files stored in that folder. So of course the process seems to get "stuck" and it looks like the script has crashed. No one can know why until one debugs the training process!
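For anyone who wants to check the size of the extracted data before starting a training run, here is a minimal sketch using only the standard library. The helper name `count_tsv_files` is hypothetical, and the default path is simply the location reported above; adjust it to wherever the files landed on your system.

```python
import os


def count_tsv_files(root):
    """Count .tsv files anywhere under root (hypothetical helper)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        # Compare case-insensitively so .TSV is counted as well.
        total += sum(1 for name in filenames if name.lower().endswith(".tsv"))
    return total


# Example use, assuming the default location described in this comment:
# root = os.path.join(os.path.expanduser("~"), "ubuntu_data", "ubuntu_dialogs", "dialogs")
# print(count_tsv_files(root))
```

A large count here suggests the script is still working through the files rather than having crashed.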
Totally UNACCEPTABLE! Both the method of extracting the files and the never-ending training of the chatbot with this kind of trainer, esp. considering that it does not even accept a pathspec, unlike e.g. ChatterBotCorpusTrainer, with which you can use even a single YAML file for training.
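A possible workaround, sketched under assumptions: read only a limited number of the extracted TSV files yourself and feed the resulting conversations to ListTrainer. The helper `load_dialogs` and its `max_files` parameter are hypothetical, and the code assumes that the last tab-separated column of each row holds the message text, which should be checked against the actual files.

```python
import csv
import glob
import os


def load_dialogs(root, max_files=100):
    """Return up to max_files conversations, each a list of utterance strings."""
    pattern = os.path.join(root, "**", "*.tsv")
    # Sort for a reproducible subset, then cap the number of files read.
    paths = sorted(glob.glob(pattern, recursive=True))[:max_files]
    dialogs = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace", newline="") as handle:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            # Assumption: the last column of each row is the message text.
            utterances = [row[-1].strip() for row in reader if row and row[-1].strip()]
        if utterances:
            dialogs.append(utterances)
    return dialogs


# Possible use with ChatterBot, where `bot` is an existing ChatBot instance:
# from chatterbot.trainers import ListTrainer
# trainer = ListTrainer(bot)
# for dialog in load_dialogs(root, max_files=50):
#     trainer.train(dialog)
```

Capping the number of files keeps a run short enough to confirm that training works at all before committing to the full set.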

Any comments are welcome. I will keep this issue open just for a few days.
