Hello,
Since I wrote the original message, I found the issue (I had two columns in my data file, which slowed things down). The import is still slow, though less so: in 5 minutes I can import around 100,000 documents out of several million (I am currently trying with a subsample). My data file is simply one text per line. Can this be made faster?
(I'm on an i7 Mac, with 32 GB RAM and a 1 TB SSD.)
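For reference, here is roughly the loading loop I have in mind (a minimal sketch; the model type, parameters, and file name are placeholders, since add_doc() just takes an iterable of words per document):

```python
import tomotopy as tp

# Placeholder model and parameters, for illustration only.
mdl = tp.LDAModel(k=20)

# One whitespace-tokenized text per line; splitting the line ourselves
# means add_doc() receives a ready-made list of words per document.
with open('tweets.txt', encoding='utf-8') as f:  # placeholder file name
    for line in f:
        words = line.split()
        if words:  # skip empty lines
            mdl.add_doc(words)
```

(I understand tomotopy also ships a tomotopy.utils.Corpus helper for bulk loading; I have not tried it yet, so I mention it only as a possibility.)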
I also have a question about model evaluation with different k1 and k2 values for the Pachinko topic model: can this be done (e.g., like optimizing coherence with LDA)?
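To illustrate what I mean, something like this grid search is what I have in mind (a sketch only; the grid values, iteration count, and whether Coherence scores PAModel sub-topics this way are my assumptions, not tested):

```python
import tomotopy as tp

# Sketch of a k1/k2 grid search scored by topic coherence, analogous to
# coherence tuning for LDA. File name and grid values are placeholders.
docs = [line.split() for line in open('tweets.txt', encoding='utf-8') if line.strip()]

best = None
for k1 in (5, 10, 20):        # candidate numbers of super-topics
    for k2 in (30, 60, 120):  # candidate numbers of sub-topics
        mdl = tp.PAModel(k1=k1, k2=k2, seed=42)
        for words in docs:
            mdl.add_doc(words)
        mdl.train(500)  # Gibbs sampling iterations; placeholder count
        # Assumption: Coherence scores the PAModel's sub-topics, so we
        # average c_v over all k2 sub-topics.
        coh = tp.coherence.Coherence(mdl, coherence='c_v')
        score = sum(coh.get_score(topic_id=t) for t in range(mdl.k2)) / mdl.k2
        if best is None or score > best[0]:
            best = (score, k1, k2)

print('best c_v = %.4f at k1=%d, k2=%d' % best)
```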
Thank you in advance for your prompt answer!
Cheers,
Eric
Original message:
Hello,
I tried tomotopy with a few thousand tweets and it ran quickly on Google Colab. Now I am trying with several million on a local machine, and the add_doc() method seems to be a bottleneck: in five minutes, it added around 50 documents. Is this a known issue? Do I need to configure something to make it run faster?
Best wishes,
Eric