Hello,
Since I wrote the original message, I found the issue (I had two columns in my data file, which slowed things down). The import is still slow, though less so: in 5 minutes I can import around 100,000 documents out of several million (I am currently trying with a subsample). My data file is simply one text per line. Can this be made faster?
(I'm on an i7 Mac, with 32 GB RAM and a 1 TB SSD.)
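For reference, here is roughly the loading loop I have in mind (a minimal sketch; the model type, parameters, and file name are placeholders, since add_doc() just takes an iterable of words per document):

```python
import tomotopy as tp

# Placeholder model and parameters, for illustration only.
mdl = tp.LDAModel(k=20)

# One whitespace-tokenized text per line; splitting the line ourselves
# means add_doc() receives a ready-made list of words per document.
with open('tweets.txt', encoding='utf-8') as f:  # placeholder file name
    for line in f:
        words = line.split()
        if words:  # skip empty lines
            mdl.add_doc(words)
```

(I understand tomotopy also ships a tomotopy.utils.Corpus helper for bulk loading; I have not tried it yet, so I mention it only as a possibility.)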
I also have a question about model evaluation with different k1 and k2 values for the Pachinko topic model: can this be done (e.g., like optimizing coherence with LDA)?
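To illustrate what I mean, something like this grid search is what I have in mind (a sketch only; the grid values, iteration count, and whether Coherence scores PAModel sub-topics this way are my assumptions, not tested):

```python
import tomotopy as tp

# Sketch of a k1/k2 grid search scored by topic coherence, analogous to
# coherence tuning for LDA. File name and grid values are placeholders.
docs = [line.split() for line in open('tweets.txt', encoding='utf-8') if line.strip()]

best = None
for k1 in (5, 10, 20):        # candidate numbers of super-topics
    for k2 in (30, 60, 120):  # candidate numbers of sub-topics
        mdl = tp.PAModel(k1=k1, k2=k2, seed=42)
        for words in docs:
            mdl.add_doc(words)
        mdl.train(500)  # Gibbs sampling iterations; placeholder count
        # Assumption: Coherence scores the PAModel's sub-topics, so we
        # average c_v over all k2 sub-topics.
        coh = tp.coherence.Coherence(mdl, coherence='c_v')
        score = sum(coh.get_score(topic_id=t) for t in range(mdl.k2)) / mdl.k2
        if best is None or score > best[0]:
            best = (score, k1, k2)

print('best c_v = %.4f at k1=%d, k2=%d' % best)
```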
Thank you in advance for your prompt answer!
Cheers,
Eric
Original message:
Hello,
I tried tomotopy with a few thousand tweets and it ran quickly on Google Colab. Now I am trying with several million on a local machine, and the add_doc() method seems to be a bottleneck: in five minutes, it added around 50 documents. Is this a known issue? Do I need to configure something to make it run faster?
Best wishes,
Eric