p_map() very slow compared to multiprocess.Pool.map() #40
Comments
From my testing, tqdm seems to be the culprit: wrapping a regular multiprocessing.Pool() with tqdm slows it down significantly. Has anyone else experienced this?
I have experienced this as well. It appears that the pool is actually processing serially: my system monitor shows as many processes started as the number of cores I set, but only one of them seems to be doing any work at a given time.
It also seems to be very slow compared to joblib's Parallel.
I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().
This notebook is self-explanatory:
https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb
p_map() is orders of magnitude slower.
However, the same function seems to work fine, fast enough, for another task - reading 25k files off the disk.
Windows 10, Python 3.8.8, Jupyter Notebook
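One possible way to narrow this down (a sketch, not a confirmed diagnosis of p_map): in the standard library, Pool.map() splits the input into chunks automatically, while Pool.imap(), which progress-bar wrappers typically iterate over, defaults to chunksize=1 and so dispatches every item separately. Timing both on a cheap task shows how much of the gap comes from per-item dispatch rather than from drawing the bar. The names work, timed_map and timed_imap are hypothetical, and the sketch uses stdlib multiprocessing, not multiprocess or p_tqdm; whether p_map takes the same path is an assumption that would need checking against its source.

```python
import multiprocessing
import time


def work(x):
    """Hypothetical stand-in for a cheap per-item task."""
    return x * x


def timed_map(pool, func, items):
    """Run pool.map (automatic chunking) and return (results, seconds)."""
    start = time.perf_counter()
    results = pool.map(func, items)
    return results, time.perf_counter() - start


def timed_imap(pool, func, items, chunksize=1):
    """Run pool.imap with an explicit chunksize and return (results, seconds)."""
    start = time.perf_counter()
    results = list(pool.imap(func, items, chunksize=chunksize))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(50_000))
    with multiprocessing.Pool(processes=4) as pool:
        _, t_map = timed_map(pool, work, items)
        _, t_imap_1 = timed_imap(pool, work, items, chunksize=1)
        _, t_imap_big = timed_imap(pool, work, items, chunksize=1_000)
    # Timings depend on the machine, so no expected values are given here.
    print(f"map (auto chunksize):   {t_map:.3f}s")
    print(f"imap (chunksize=1):     {t_imap_1:.3f}s")
    print(f"imap (chunksize=1000):  {t_imap_big:.3f}s")
```

Run it as a script rather than in a notebook cell: on Windows, stdlib multiprocessing spawns fresh interpreters and cannot import functions defined interactively. If imap with chunksize=1 is much slower than map here, per-item overhead is a plausible contributor for cheap tasks; for a heavier per-item job the picture may differ.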