
p_map() very slow compared to multiprocess.Pool.map() #40

Open
FlorinAndrei opened this issue Sep 6, 2021 · 3 comments
Comments

@FlorinAndrei

I'm trying to accelerate Pandas df.apply() and also get a progress bar. The problem is that p_map() is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function seems to work fine, and fast enough, for another task: reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook
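For anyone who wants to reproduce this without the notebook, here is a minimal stand-alone timing sketch. It is not the notebook's code: `score()` is a hypothetical CPU-bound stand-in for the VADER `polarity_scores()` call, and it uses only the standard library's `multiprocessing.Pool` so it runs anywhere.

```python
# Hypothetical benchmark skeleton (not the original notebook).
# score() stands in for SentimentIntensityAnalyzer().polarity_scores();
# swap in the real analyzer, then compare Pool.map() against p_map()
# on the same list to measure the slowdown reported above.
import time
from multiprocessing import Pool

def score(text):
    # Cheap deterministic stand-in for a per-row sentiment score.
    return sum(ord(c) for c in text) % 100

if __name__ == "__main__":
    texts = ["sample text %d" % i for i in range(10_000)]
    t0 = time.perf_counter()
    with Pool() as pool:
        results = pool.map(score, texts)
    elapsed = time.perf_counter() - t0
    print(f"Pool.map: {elapsed:.2f}s for {len(results)} rows")
```

Running the same list through `p_map(score, texts)` with the same worker count should make the gap directly measurable.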

@nuttyartist

From my testing, tqdm seems to be the culprit: using tqdm with a regular multiprocessing.Pool() slows it down significantly. Has anyone else experienced this?
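One plausible mechanism, sketched below with only the standard library (no tqdm needed): to drive a progress bar you need one result back per item, which typically means `imap` with `chunksize=1`, and for cheap tasks that per-item IPC can dominate the runtime. `work()` here is a hypothetical cheap task, not code from p_tqdm.

```python
# Sketch of the overhead hypothesis: compare Pool.map (large chunks,
# little inter-process traffic) against Pool.imap with chunksize=1
# (one message per item, which is what a per-item progress bar forces).
import time
from multiprocessing import Pool

def work(x):
    return x * x  # deliberately cheap, so IPC overhead dominates

if __name__ == "__main__":
    data = list(range(20_000))
    with Pool() as pool:
        t0 = time.perf_counter()
        a = pool.map(work, data)
        t_map = time.perf_counter() - t0

        t0 = time.perf_counter()
        b = list(pool.imap(work, data, chunksize=1))
        t_imap = time.perf_counter() - t0

    assert a == b
    print(f"map: {t_map:.3f}s  imap(chunksize=1): {t_imap:.3f}s")
```

If `imap(chunksize=1)` is much slower here, that would point at per-item result delivery rather than tqdm's rendering itself; a larger `chunksize` (with coarser progress updates) is the usual mitigation.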

@AeroTH310

I have experienced this too. It appears that the pool is actually processing serially... I see many processes start in my system monitor, matching the number of cores I set, but only one of them seems to be doing anything at any given time.

@BenjaminHoegh

It also seems to be very slow compared to joblib's Parallel.
