
p_map() very slow compared to multiprocess.Pool.map() #40

Open
FlorinAndrei opened this issue Sep 6, 2021 · 3 comments
Comments

@FlorinAndrei

I'm trying to accelerate Pandas df.apply() and also get a progress bar. The problem is that p_map() is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function seems to work fine, and fast enough, for another task: reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook
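For anyone who wants to reproduce this without the notebook, here is a minimal stand-alone timing sketch. It is not the notebook's code: `score()` is a hypothetical CPU-bound stand-in for the VADER `polarity_scores()` call, and it uses only the standard library's `multiprocessing.Pool` so it runs anywhere.

```python
# Hypothetical benchmark skeleton (not the original notebook).
# score() stands in for SentimentIntensityAnalyzer().polarity_scores();
# swap in the real analyzer, then compare Pool.map() against p_map()
# on the same list to measure the slowdown reported above.
import time
from multiprocessing import Pool

def score(text):
    # Cheap deterministic stand-in for a per-row sentiment score.
    return sum(ord(c) for c in text) % 100

if __name__ == "__main__":
    texts = ["sample text %d" % i for i in range(10_000)]
    t0 = time.perf_counter()
    with Pool() as pool:
        results = pool.map(score, texts)
    elapsed = time.perf_counter() - t0
    print(f"Pool.map: {elapsed:.2f}s for {len(results)} rows")
```

Running the same list through `p_map(score, texts)` with the same worker count should make the gap directly measurable.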

@nuttyartist

From my testing, tqdm seems to be the culprit: using tqdm with a regular multiprocessing.Pool() slows it down significantly. Has anyone else experienced this?
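One plausible mechanism, sketched below with only the standard library (no tqdm needed): to drive a progress bar you need one result back per item, which typically means `imap` with `chunksize=1`, and for cheap tasks that per-item IPC can dominate the runtime. `work()` here is a hypothetical cheap task, not code from p_tqdm.

```python
# Sketch of the overhead hypothesis: compare Pool.map (large chunks,
# little inter-process traffic) against Pool.imap with chunksize=1
# (one message per item, which is what a per-item progress bar forces).
import time
from multiprocessing import Pool

def work(x):
    return x * x  # deliberately cheap, so IPC overhead dominates

if __name__ == "__main__":
    data = list(range(20_000))
    with Pool() as pool:
        t0 = time.perf_counter()
        a = pool.map(work, data)
        t_map = time.perf_counter() - t0

        t0 = time.perf_counter()
        b = list(pool.imap(work, data, chunksize=1))
        t_imap = time.perf_counter() - t0

    assert a == b
    print(f"map: {t_map:.3f}s  imap(chunksize=1): {t_imap:.3f}s")
```

If `imap(chunksize=1)` is much slower here, that would point at per-item result delivery rather than tqdm's rendering itself; a larger `chunksize` (with coarser progress updates) is the usual mitigation.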

@AeroTH310

I have experienced this too. It appears that the pool is actually processing serially... I see many processes start in my system monitor, matching the number of cores I set, but only one of them seems to be doing anything at any given time.

@BenjaminHoegh

It also seems to be very slow compared to joblib's Parallel.
