Allow setting the chunk size used by the thread pool #21
I needed something similar, but for the underlying `Pool`'s `maxtasksperchild` argument. This worked for my needs:

```python
import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch() -> None:
    p_tqdm.Pool = functools.partial(p_tqdm.Pool, maxtasksperchild=10)


monkeypatch()
# This will use maxtasksperchild=10; other Pool arguments can be provided the same way.
results = p_map(_scrape_data_async, data_to_process, num_cpus=15)
```

In your case you could probably use something like this (haven't tested):

```python
import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch() -> None:
    # partialmethod (rather than partial) keeps `map` bound to the pool instance.
    p_tqdm.Pool.map = functools.partialmethod(p_tqdm.Pool.map, chunksize=15)  # whatever chunksize you want


monkeypatch()
results = p_map(_scrape_data_async, data_to_process, num_cpus=15)
```
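For reference, the effect the requested option would have can be reproduced without monkeypatching by driving the stdlib pool directly: `multiprocessing.Pool.imap` already accepts an explicit `chunksize`, which is the knob this issue asks p_tqdm to expose. A minimal stdlib-only sketch (`square`, `map_with_chunksize`, and the parameter values are illustrative, not part of p_tqdm's API):

```python
import multiprocessing


def square(x: int) -> int:
    # Stand-in for a fast per-item operation.
    return x * x


def map_with_chunksize(func, items, num_cpus=2, chunksize=50):
    # imap takes an explicit chunksize: workers receive batches of
    # `chunksize` items per task instead of one item per task. Wrapping
    # the imap iterator in tqdm (as p_tqdm does internally) would keep
    # the progress bar while still honoring the chunk size.
    with multiprocessing.Pool(num_cpus) as pool:
        return list(pool.imap(func, items, chunksize=chunksize))


if __name__ == "__main__":
    results = map_with_chunksize(square, range(10), num_cpus=2, chunksize=3)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because `imap` preserves input order, the result matches an ordered `p_map` call.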
There should be an option to allow the caller to provide the chunk size used by the thread pool created by `p_tqdm._parallel`. Using the default can be quite inefficient, especially when the caller knows that each of the operations inside the map is usually quite fast.

Rationale:
https://medium.com/@rvprasad/data-and-chunk-sizes-matter-when-using-multiprocessing-pool-map-in-python-5023c96875ef
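The linked article's point can be checked directly: when the per-item function is cheap, a larger chunk size amortizes the per-task serialization and IPC overhead. A minimal, stdlib-only benchmark sketch (the helper names, worker count, and sizes are illustrative; no expected timings are claimed):

```python
import multiprocessing
import time


def fast_op(x: int) -> int:
    # Deliberately cheap, so per-task overhead dominates at small chunk sizes.
    return x + 1


def timed_map(chunksize: int, n: int = 100_000, num_cpus: int = 4) -> float:
    # Times one Pool.map call with the given chunksize and returns the
    # elapsed wall-clock seconds.
    with multiprocessing.Pool(num_cpus) as pool:
        start = time.perf_counter()
        pool.map(fast_op, range(n), chunksize=chunksize)
        return time.perf_counter() - start


if __name__ == "__main__":
    for cs in (1, 100, 10_000):
        print(f"chunksize={cs}: {timed_map(cs):.3f}s")
```

On a typical machine the `chunksize=1` run is noticeably slower for cheap functions, which is exactly why exposing this parameter in `p_map` would help.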