Use multiprocess if available #368

tbekolay · 2020-08-04T17:40:01Z

Windows doesn't have a true fork implementation, so multiprocessing and pickling are limited compared to other platforms. There are also some situations in Linux / Mac OS X where we would like to pickle something, but it does not work without some code changes. This PR makes pickling for multiprocessing work in more situations by using the multiprocess library, if it's available. multiprocess is a fork of multiprocessing that uses dill instead of pickle.
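To illustrate the pickling limitation, here is a minimal sketch using only the standard library, with an optional comparison against dill if it happens to be installed. The helper names `make_action` and `can_pickle` are hypothetical and are not part of doit; the closure stands in for a task action defined inside another function.

```python
import pickle


def make_action(prefix):
    """Return a closure, similar to a task action defined inside a function."""
    def action(name):
        return f"{prefix}: {name}"
    return action


def can_pickle(obj, dumps=pickle.dumps):
    """Return True if `dumps` can serialize `obj`, False if it raises."""
    try:
        dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for unpicklable functions varies
        # between Python versions, so several are caught here.
        return False
    return True


# A built-in function is pickled by reference to its module and name.
print("builtin with pickle:", can_pickle(len))

# A nested function cannot be looked up by name, so pickle rejects it.
print("closure with pickle:", can_pickle(make_action("build")))

try:
    import dill  # optional third-party package, may not be installed
except ImportError:
    dill = None

if dill is not None:
    # dill serializes the function body itself instead of a name reference.
    print("closure with dill:", can_pickle(make_action("build"), dumps=dill.dumps))
```

Under a start method that has to pickle the target (such as spawn), objects like the closure above are the ones that would need dill-based serialization.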

This PR is set up such that we do not require the user to have multiprocess installed, but if it is installed we'll use it, except on Windows, where pickling is so limited that it will make things easier to require it. There are a few other routes we could go:

Don't require it in Windows: This makes things simpler for Windows users who don't want to use multiprocessing. Installing the multiprocess package should work on all platforms, but if there's a platform where there are likely to be issues, it's Windows.
Require it for all platforms: This will likely be a net benefit to everyone, but it introduces a new dependency. Also, dill will always be a little bit slower than pickle since it's able to pickle more things. However, if we're not pickling anything too large anyway, performance shouldn't be an issue, and it would be possible to simplify some parts of the codebase by using dill instead of cloudpickle and removing some of the hacks that we currently use to get Windows to pickle things that it normally can't.

I'm happy to modify this PR to implement either option if you want! I've been using doit for a relatively large data science project and it's been a huge help.

schettino72 · 2020-08-05T05:45:11Z

Thanks.

On runner.py MRunner there is a check if multiprocessing is available. That should be taken into account...
In that spirit I guess it would be better to NOT make multiprocess required on Windows.
Also it would be nice to consolidate try/catch of the imports. Maybe on "compat.py" file.
Thanks for taking care of CHANGES file. Please also mention this in the "install.rst" file.
Finally, need to think how to deal with this on CI.

schettino72 · 2020-08-05T05:48:16Z

Require it for all platforms: This will likely be a net benefit to everyone

Could you please elaborate?

Kwpolska · 2020-08-05T09:51:14Z

It will also be useful on macOS, where spawn became the default in 3.8, and fork doesn’t always work properly.
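For context, the start method in use can be inspected with the standard library. This is a small sketch; the function name `describe_start_methods` is hypothetical and the output depends on the platform and Python version.

```python
import multiprocessing


def describe_start_methods():
    """Report the default and the available process start methods."""
    return {
        # Note: calling get_start_method() without allow_none=True also
        # fixes the default start method for the current process.
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


print(describe_start_methods())
```

With spawn, the child process starts from a fresh interpreter, so the target and its arguments have to be pickled, which is where the choice of serializer matters.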

schettino72 · 2020-09-05T03:59:27Z

uqfoundation/multiprocess#65

It seems the multiprocess package forced a "regression" because one of the features added (Pool) does not work well with the "spawn" method. This leads me to think that multiprocess could be accepted as a multiprocessing replacement but should not be installed by default, even on Windows/macOS.

tbekolay · 2021-11-16T20:22:24Z

Hi @schettino72, since I was touching the doit code for #377 I updated this one also to take your feedback into account.

On runner.py MRunner there is a check if multiprocessing is available. That should be taken into account...

Done, I implemented the check in compat.py and call it from MRunner now.

In that spirit I guess it would be better to NOT make multiprocess required on Windows.

Yes, that makes sense. I instead made it an optional dependency that you can install with pip install doit[multiprocess].

Also it would be nice to consolidate try/catch of the imports. Maybe on "compat.py" file.

Done, the imports are done in compat.py now.

Thanks for taking care of CHANGES file. Please also mention this in the "install.rst" file.

Done!

Finally, need to think how to deal with this on CI.

I didn't add anything to CI, but you could either modify one of the existing builds in your build matrix to install with multiprocess, by changing

      - run: pip install . -r dev_requirements.txt

to

      - run: pip install .[multiprocess] -r dev_requirements.txt

for that build (might need quotes, like pip install '.[multiprocess]', not sure what kind of shell GitHub Actions uses). Or you could add a new build to the matrix that does the above with the latest version of Python. Those are my ideas anyhow; running the tests locally with multiprocess works correctly.

If you like, I can try to add a test that would fail with multiprocessing but succeed with multiprocess.

schettino72 · 2021-11-18T07:58:09Z

doit/compat.py

+    from multiprocess import Process, Queue as MQueue
+    HAS_MULTIPROCESS = True
+except ImportError:
+    from multiprocessing import Process, Queue as MQueue


Now this will break for BSD users without any of them installed.

schettino72 · 2021-11-18T08:02:20Z

doit/compat.py

+except ImportError:
+    from multiprocessing import Process, Queue as MQueue
+    HAS_MULTIPROCESS = False
+Process # pyflakes


I don't think this should be top level.

schettino72

So I would suggest that instead of having two copies of the import try/except we have only one version: get_multiprocess_lib().

If it returns None, that just means "not available".

It should also take an optional parameter start_method; if set, you call:

multiprocess.set_start_method('spawn')

If the problem could be solved just by changing the "start_method" we could add a configuration parameter for that (not required on this PR).
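The suggestion above could be sketched roughly as follows. This is not the code from the PR: the function name and the `start_method` parameter come from the comment, while the `.synchronize` probe and the use of `force=True` are assumptions made for illustration.

```python
import importlib


def get_multiprocess_lib(start_method=None):
    """Return the first usable multiprocessing library, or None.

    Prefers the optional third-party `multiprocess` package and falls
    back to the standard library's `multiprocessing`. A return value of
    None means "not available".
    """
    for name in ("multiprocess", "multiprocessing"):
        try:
            lib = importlib.import_module(name)
            # Assumption: probe the synchronize submodule, which raises
            # ImportError on platforms lacking a working sem_open.
            importlib.import_module(name + ".synchronize")
        except ImportError:
            continue
        if start_method is not None:
            # Assumption: force=True avoids a RuntimeError when a start
            # method was already chosen earlier in this process.
            lib.set_start_method(start_method, force=True)
        return lib
    return None


lib = get_multiprocess_lib()
if lib is None:
    print("no multiprocessing library available")
else:
    print("using", lib.__name__)
```

A caller such as MRunner could then treat a None result as "parallel execution unavailable" and take Process and Queue from the returned module otherwise.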

schettino72 · 2021-11-18T08:08:12Z

If you like, I can try to add a test that would fail with multiprocessing but succeed with multiprocess.

That would be great.

schettino72 mentioned this pull request Sep 4, 2020

doit auto crashes with python 3.8 on MacOS #372

Open

tbekolay force-pushed the use-multiprocess branch from 67c5c4e to bd82a09 Compare November 16, 2021 20:02

Use multiprocess if available

909db9b

tbekolay force-pushed the use-multiprocess branch from bd82a09 to 909db9b Compare November 16, 2021 20:14

schettino72 reviewed Nov 18, 2021

gerilya mentioned this pull request May 22, 2022

Can't pickle local object 'main.<locals>.grpc_prediction_server' SeldonIO/seldon-core#3410

Closed

jguillon mentioned this pull request Nov 25, 2022

fix: use multiprocess instead of multiprocessing on macOS pydoit/doit-auto1#2

Open
