
lazy_parallellize having trouble with function context? #164

Open
larroy opened this issue Oct 26, 2021 · 3 comments

Comments

@larroy

larroy commented Oct 26, 2021

I'm using a function defined in the current file with pseq, and it seems to error out, unable to find other referenced functions or even simple types like Dict. The same code works fine with seq.

I think the problem is with pickling the target function in lazy_parallelize:

    partitions = split_every(partition_size, iter(result))
    packed_partitions = (pack(func, (partition,)) for partition in partitions)
    for pool_result in pool.imap(unpack, packed_partitions):
        yield pool_result
    pool.terminate()

I ran the same function through pool.imap myself and it works fine.

Wouldn't it be better not to use pickling, to avoid these kinds of problems?
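A minimal stdlib-only reproduction of the likely failure mode (this is an assumption about the cause, not verified against pyfunctional internals): the standard pickler serializes functions by reference, so lambdas and locally defined functions cannot be pickled at all, and module-level functions only unpickle if the worker process can import the same module. dill/cloudpickle work around this by serializing the function body itself.

```python
import pickle

# A lambda has no importable qualified name, so the stdlib pickler,
# which serializes functions by reference, refuses to serialize it.
f = lambda x: x + 1

try:
    pickle.dumps(f)
    print("pickled ok")
except (pickle.PicklingError, AttributeError):
    print("stdlib pickle cannot serialize this function")
# prints "stdlib pickle cannot serialize this function"
```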

@EntilZha
Owner

EntilZha commented Nov 1, 2021

Thanks for the issue report. The reason for using dill/cloudpickle is that there are quite a few types Python's built-in pickle can't handle but those libraries can. I think the solution here is probably making it easy to specify which pickler to use (including a no-op "pickler"). I'd be open to a PR that implements this and keeps the current defaults.
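One way the pluggable-pickler idea could look, sketched with invented names (NoOpSerializer and get_serializer are hypothetical, not PyFunctional's API); the dill default is preserved and imported lazily:

```python
import pickle

class NoOpSerializer:
    """Identity 'pickler': pass objects through untouched (the no-op option)."""
    @staticmethod
    def dumps(obj):
        return obj

    @staticmethod
    def loads(data):
        return data

def get_serializer(name="dill"):
    """Return an object exposing dumps()/loads(); the default stays dill."""
    if name == "pickle":
        return pickle          # stdlib pickler
    if name == "noop":
        return NoOpSerializer  # skip serialization entirely
    import dill                # current default, imported only when needed
    return dill
```

Importing dill lazily would also matter for the Spark interaction mentioned later in this thread, since merely loading dill at import time can have side effects in other frameworks.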

@stale

stale bot commented Dec 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 17, 2021
@stale stale bot removed the stale label Dec 23, 2021
@larroy
Author

larroy commented Jan 26, 2022

Just getting back to this one: I found a very nasty bug/interaction with Spark on Python 3.7, caused by pyfunctional loading dill.
https://issues.apache.org/jira/browse/SPARK-36476?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17397126#comment-17397126

I will work on a PR to specify the pickler. Can you expand on how a no-op pickler option would work? I quickly hacked together "import pickle as serializer". What kinds of tests would you suggest to make sure that other picklers work well?
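For the testing question, one possible shape (a hypothetical pytest-style sketch, not a prescription from the maintainers) is to round-trip representative functions through each candidate serializer and check that the reconstructed function behaves identically. Only stdlib pickle is exercised below; dill/cloudpickle would be added the same way when installed:

```python
import pickle

def double(x):
    # module-level function: picklable by reference with stdlib pickle
    return 2 * x

def roundtrip(serializer, func, arg):
    """Serialize func, deserialize it, and call the restored copy."""
    restored = serializer.loads(serializer.dumps(func))
    return restored(arg)

def test_pickle_roundtrip():
    assert roundtrip(pickle, double, 21) == 42
```

Lambdas and closures could be added as cases that are expected to fail under stdlib pickle but pass under dill/cloudpickle, which would pin down exactly where each pickler's coverage differs.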

Thank you.
