-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure map_partitions
returns Series
object if function returns scalar
#10739
base: main
Are you sure you want to change the base?
Conversation
map_partitions
returns Series
object if function returns scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Looks like there are a bunch of genuine typing issues since I set the dtype on the created series explicitly. @phofl if you want to, you can pick this up after the holidays but if not I'll handle it once I'm back |
I'll take a look after the holidays, the dtype issue is most likely because of running this on an empty meta object, that's the situation where I've seen us ending up with NaN there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phofl in case you want to have another look
@@ -7282,6 +7287,10 @@ def apply_and_enforce(*args, **kwargs): | |||
func = kwargs.pop("_func") | |||
meta = kwargs.pop("_meta") | |||
df = func(*args, **kwargs) | |||
|
|||
if is_scalar(df) and is_series_like(meta): | |||
df = type(meta)([df]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up removing the explicit dtype
since it caused issues and was quite unnecessary considering that df
is already the final result data. The casting/coercing logic of the Series
takes care of compat stuff with nulls, etc.
token=name1, | ||
meta=self, | ||
**chunk_kwargs, | ||
enforce_metadata=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using map_partitions
internally we sometimes rely on the data not being cast to a Series. The cumulative aggregation logic becomes quite cumbersome otherwise. I hope this is not introducing any other weird artefacts 🤞
b21d4f2
to
71c29e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic looks fine, I think the test failures are something that we can fix by adjusting the test
7d4fbd7
to
fc5cd91
Compare
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 3h 35m 57s ⏱️ + 13m 1s For more details on these failures and errors, see this check. Results for commit c60269b. ± Comparison against base commit 1ecf464. ♻️ This comment has been updated with latest results. |
fc5cd91
to
a7f08aa
Compare
2e95b60
to
c60269b
Compare
pre-commit run --all-files