Add GroupBy.shuffle() #9320
Draft: dcherian wants to merge 40 commits into pydata:main from dcherian:groupby-shuffle
dcherian commented Aug 7, 2024
dcherian force-pushed the groupby-shuffle branch from 58d01b2 to 1df705e (August 7, 2024 02:55)
dcherian force-pushed the groupby-shuffle branch from 1df705e to 60d7619 (August 7, 2024 03:00)
dcherian commented Aug 7, 2024
TomNicholas added the labels topic-groupby, topic-dask and topic-chunked-arrays (Managing different chunked backends, e.g. dask) Aug 13, 2024
dcherian commented Aug 13, 2024
dcherian commented Aug 13, 2024
dcherian added four commits to xarray-contrib/flox that referenced this pull request Aug 14, 2024
dcherian changed the title from Add GroupBy.shuffle() to Add GroupBy.shuffle(), DataArray.shuffle_by, Dataset.shuffle_by Aug 15, 2024
dcherian commented Aug 15, 2024
dcherian changed the title from Add GroupBy.shuffle(), DataArray.shuffle_by, Dataset.shuffle_by back to Add GroupBy.shuffle() Aug 30, 2024
dcherian commented Aug 30, 2024
dcherian commented Aug 30, 2024
@aulemahal using |
This reverts commit 7a99c8f.
dcherian force-pushed the groupby-shuffle branch from 8787244 to 63b3e77 (September 17, 2024 04:02)
This was referenced Sep 18, 2024
This was referenced Sep 27, 2024
This adds new API to shuffle an Xarray object. Shuffling reorders the data so that all members of a group occur in the same chunk; a single chunk may still contain more than one group.
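To make the definition concrete, here is a minimal NumPy sketch of what a shuffle has to produce, assuming a 1-D array of group labels along the shuffled dimension and a target maximum chunk size. The names shuffle_plan and max_chunk_size are hypothetical and are not the API added in this PR. The sketch sorts stably by label so each group becomes contiguous, then packs whole groups into chunks so that no group straddles a chunk boundary.

```python
import numpy as np


def shuffle_plan(labels, max_chunk_size):
    """Hypothetical helper: return (perm, chunks).

    perm reorders the data so each group is contiguous; chunks are block
    sizes along that dimension such that no group is split across blocks.
    """
    labels = np.asarray(labels)
    # A stable argsort places equal labels next to each other while keeping
    # the original order of elements within each group.
    perm = np.argsort(labels, kind="stable")
    # np.unique returns groups in sorted order, matching the argsort order.
    _, group_sizes = np.unique(labels, return_counts=True)

    chunks = []
    current = 0
    for size in group_sizes:
        # Close the current chunk if adding this whole group would exceed the
        # target; a group larger than max_chunk_size gets a chunk to itself.
        if current and current + size > max_chunk_size:
            chunks.append(current)
            current = 0
        current += size
    if current:
        chunks.append(current)
    return perm, tuple(int(c) for c in chunks)


labels = np.array([2, 0, 1, 0, 2, 1, 0])
perm, chunks = shuffle_plan(labels, max_chunk_size=5)
print(labels[perm], chunks)  # groups 0 and 1 share the first chunk
```

With dask-backed data, such a plan could be applied with existing xarray methods, e.g. da.isel(x=perm).chunk({"x": chunks}). That is a plain reorder-then-rechunk for illustration only, not necessarily how GroupBy.shuffle() is implemented here.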
I've also added shuffle_by to DataArray and Dataset. This generalizes sortby, and lets you persist a shuffled Xarray object to disk.
.shuffle_by
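One way to read "generalizes sortby": a shuffle only requires each group to be contiguous, so sorting by the group labels (what sortby produces) is one valid shuffle, but other group orders are valid too. The NumPy sketch below assumes that reading; is_shuffled and first_appearance_perm are hypothetical helpers, not part of xarray.

```python
import numpy as np


def is_shuffled(labels):
    """True if every distinct label occupies exactly one contiguous run."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return True
    # A run starts at position 0 and wherever the label changes.
    starts = np.concatenate(([True], labels[1:] != labels[:-1]))
    # Each distinct label must start exactly one run.
    return int(starts.sum()) == len(np.unique(labels))


def first_appearance_perm(labels):
    """A valid shuffle that orders groups by first appearance, not by value."""
    labels = np.asarray(labels)
    _, first_idx, inverse = np.unique(labels, return_index=True, return_inverse=True)
    inverse = inverse.ravel()
    # rank[g] is the position of group g when groups are ordered by first appearance.
    rank = np.argsort(np.argsort(first_idx))
    # Stable sort on the rank keeps the original order within each group.
    return np.argsort(rank[inverse], kind="stable")


labels = np.array([2, 0, 1, 0, 2, 1, 0])
sorted_perm = np.argsort(labels, kind="stable")  # the sortby-style ordering
other_perm = first_appearance_perm(labels)       # contiguous but unsorted
print(labels[sorted_perm], labels[other_perm])
```

Both orderings put every group in one contiguous run, which is all a shuffle needs before chunk boundaries are drawn between groups.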
whats-new.rst
api.rst
2024.08.1
chunks to signature.
cc @phofl