-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Children proxy interface #9477
base: main
Are you sure you want to change the base?
Children proxy interface #9477
Conversation
@@ -44,6 +45,68 @@ def __init__(self, *pathsegments): | |||
Tree = TypeVar("Tree", bound="TreeNode") | |||
|
|||
|
|||
class Children(Mapping[str, Tree], Generic[Tree]): | |||
""" | |||
Dictionary-like container for the immediate children of a single DataTree node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this class should allow path-like access to grandchildren or only allow access to immediate children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question also for setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts I think this class should only allow getting/setting immediate children (because it's basically a glorified dict
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts I think this class should only allow getting/setting immediate children (because it's basically a glorified
dict
).
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates to #9485 though - why exactly do we want to support full paths for .coords
but only local access for .children
? We could do whatever, but what's the rationale for them being different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're basically thinking of .coords
/.children
as meaning "access to all coords/children defined anywhere on the subtree". I originally thought of them as "access to just coords/children defined on this node".
FYI path-like access like this means that list(dt.children)
will return a subset of what can be accessed via .__getitem__
, (and __contains__
could go either way, see #9354)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One compromise option of sorts (similar to what is done for Dataset for derived variables like time.year
, or for coordinates in Dataset.__getitem__
) is to only list immediate children in __iter__
and __contains__
but to support accessing them in __getitem__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a detailed discussion about this on the old repo (xarray-contrib/datatree#240 (comment)). There the conclusion was that __contains__
should support paths if __getitem__
does, but it's okay for .keys()
/.values()
/.items()
to be local only.
So you're basically thinking of .coords/.children as meaning "access to all coords/children defined anywhere on the subtree".
To play devil's advocate a bit: If .coords
can access the whole tree then why not also .variables
? I think they should all be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a detailed discussion about this on the old repo (xarray-contrib/datatree#240 (comment)). There the conclusion was that
__contains__
should support paths if__getitem__
does, but it's okay for.keys()
/.values()
/.items()
to be local only.
Yes, that seems to be how Dataset works for coordinates.
To play devil's advocate a bit: If .coords can access the whole tree then why not also .variables? I think they should all be consistent.
Sure, we could do that. Hopefully it's all very straightforwardly reusing the same machinery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems to be how Dataset works for coordinates.
I'm not sure what you mean. You're talking about virtual vs non-virtual coordinates on Dataset
? That they are all in __contains__
and __getitem__
, but only a subset (the non-virtual ones) are listed by .keys()
?
If
.coords
can access the whole tree then why not also.variables
?Sure, we could do that.
I actually meant that as a example of counter-intuitive behaviour 😅 . But maybe it's fine too? That would mean we have really done a 180 degree turn in this issue... It would imply changing the current behaviour of dt.data_vars
too.
Hopefully it's all very straightforwardly reusing the same machinery?
It's not right now (local vs path-supporting have different codepaths), but it could be made to: everything could go through some version of using TreeNode._set_item
xarray/xarray/core/treenode.py
Line 550 in 8bd456c
def _set_item( |
or ._walk_to
and we could add a parameter like traverse_paths
to generalize it for both cases. Generalizing .update
to support paths might be the approach that is easiest to integrate with the Coordinates
base class.
Also one final thing: We are talking here about supporting paths to descendant nodes only? Or upwards via dt['../../path/to/node/']
too?
@@ -224,6 +230,56 @@ def test_overwrite_child(self): | |||
assert marys_evil_twin.parent is john | |||
|
|||
|
|||
class TestChildren: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit awkward to know whether to put tests for this class in test_treenode.py
or test_datatree.py
.
On the one hand the concept of access to .children
is very much part of the TreeNode
structure - it has nothing to do with .data
. On the other hand it's hard to properly test it without full features that are only available on DataTree
, such as .copy
, a nice repr
, and assertions. It's also a fully public-facing class, unlike TreeNode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot less awkward after #9482.
# test constructor | ||
john2 = TreeNode(children=children) | ||
assert john2.children == children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test failure is due to #9478
@@ -44,6 +45,68 @@ def __init__(self, *pathsegments): | |||
Tree = TypeVar("Tree", bound="TreeNode") | |||
|
|||
|
|||
class Children(Mapping[str, Tree], Generic[Tree]): | |||
""" | |||
Dictionary-like container for the immediate children of a single DataTree node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts I think this class should only allow getting/setting immediate children (because it's basically a glorified
dict
).
Agreed!
children = self._treenode._children.copy() | ||
children.update(other) | ||
self._treenode.children = children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to verify, this is a no-op if children have incompatible indexes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .children
setter defined on DataTree
(and TreeNode
) will check for name collisions between variables
and children
, and if it fails it will avoid modification in-place (we can thank the anytree library for that bit of code 1). In the same place then check_alignment
also gets called. Is that what you meant?
Footnotes
-
Should we adjust the licensing part of the readme as we're bundling small portions of anytree code? We did already add it to the licenses directory here. ↩
if not len(other): | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly more canonical Python to avoid the call to len()
:
if not len(other): | |
return | |
if not other: | |
return |
def _names(self) -> list[str]: | ||
return list(self._treenode._children.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not directly use the set-like DictKeys object?
def _names(self) -> list[str]: | |
return list(self._treenode._children.keys()) | |
def _names(self) -> Set[str]: | |
return self._treenode._children.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can probably use .keys()
you're right, but I don't think I should type this as returning Set
, because I originally tried return set(self._treenode._children)
and that failed intermittently due to ordering differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of collections.abc.Set
, which is an abstract base class that includes DictKeys, not the specific built-in set
class
Just a sketch for now - follows the same general idea of
.coords
.whats-new.rst
api.rst
cc @shoyer