-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Migrate datatreee assertions/extensions/formatting #8967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate datatreee assertions/extensions/formatting #8967
Conversation
| raise TreeIsomorphismError("DataTree objects are not isomorphic:\n" + diff) | ||
|
|
||
|
|
||
| def diff_treestructure(a: DataTree, b: DataTree, require_names_equal: bool) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this into xarray/core/formatting.py to avoid a circular dependency issue.
| `node` | ||
| :any:`NodeMixin` object. | ||
| It is up to the user to assemble these parts to a whole. | ||
| >>> from xarray import Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples in this documentation string are a bit shorter than the originals from anytree. That's because using node.children.values() gets a ValuesView which isn't compatible with iterables like reversed that alter the order of the items in the iterable.
| import textwrap | ||
|
|
||
| from xarray import Dataset | ||
| from xarray.core.dataset import Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing another circular dependency issue. @flamingbear - just FYI, for when you are tweaking ops.py.
| @@ -1,3 +1,4 @@ | |||
| # TODO: Add assert_isomorphic when making DataTree API public | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we didn't want to surface assert_isomorphic until everything else was public. Does that sound good to you @TomNicholas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought there was an issue collecting things we need to do to put a final bow on things, but I'm not finding it. Should we add it to #8572? or is that overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a dedicated issue for that either. Yes lets' just make an explicit list under Expose datatree API publicly. on #8572 (I'll do that now)
xarray/testing/assertions.py
Outdated
| elif isinstance(a, Coordinates): | ||
| assert a.equals(b), formatting.diff_coords_repr(a, b, "equals") | ||
| elif isinstance(a, DataTree): | ||
| from_root = kwargs.get("from_root", True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would people be more comfortable if I was doing some type checking here for from_root being a bool?
|
|
||
| import xarray as xr | ||
|
|
||
| # TODO: Remove imports in favour of xr.DataTree etc, once part of public API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also hoping that once we can use xr.DataTree that all the type annotations that mypy has insisted on in the tests might be able to be removed (e.g., dt: DataTree = DataTree())
| assert_identical(elders, expected) | ||
|
|
||
|
|
||
| class TestDSMethodInheritance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This felt like a good place for adding these tests. But I could move them to another module if that's a preference (just probably not under the name test_dataset_api.py, which might be confusing outside of a specifically datatree oriented context).
xarray/testing/assertions.py
Outdated
| """ | ||
| Two DataTrees are considered isomorphic if every node has the same number of children. | ||
| Nothing about the data in each node is checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Nothing about the data in each node is checked. | |
| Nothing about the data or attrs in each node is checked. |
xarray/testing/assertions.py
Outdated
| Arrays with NaN in the same location are considered equal. DataTrees are | ||
| equal if they have isomorphic node structures, with matching node names, | ||
| and if they have matching variables and coordinates, all of which are | ||
| equal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition here of equal/allclose/identical is "isomorphic, and also every set of corresponding nodes in the trees are equal/allclose/identical". It would be good to reword this to indicate that assert_equal for trees is essentially just assert_equal for Dataset but mapped over the trees.
xarray/testing/assertions.py
Outdated
| **kwargs : dict | ||
| Additional keyword arguments passed on to DataTree.equals. Specifically, | ||
| from_root, which defaults True, and will check the whole tree above | ||
| the given node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why assert_equal has **kwargs at all - they don't seem to be used?
assert_allclose accepts specific named tolerance kwargs, shouldn't we just add from_root explicitly to all of these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misunderstood the conversation in our tag-up today, but it sounded like **kwargs was a potentially preferred option to explicitly stating from_root, where from_root is only used by assert_equal or assert_identical when dealing with a DataTree. But happy to explicitly name from_root as an optional argument if that seems better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting about from_root for a moment, I just don't understand why xarray's existing assert_equal method already accepts **kwargs then does nothing with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow here - I added **kwargs to both assert_equal and assert_identical in this PR. Then inside the condition for isinstance(a, DataTree) I try to grab from_root from those kwargs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I'm being stupid, ignore the comment about it already being there (it wasn't).
Why are you adding **kwargs though? Did we decide to do that in the meeting? I thought all we decided was that assert_* would need a from_root arg, even if that didn't do anything for non-DataTree inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been suggesting that... the idea was that having it for non-DataTree objects might be confusing, but I guess **kwargs makes tab-completion less useful (but other than that there should not be any difference).
Edit: see the meeting notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad for somehow missing that suggestion initially.
Still, I feel like hiding a kwarg away is more confusing than just showing it and saying it is only used if two DataTree objects are passed... For example apply_ufunc has arguments that are only used if the input is of a certain type.
Another thing we could do here is use typing.overloads to ensure that passing from_root when a and b are not both DataTrees would fail static type checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I was just thinking about using typing.overloads - I'll have a go at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
|
|
||
| @overload | ||
| def assert_equal(a, b): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried to specify all the individual overloads (e.g., assert_equal(a: Dataset, b: Dataset), etc). That led to issues with the return values from DataTree.__getitem__, which are: DataTree | DataArray.
Hopefully this hits enough of the spot, though.
flamingbear
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the discussion and the resulting code, looks good to me now. Thanks.
xarray/core/formatting.py
Outdated
| f"Left and right {type(a).__name__} objects are not {_compat_to_str(compat)}" | ||
| ] | ||
|
|
||
| # TODO check root parents? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # TODO check root parents? |
This should be covered by the parent assertions having a from_root kwarg.
xarray/tests/test_formatting.py
Outdated
| def test_print_datatree(self, simple_datatree): | ||
| dt = simple_datatree | ||
| print(dt) | ||
|
|
||
| # TODO work out how to test something complex like this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a placeholder for something better. All this does right now is check an error is raised. But we might not need more tests than the ones already in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pulled this comment, but let me know if you actually wanted me to pull any of the tests, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just pull this test, as we are already testing that the __str__ method doesn't error in other tests.
TomNicholas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of very small comments about removing outdated comments / tests, after that I will merge!
This PR continues the overall work of migrating DataTree into xarray.
xarray/core/datatree_render.pyis the renamed version ofxarray/datatree_/datatree/render.py.xarray/core/extensions.pynow contains functionality fromxarray/datatree_/datatree/extensions.py.xarray/core/formatting.pynow contains functionality fromxarray/datatree_/datatree/formatting.py.xarray/tests/test_datatree.pynow contains tests fromxarray/datatree_/datatree/tests/test_dataset_api.py.xarray/testing/assertions.pynow contains functionality from/xarray/datatree_/datatree/testing.py.I had also meant to get to
common.pyand what's left ofio.py, but I've got a hefty couple of days of meetings ahead, so I wanted to get this progress into PR before that happens. @flamingbear or I can follow up with the remaining things in a separate PR. (Also this PR is already getting a little big, so maybe it's already got enough in it)Tests addedUser visible changes (including notable bug fixes) are documented inwhats-new.rstNew functions/methods are listed inapi.rst