Skip to content

Conversation

@aladinor
Copy link
Contributor

@aladinor aladinor commented Oct 23, 2024

Hi all.

This might be more complex than pruning the path in the open_group_as_dict function. It is kind of complex because when we use _iter_zarr_groups, it yields the root group. I am still working o it

@aladinor
Copy link
Contributor Author

aladinor commented Oct 23, 2024

@TomNicholas, I think this solution will work, but the DataTree.from_dict(groups_dict) function will create the root as an empty node. I think we might need to check it out. Let me know your thoughts.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Oct 23, 2024
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why you don't get the desired result is that you compute paths relative to the immediate parent of the group, not the global parent. I don't have a deeply nested tree ready for testing (so I can't be sure this actually works), but with the suggestion below I don't get the empty root node anymore.

@aladinor
Copy link
Contributor Author

I think we are getting close. However, we are still having some discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path. Example:

print(dtree["/group2/subg1"])
Group: /group2/subg1 <----- different root paths hereDimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2* y        (y) int64 24B 0 1 2Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1 <----- different paths hereDimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2 <----- different  paths here

dt2 = xr.open_datatree("test.zarr",
                          group="/group2/subg1"

print(dt2)
Group: /  <----- different root paths hereDimensions:  (x: 2)
│   Dimensions without coordinates: xData variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1 <----- different  paths hereDimensions:  (y: 3)
│       Dimensions without coordinates: yData variables:
│           var      (y) int64 24B ...
└── Group: /subsub2 <----- different  paths here

Any comments on this @TomNicholas @keewis?

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

this is by design, I think? I'd interpret group="/group2/subgroup1" as saying "give me that group as the new root group". Then tree.encoding["source_group"] can contain the full path of the new root.

(if you know unix commands, this would be similar to chroot)

@TomNicholas
Copy link
Member

discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path

The reprs are different because the objects are different: dtree["/group2/subg1"] has a parent, whereas dt2 does not. So this is intended.

@aladinor aladinor marked this pull request as ready for review October 24, 2024 16:47
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need a test, to remove the encoding bit, and then this is good to go!

@TomNicholas
Copy link
Member

The test should look very much like the ones in #9669 - create a tiny nested tree, save it to zarr/netcdf, open it with the group parameter, and check the structure is as expected.

@aladinor aladinor changed the title adding draft for fixing behaviour for group parameter fixing behaviour for group parameter in open_datatree Oct 24, 2024
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @aladinor !

@TomNicholas TomNicholas requested a review from keewis October 24, 2024 18:56
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got two comments: one on our strategy on DataTree whats-new entries, and one on the way we compare node datasets.

@TomNicholas
Copy link
Member

Sorry @aladinor - in fact could we just do this? #9666 (comment)

@TomNicholas
Copy link
Member

This looks good! @keewis 's comments are addressed so I'm going to merge it.

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

(you can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor Oh, and me too, apparently 😅)

Edit: see

xarray/doc/whats-new.rst

Lines 24 to 32 in 5b2e6f1

- ``DataTree`` related functionality is now exposed in the main ``xarray`` public
API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, ``xarray.open_groups``,
``xarray.map_over_datasets``, ``xarray.group_subtrees``,
``xarray.register_datatree_accessor`` and ``xarray.testing.assert_isomorphic``.
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_,
`Eni Awowale <https://github.com/eni-awowale>`_,
`Matt Savoie <https://github.com/flamingbear>`_,
`Stephan Hoyer <https://github.com/shoyer>`_ and
`Tom Nicholas <https://github.com/TomNicholas>`_.

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

Otherwise @TomNicholas can do that while preparing for the release.

@TomNicholas
Copy link
Member

Amazing thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

@TomNicholas TomNicholas enabled auto-merge (squash) October 24, 2024 20:28
@aladinor
Copy link
Contributor Author

Thanks @TomNicholas and @keewis for your guidance!

@TomNicholas TomNicholas disabled auto-merge October 24, 2024 21:00
@TomNicholas TomNicholas merged commit f24cae3 into pydata:main Oct 24, 2024
28 of 29 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2024
* main:
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 3, 2024
* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
@aladinor aladinor deleted the fix-group-param branch November 20, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic-backends topic-DataTree Related to the implementation of a DataTree class

Projects

None yet

Development

Successfully merging this pull request may close these issues.

open_datatree(group='some_subgroup') returning parent nodes

3 participants