-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Multiindex scalar coords, fixes #1408 #1412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if a single element is selected from a MultiIndex.
| if set(v.dims) <= allowed_dims) | ||
| return self._replace(variable, coords, name) | ||
|
|
||
| def _replace_indexes(self, indexes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this method and use Dataset._replace_indexes instead, to reduce duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember well, we used duplicates to avoid using _to_temp_dataset and _from_temp_dataset in __getitem__. But now that _replace_indexes has more logic implemented, maybe it is a good idea to reduce duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Yes, in DataArray.__getitem__ and also in DataAarray.isel, _to_temp_dataset and _from_temp_dataset are now being used.
| del obj.coords[dim] | ||
| return obj | ||
|
|
||
| def _maybe_stack(self, applied): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method becomes necessary, because now we cannot do xr.concat([ds.isel(yx=i)] for i in range(*)], dim='yx') because ds.isel(yx=i) does not have yx anymore.
I don't like this change. It breaks an important invariant, which is that indexing a Variable returns another Variable. I do agree with indexing along a MultiIndex dimension should unpacking the tuple for coordinates, but only for coordinates. So this needs to be somewhere in the Consider indexing We want to change the indexing behavior to this: But we don't want to change what happens to the DataArray itself -- it should still be a scalar object array. I tested this example on your PR branch, and it actually crashes with |
…ement is selected from MultiIndex. Instead, added _maybe_split function in Dataset
|
@shoyer
I totally agree with you. In the last commit, I moved the unpacking functionality into Adding functions is easy but simplifying them are difficult... If anyone show a direction, I will try the improvement. |
|
A possible direction to reduce the This would simplify many things, although I haven't thought about about all other possible issues it would create (perfomance, etc.). Also, Here is another case related to this PR. From the example in the linked issue, the current behavior is In [9]: ds.isel(yx=[0, 1])
Out[9]:
<xarray.Dataset>
Dimensions: (yx: 2)
Coordinates:
* yx (yx) MultiIndex
- y (yx) object 'a' 'a'
- x (yx) int64 1 2
Data variables:
foo (yx) int64 1 2Do we want to also change the behavior to this? In [10]: ds.isel(yx=[0, 1])
Out[10]:
<xarray.Dataset>
Dimensions: (x: 2)
Coordinates:
* x (x) int64 1 2
y object 'a'
Data variables:
foo (x) int64 1 2To me it looks like it is a bit too magical, but just wondering what you think... |
Agreed, this also seems too magical to me. |
|
I also agree. It seems too magical. But I slightly changed my mind. The current structure is illustrated as follows, The I am wondering if we could have the following class structure things become simpler In this picture, MultiIndex can have I understand it is different from Any thoughts are welcome. |
|
@fujiisoup Yes, the solution of writing a MultiIndex wrapper for xarray looks much cleaner to me. I like the look of this proposal! (Those diagrams are also very helpful) I guess this could be implemented as a |
|
I also agree that a MultiIndex wrapper would be to be the way to go. I think I understand the diagrams, but what remains a bit unclear to me is how this could be implemented. In particular, how would this wrapper work with Currently, Would This is maybe slightly off topic, but more generally I'm also wondering how this would co-exist with potentially other kinds of multi-level indexes (see this comment). |
|
@benbovy Thanks. |
|
Replaced by a new PR #1426 . |


git diff upstream/master | flake8 --diffwhats-new.rstfor all changes andapi.rstfor new APITo fix #1408,
This modification works, but actually I do not fully satisfied yet.
There are
ifstatements in many places.The major changes I made are
variable.__getitem__now returns an OrderedDict if a single element is selected from MultiIndex.indexing.remap_level_indexersalso returnsselected_dimswhich is a map from the original dimension to the selected dims which will be a scalar coordinate.Change 1 keeps level-coordinates even after
ds.isel(yx=0).Change 2 enables to track which levels are selected, then the selected levels are changed to a scalar coordinate.
I guess much smarter solution should exist.
I would be happy if anyone gives me a comment.