Description
🐛 Bug Report
How To Reproduce
Steps to reproduce the behaviour:
Recent versions of Iris realize lazy weights arrays instead of keeping them lazy. The issue appears to have been introduced in #5084, so Iris versions from 3.5 onwards are affected.
Example:
Call cube.collapsed(aggregator=iris.analysis.MEAN, weights=weights, coords=['latitude', 'longitude']), where cube is an iris.cube.Cube and weights is a dask.array.Array. This issues a warning like:
/home/bandela/mambaforge/envs/test-iris-3.6/lib/python3.11/site-packages/distributed/client.py:3109: UserWarning: Sending large graph of size 386.72 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
warnings.warn(
The warning appears because the weights are realized in this code:
iris/lib/iris/analysis/__init__.py
Lines 1190 to 1291 in 1399994
    class _Weights(np.ndarray):
        """Class for handling weights for weighted aggregation.

        This subclasses :class:`numpy.ndarray`; thus, all methods and properties of
        :class:`numpy.ndarray` (e.g., `shape`, `ndim`, `view()`, etc.) are
        available.

        Details on subclassing :class:`numpy.ndarray` are given here:
        https://numpy.org/doc/stable/user/basics.subclassing.html

        """

        def __new__(cls, weights, cube, units=None):
            """Create class instance.

            Args:

            * weights (Cube, string, _DimensionalMetadata, array-like):
                If given as a :class:`iris.cube.Cube`, use its data and units. If
                given as a :obj:`str` or :class:`iris.coords._DimensionalMetadata`,
                assume this is (the name of) a
                :class:`iris.coords._DimensionalMetadata` object of the cube (i.e.,
                one of :meth:`iris.cube.Cube.coords`,
                :meth:`iris.cube.Cube.cell_measures`, or
                :meth:`iris.cube.Cube.ancillary_variables`). If given as an
                array-like object, use this directly and assume units of `1`. If
                `units` is given, ignore all units derived above and use the ones
                given by `units`.
            * cube (Cube):
                Input cube for aggregation. If weights is given as :obj:`str` or
                :class:`iris.coords._DimensionalMetadata`, try to extract the
                :class:`iris.coords._DimensionalMetadata` object and corresponding
                dimensional mappings from this cube. Otherwise, this argument is
                ignored.
            * units (string, Unit):
                If ``None``, use units derived from `weights`. Otherwise, overwrite
                the units derived from `weights` and use `units`.

            """
            # `weights` is a cube
            # Note: to avoid circular imports of Cube we use duck typing using the
            # "hasattr" syntax here
            # --> Extract data and units from cube
            if hasattr(weights, "add_aux_coord"):
                obj = np.asarray(weights.data).view(cls)
                obj.units = weights.units

            # `weights`` is a string or _DimensionalMetadata object
            # --> Extract _DimensionalMetadata object from cube, broadcast it to
            # correct shape using the corresponding dimensional mapping, and use
            # its data and units
            elif isinstance(weights, (str, _DimensionalMetadata)):
                dim_metadata = cube._dimensional_metadata(weights)
                arr = dim_metadata._values
                if dim_metadata.shape != cube.shape:
                    arr = iris.util.broadcast_to_shape(
                        arr,
                        cube.shape,
                        dim_metadata.cube_dims(cube),
                    )
                obj = np.asarray(arr).view(cls)
                obj.units = dim_metadata.units

            # Remaining types (e.g., np.ndarray): try to convert to ndarray.
            else:
                obj = np.asarray(weights).view(cls)
                obj.units = Unit("1")

            # Overwrite units from units argument if necessary
            if units is not None:
                obj.units = units

            return obj

        def __array_finalize__(self, obj):
            """See https://numpy.org/doc/stable/user/basics.subclassing.html.

            Note
            ----
            `obj` cannot be `None` here since ``_Weights.__new__`` does not call
            ``super().__new__`` explicitly.

            """
            self.units = getattr(obj, "units", Unit("1"))

        @classmethod
        def update_kwargs(cls, kwargs, cube):
            """Update ``weights`` keyword argument in-place.

            Args:

            * kwargs (dict):
                Keyword arguments that will be updated in-place if a `weights`
                keyword is present which is not ``None``.
            * cube (Cube):
                Input cube for aggregation. If weights is given as :obj:`str`, try
                to extract a cell measure with the corresponding name from this
                cube. Otherwise, this argument is ignored.

            """
            if kwargs.get("weights") is not None:
                kwargs["weights"] = cls(kwargs["weights"], cube)
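The realization can be traced to the np.asarray(...).view(cls) calls in the quoted code: np.asarray asks its argument for a concrete in-memory array through the __array__ protocol, which for a lazy array means computing it. Below is a minimal sketch of that mechanism. FakeLazyArray and Weights are hypothetical stand-ins (not Iris or Dask classes), chosen so that the snippet needs only NumPy.

```python
import numpy as np


class FakeLazyArray:
    """Hypothetical stand-in for a lazy array (e.g. a dask array)."""

    def __init__(self, shape):
        self.shape = shape
        self.computed = False  # flips to True once the data is materialized

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook to obtain a concrete in-memory array.
        self.computed = True
        return np.ones(self.shape, dtype=dtype)


class Weights(np.ndarray):
    """Simplified imitation of the ndarray-subclass pattern quoted above."""

    def __new__(cls, weights):
        # Same conversion as in the quoted code: laziness is lost here.
        return np.asarray(weights).view(cls)


lazy = FakeLazyArray((3, 4))
print(lazy.computed)  # False: nothing has been materialized yet
w = Weights(lazy)
print(lazy.computed)  # True: constructing the subclass forced materialization
print(type(w).__name__, w.shape)
```

dask.array.Array implements the same __array__ hook by computing the array, so passing one through np.asarray yields a full in-memory copy. If that copy is then combined with lazy cube data, it would end up embedded in the task graph, which would be consistent with the large-graph warning shown earlier.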
Expected behaviour
The laziness of the weights array should be preserved. Because the weights array must be the same size as the data (on a side note: why does it need to be the same size? Is this a limitation of numpy?), realizing it makes this feature impossible to use on large datasets.
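One possible shape of the expected behaviour, sketched with plain Dask rather than Iris internals: convert the weights only when they are not already lazy, and let Dask evaluate the weighted mean. The helper name as_lazy_or_numpy is hypothetical, the arrays are small stand-ins for real cube data, and the snippet assumes dask and numpy are installed. It is an illustration of the idea, not the fix adopted in Iris.

```python
import dask
import dask.array as da
import numpy as np


def as_lazy_or_numpy(weights):
    """Hypothetical helper: keep dask collections lazy, convert everything else."""
    if dask.is_dask_collection(weights):
        return weights
    return np.asarray(weights)


# Stand-ins for lazy cube data and lazy weights of the same shape.
data = da.ones((4, 6, 8), chunks=(1, 6, 8))
weights = da.ones((4, 6, 8), chunks=(1, 6, 8))

weights = as_lazy_or_numpy(weights)

# Weighted mean over the last two axes (latitude and longitude in the example).
mean = (data * weights).sum(axis=(1, 2)) / weights.sum(axis=(1, 2))

print(isinstance(weights, da.Array))  # True: the weights were not realized
print(isinstance(mean, da.Array))     # True: nothing is computed until .compute()
print(mean.compute())
```

With this pattern the weights only ever exist chunk by chunk, so memory use is bounded by the chunk size rather than by the full data shape.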
Environment
- OS & Version: Ubuntu 23.04
- Iris Version: 3.5, 3.6