Weights are always realized by the iris.analysis module #5338

@bouweandela

Description

🐛 Bug Report

How To Reproduce

Steps to reproduce the behaviour:

Recent versions of iris realize the weights arrays. It looks like the issue was introduced in #5084, so iris versions since 3.5 are affected.

Example:

Use cube.collapsed(aggregator=iris.analysis.MEAN, weights=weights, coords=['latitude', 'longitude']), where cube is an iris.cube.Cube and weights is a dask.array.Array.
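
For concreteness, here is a minimal self-contained version of that call; the cube construction, names, shapes, and chunk sizes are illustrative, not taken from the original report:

import dask.array as da
import numpy as np

import iris.analysis
import iris.coords
import iris.cube

# Build a small 2-D cube backed by lazy (dask) data.
lat = iris.coords.DimCoord(
    np.linspace(-90.0, 90.0, 100), standard_name="latitude", units="degrees"
)
lon = iris.coords.DimCoord(
    np.arange(0.0, 360.0, 1.8), standard_name="longitude", units="degrees"
)
cube = iris.cube.Cube(
    da.random.random((100, 200), chunks=(50, 100)),
    standard_name="air_temperature",
    units="K",
    dim_coords_and_dims=[(lat, 0), (lon, 1)],
)

# Lazy weights of the same shape as the cube data.
weights = da.random.random((100, 200), chunks=(50, 100))

# The _Weights class quoted further down calls np.asarray(weights), which
# computes the whole dask array up front instead of keeping it lazy.
result = cube.collapsed(
    ["latitude", "longitude"], iris.analysis.MEAN, weights=weights
)

With a dask.distributed client active, running this issues a warning like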

/home/bandela/mambaforge/envs/test-iris-3.6/lib/python3.11/site-packages/distributed/client.py:3109: UserWarning: Sending large graph of size 386.72 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(

because the weights are realized in this code:

# Imports this excerpt relies on (referenced as below in the module):
import numpy as np
from cf_units import Unit

import iris.util
from iris.coords import _DimensionalMetadata


class _Weights(np.ndarray):
    """Class for handling weights for weighted aggregation.

    This subclasses :class:`numpy.ndarray`; thus, all methods and properties of
    :class:`numpy.ndarray` (e.g., `shape`, `ndim`, `view()`, etc.) are
    available.

    Details on subclassing :class:`numpy.ndarray` are given here:
    https://numpy.org/doc/stable/user/basics.subclassing.html

    """

    def __new__(cls, weights, cube, units=None):
        """Create class instance.

        Args:

        * weights (Cube, string, _DimensionalMetadata, array-like):
            If given as a :class:`iris.cube.Cube`, use its data and units. If
            given as a :obj:`str` or :class:`iris.coords._DimensionalMetadata`,
            assume this is (the name of) a
            :class:`iris.coords._DimensionalMetadata` object of the cube (i.e.,
            one of :meth:`iris.cube.Cube.coords`,
            :meth:`iris.cube.Cube.cell_measures`, or
            :meth:`iris.cube.Cube.ancillary_variables`). If given as an
            array-like object, use this directly and assume units of `1`. If
            `units` is given, ignore all units derived above and use the ones
            given by `units`.
        * cube (Cube):
            Input cube for aggregation. If weights is given as :obj:`str` or
            :class:`iris.coords._DimensionalMetadata`, try to extract the
            :class:`iris.coords._DimensionalMetadata` object and corresponding
            dimensional mappings from this cube. Otherwise, this argument is
            ignored.
        * units (string, Unit):
            If ``None``, use units derived from `weights`. Otherwise, overwrite
            the units derived from `weights` and use `units`.

        """
        # `weights` is a cube
        # Note: to avoid circular imports of Cube we use duck typing using the
        # "hasattr" syntax here
        # --> Extract data and units from cube
        if hasattr(weights, "add_aux_coord"):
            obj = np.asarray(weights.data).view(cls)
            obj.units = weights.units

        # `weights` is a string or _DimensionalMetadata object
        # --> Extract _DimensionalMetadata object from cube, broadcast it to
        # correct shape using the corresponding dimensional mapping, and use
        # its data and units
        elif isinstance(weights, (str, _DimensionalMetadata)):
            dim_metadata = cube._dimensional_metadata(weights)
            arr = dim_metadata._values
            if dim_metadata.shape != cube.shape:
                arr = iris.util.broadcast_to_shape(
                    arr,
                    cube.shape,
                    dim_metadata.cube_dims(cube),
                )
            obj = np.asarray(arr).view(cls)
            obj.units = dim_metadata.units

        # Remaining types (e.g., np.ndarray): try to convert to ndarray.
        else:
            obj = np.asarray(weights).view(cls)
            obj.units = Unit("1")

        # Overwrite units from units argument if necessary
        if units is not None:
            obj.units = units

        return obj

    def __array_finalize__(self, obj):
        """See https://numpy.org/doc/stable/user/basics.subclassing.html.

        Note
        ----
        `obj` cannot be `None` here since ``_Weights.__new__`` does not call
        ``super().__new__`` explicitly.

        """
        self.units = getattr(obj, "units", Unit("1"))

    @classmethod
    def update_kwargs(cls, kwargs, cube):
        """Update ``weights`` keyword argument in-place.

        Args:

        * kwargs (dict):
            Keyword arguments that will be updated in-place if a `weights`
            keyword is present which is not ``None``.
        * cube (Cube):
            Input cube for aggregation. If weights is given as :obj:`str`, try
            to extract a cell measure with the corresponding name from this
            cube. Otherwise, this argument is ignored.

        """
        if kwargs.get("weights") is not None:
            kwargs["weights"] = cls(kwargs["weights"], cube)

Expected behaviour

The laziness of the weights array should be preserved. Because the weights array must be the same size as the data (on a side note: why does it need to be the same size? Is this a limitation of numpy?), realizing it makes this feature unusable on large datasets.
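
Note that preserving laziness is not a one-line fix to the class above: a numpy.ndarray subclass always wraps a realized buffer, so np.asarray(...).view(cls) necessarily computes dask input. A lazy-preserving approach would instead have to dispatch on the input type, e.g. with a helper along these lines (purely hypothetical, not iris API):

import dask.array as da
import numpy as np

def as_weights_array(weights):
    # Hypothetical helper: keep dask arrays lazy, convert anything else
    # (e.g. lists or numpy arrays) to a realized numpy array.
    if isinstance(weights, da.Array):
        return weights  # returned unchanged; no compute is triggered
    return np.asarray(weights)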

Environment

  • OS & Version: Ubuntu 23.04
  • Iris Version: 3.5, 3.6
