Commit a045c81

Parallel concatenate (#5926)

* Simplify concatenate
* First attempt at parallel concatenate
* Clean up a bit
* Add support for comparing different data types
* Undo unnecessary change
* More tests
* Use faster lookup
* Add test to show that NaNs are considered equal
* Avoid inserting closures into the Dask graph
* Fix type hints
* Compute numpy array hashes immediately
* Concatenate 25 cubes instead of 2
* Improve test coverage
* Add whatsnew entry
* Various improvements from review
* Use correct value for chunks for numpy arrays
* Python 3.10 compatibility
* Avoid creating derived coordinates multiple times
* Support comparing differently shaped arrays
* Rewrite for code style without multiple returns
* Remove print call
* Better hashing algorithm
* Add more information to release notes
1 parent 13017e3 commit a045c81
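Several of the commit's bullet points ("Compute numpy array hashes immediately", "Add test to show that NaNs are considered equal", "Better hashing algorithm") revolve around hashing array payloads once so that equality checks between many cubes become cheap digest comparisons. The helper below is a minimal illustrative sketch of NaN-tolerant array hashing, not the code from this PR; the name `array_hash` and the SHA-256 choice are assumptions.

```python
import hashlib

import numpy as np


def array_hash(arr: np.ndarray) -> str:
    """Hash an array so equal arrays hash equally, treating NaN == NaN.

    Illustrative sketch only; the name and algorithm are assumptions,
    not the implementation from iris#5926.
    """
    data = np.ascontiguousarray(arr)
    digest = hashlib.sha256()
    # Fold in shape and dtype so differently shaped or typed buffers
    # never collide on raw bytes alone.
    digest.update(repr(data.shape).encode())
    digest.update(data.dtype.str.encode())
    if np.issubdtype(data.dtype, np.floating):
        # Hash the NaN positions separately, then zero them out, so two
        # arrays differing only in *having* NaN at the same places still
        # produce identical digests (i.e. NaN compares equal to NaN).
        nan_mask = np.isnan(data)
        digest.update(nan_mask.tobytes())
        data = np.where(nan_mask, 0.0, data)
    digest.update(data.tobytes())
    return digest.hexdigest()
```

Because each digest is computed in a single pass, comparing N arrays costs N hashes rather than N·(N−1)/2 elementwise comparisons.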

6 files changed: +524 −168 lines changed
benchmarks/benchmarks/merge_concat.py (21 additions, 11 deletions)

@@ -4,9 +4,12 @@
 # See LICENSE in the root of the repository for full licensing details.
 """Benchmarks relating to :meth:`iris.cube.CubeList.merge` and ``concatenate``."""
 
+import warnings
+
 import numpy as np
 
 from iris.cube import CubeList
+from iris.warnings import IrisVagueMetadataWarning
 
 from .generate_data.stock import realistic_4d_w_everything
 
@@ -44,19 +47,26 @@ class Concatenate:
 
     cube_list: CubeList
 
-    def setup(self):
-        source_cube = realistic_4d_w_everything()
-        second_cube = source_cube.copy()
-        first_dim_coord = second_cube.coord(dimensions=0, dim_coords=True)
-        first_dim_coord.points = (
-            first_dim_coord.points + np.ptp(first_dim_coord.points) + 1
-        )
-        self.cube_list = CubeList([source_cube, second_cube])
-
-    def time_concatenate(self):
+    params = [[False, True]]
+    param_names = ["Lazy operations"]
+
+    def setup(self, lazy_run: bool):
+        warnings.filterwarnings("ignore", message="Ignoring a datum")
+        warnings.filterwarnings("ignore", category=IrisVagueMetadataWarning)
+        source_cube = realistic_4d_w_everything(lazy=lazy_run)
+        self.cube_list = CubeList([source_cube])
+        for _ in range(24):
+            next_cube = self.cube_list[-1].copy()
+            first_dim_coord = next_cube.coord(dimensions=0, dim_coords=True)
+            first_dim_coord.points = (
+                first_dim_coord.points + np.ptp(first_dim_coord.points) + 1
+            )
+            self.cube_list.append(next_cube)
+
+    def time_concatenate(self, _):
         _ = self.cube_list.concatenate_cube()
 
-    def tracemalloc_concatenate(self):
+    def tracemalloc_concatenate(self, _):
         _ = self.cube_list.concatenate_cube()
 
     tracemalloc_concatenate.number = 3  # type: ignore[attr-defined]
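The updated benchmark builds a chain of 25 cubes by repeatedly shifting the leading dimension coordinate just past the previous cube's range, so the cubes abut and concatenate cleanly. That shift rule can be checked in isolation with plain NumPy (no Iris required); `shift_points` is a hypothetical helper name for the expression used in `setup`:

```python
import numpy as np


def shift_points(points: np.ndarray) -> np.ndarray:
    """Shift points just past the previous block, as the benchmark's setup does."""
    # np.ptp(points) is max - min, so adding ptp + 1 moves each point one
    # unit beyond the previous block's maximum.
    return points + np.ptp(points) + 1


# Build 25 consecutive, non-overlapping blocks, like the benchmark's cube_list.
blocks = [np.arange(6, dtype=float)]
for _ in range(24):
    blocks.append(shift_points(blocks[-1]))

chained = np.concatenate(blocks)
```

Each new block starts exactly one unit past the previous block's maximum, so the concatenated coordinate is strictly monotonic, which is what `concatenate_cube` needs to join along that dimension.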

docs/src/whatsnew/latest.rst (12 additions, 0 deletions)

@@ -57,6 +57,18 @@ This document explains the changes made to Iris for this release
    with cftime :class:`~cftime.datetime` objects can benefit from the same
    improvement by adding a type hint to their category funcion. (:pull:`5999`)
 
+#. `@bouweandela`_ made :meth:`iris.cube.CubeList.concatenate` faster if more
+   than two cubes are concatenated with equality checks on the values of
+   auxiliary coordinates, derived coordinates, cell measures, or ancillary
+   variables enabled.
+   In some cases, this may lead to higher memory use. This can be remedied by
+   reducing the number of Dask workers.
+   In rare cases, the new implementation could potentially be slower. This
+   may happen when there are very many or large auxiliary coordinates, derived
+   coordinates, cell measures, or ancillary variables to be checked that span
+   the concatenation axis. This issue can be avoided by disabling the
+   problematic check. (:pull:`5926`)
+
 🔥 Deprecations
 ===============
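The speed-up the whatsnew entry describes comes from replacing pairwise array comparisons with per-array hashes computed in parallel: with N cubes there are O(N²) candidate pairwise equality checks, but only N hashes, and digests compare in constant time. Below is a minimal stdlib sketch of that idea, not Iris's Dask-based implementation; `all_payloads_equal` and `_digest` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _digest(arr: np.ndarray) -> bytes:
    # One linear pass per array; CPython's hashlib releases the GIL while
    # digesting large buffers, so hashing several arrays in threads overlaps.
    return hashlib.sha256(np.ascontiguousarray(arr).tobytes()).digest()


def all_payloads_equal(arrays: list) -> bool:
    """Hash each array once, then compare digests instead of array pairs."""
    with ThreadPoolExecutor() as pool:
        digests = list(pool.map(_digest, arrays))
    return len(set(digests)) == 1
```

This also explains the entry's caveats: the hashes of large payloads are materialised (potentially higher memory use, mitigated by fewer Dask workers), and for very few or very large arrays the hashing pass can cost more than the comparisons it replaces, in which case the entry suggests disabling the problematic check when calling `CubeList.concatenate`.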
