
Commit 88d2c98

Benchmarking overhaul and pin Flake8 <6 (#220)
* Better benchmarking infrastructure - see SciTools/iris#4571, SciTools/iris#4583.
* Minor improvements to benchmark data generation messages.
* Better benchmark imports.
* Better strategy for data realisation and ASV.
* Introduced on_demand_benchmark decorator - see SciTools/iris#4621.
* Simplify benchmark structure following 878b7a3.
* Added a benchmarks README mirroring SciTools/iris.
* CHANGELOG entry.
* Flake8 fixes.
* Bump Nox cache.
* Cirrus benchmarks pass in CIRRUS_BASE_SHA.
* Benchmark README Conda package cache tips.
* Reset Nox cache.
* New Nox cache.
* Remove licence header from asv_delegated_conda.py.
* Always re-create Nox benchmark environment (to avoid CI problems).
* Pin Flake8 <6.
* Always re-create Nox benchmark environment (to avoid CI problems).
1 parent 10cb843 commit 88d2c98

File tree

17 files changed: +599 −364 lines

.cirrus.yml

Lines changed: 3 additions & 3 deletions
```diff
@@ -24,7 +24,7 @@ env:
   # Increment the build number to force new conda cache upload.
   CONDA_CACHE_BUILD: "1"
   # Increment the build number to force new nox cache upload.
-  NOX_CACHE_BUILD: "1"
+  NOX_CACHE_BUILD: "3"
   # Increment the build number to force new pip cache upload.
   PIP_CACHE_BUILD: "0"
   # Pip package to be installed.
@@ -153,7 +153,7 @@ benchmark_task:
     fi
   << : *LINUX_CONDA_TEMPLATE
   asv_cache:
-    folder: ${CIRRUS_WORKING_DIR}/benchmarks/.asv-env
+    folder: ${CIRRUS_WORKING_DIR}/benchmarks/.asv/env
     reupload_on_changes: true
   fingerprint_script:
     - echo "${CIRRUS_TASK_NAME}"
@@ -169,4 +169,4 @@ benchmark_task:
     - export CONDA_OVERRIDE_LINUX="$(uname -r | cut -d'+' -f1)"
     - nox --session=tests --install-only
     - export DATA_GEN_PYTHON=$(realpath $(find .nox -path "*tests*bin/python"))
-    - nox --session="benchmarks(ci compare)"
+    - nox --no-reuse-existing-virtualenvs --session="benchmarks(branch)" -- "${CIRRUS_BASE_SHA}"
```

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -12,6 +12,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 - [PR#217](https://github.com/SciTools-incubator/iris-esmf-regrid/pull/217)
   Changed the behaviour of coordinate fetching to allow Cubes with both
   1D DimCoords and 2D AuxCoords. In this case the DimCoords are prioritised.
+- [PR#220](https://github.com/SciTools-incubator/iris-esmf-regrid/pull/220)
+  Matured the benchmarking architecture in line with the latest setup in
+  SciTools/iris.
 
 ## [0.5] - 2022-10-14
```

benchmarks/README.md

Lines changed: 116 additions & 0 deletions
New file:

```markdown
# iris-esmf-regrid Performance Benchmarking

iris-esmf-regrid uses an
[Airspeed Velocity](https://github.com/airspeed-velocity/asv)
(ASV) setup to benchmark performance. This is primarily designed to check for
performance shifts between commits using statistical analysis, but can also
be easily repurposed for manual comparative and scalability analyses.

The benchmarks are run as part of the CI (the `benchmark_task` in
[`.cirrus.yml`](../.cirrus.yml)), with any notable shifts in performance
raising a ❌ failure.

## Running benchmarks

`asv ...` commands must be run from this directory. You will need to have ASV
installed, as well as Nox (see
[Benchmark environments](#benchmark-environments)).

[iris-esmf-regrid's noxfile](../noxfile.py) includes a `benchmarks` session
that provides conveniences for setting up before benchmarking, and can also
replicate the CI run locally. See the session docstring for detail.

### Environment variables

* `DATA_GEN_PYTHON` - required - path to a Python executable that can be
  used to generate benchmark test objects/files; see
  [Data generation](#data-generation). The Nox session sets this automatically,
  but will defer to any value already set in the shell.
* `BENCHMARK_DATA` - optional - path to a directory for benchmark synthetic
  test data, which the benchmark scripts will create if it doesn't already
  exist. Defaults to `<root>/benchmarks/.data/` if not set. Note that some of
  the generated files, especially in the 'SPerf' suite, are many GB in size,
  so plan accordingly.
* `ON_DEMAND_BENCHMARKS` - optional - when set (to any value): benchmarks
  decorated with `@on_demand_benchmark` are included in the ASV run. Usually
  coupled with the ASV `--bench` argument to only run the benchmark(s) of
  interest. Is set during the Nox `sperf` session.

### Reducing run time

Before benchmarks are run on a commit, the benchmark environment is
automatically aligned with the lock-file for that commit. You can significantly
speed up any environment updates by co-locating the benchmark environment and
your
[Conda package cache](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs)
on the same [file system](https://en.wikipedia.org/wiki/File_system). This can
be done in several ways:

* Move your iris-esmf-regrid checkout, this being the default location for the
  benchmark environment.
* Move your package cache by editing
  [`pkgs_dirs` in Conda config](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs).
* Move the benchmark environment by **locally** editing the environment path of
  `delegated_env_commands` and `delegated_env_parent` in
  [asv.conf.json](asv.conf.json).

## Writing benchmarks

[See the ASV docs](https://asv.readthedocs.io/) for full detail.

### Data generation

**Important:** be sure not to use the benchmarking environment to generate any
test objects/files, as this environment changes with each commit being
benchmarked, creating inconsistent benchmark 'conditions'. The
[generate_data](./benchmarks/generate_data.py) module offers a
solution; read more detail there.

### ASV re-run behaviour

Note that ASV re-runs a benchmark multiple times between calls to its
`setup()` routine. This is a problem for benchmarking certain Iris operations
such as data realisation, since the data will no longer be lazy after the
first run. Consider writing extra steps to restore objects' original state
_within_ the benchmark itself.

If adding steps to the benchmark will skew the result too much then re-running
can be disabled by setting an attribute on the benchmark: `number = 1`. To
maintain result accuracy this should be accompanied by increasing the number of
repeats _between_ `setup()` calls using the `repeat` attribute.
`warmup_time = 0` is also advisable since ASV performs independent re-runs to
estimate run-time, and these will still be subject to the original problem. A
decorator is available for this - `@disable_repeat_between_setup` in
[benchmarks init](./benchmarks/__init__.py).

### Scaling / non-Scaling Performance Differences

When comparing performance between commits/file-type/whatever, it can be
helpful to know if the differences exist in scaling or non-scaling parts of
the Iris functionality in question. This can be done using a size parameter,
setting one value to be as small as possible (e.g. a scalar `Cube`), and the
other to be significantly larger (e.g. a 1000x1000 `Cube`). Performance
differences might only be seen for the larger value, or the smaller, or both,
getting you closer to the root cause.

### On-demand benchmarks

Some benchmarks provide useful insight but are inappropriate to be included in
a benchmark run by default, e.g. those with long run-times or requiring a local
file. These benchmarks should be decorated with `@on_demand_benchmark`
(see [benchmarks init](./benchmarks/__init__.py)), which
sets the benchmark to only be included in a run when the `ON_DEMAND_BENCHMARKS`
environment variable is set. Examples include the SPerf benchmark
suite for the UK Met Office NG-VAT project.

## Benchmark environments

We have disabled ASV's standard environment management, instead using an
environment built using the same Nox scripts as Iris' test environments. This
is done using ASV's plugin architecture - see
[asv_delegated_conda.py](asv_delegated_conda.py) and the extra config items in
[asv.conf.json](asv.conf.json).

(ASV is written to control the environment(s) that benchmarks are run in -
minimising external factors and also allowing it to compare between a matrix
of dependencies, each in a separate environment. We have chosen to sacrifice
these features in favour of testing each commit with its intended dependencies,
controlled by Nox + lock-files.)
```
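The README's "ASV re-run behaviour" advice can be sketched as a small benchmark class. This is a hypothetical example (not from the repository): a one-shot operation stands in for lazy-data realisation, and the `number` / `repeat` / `warmup_time` attributes are the standard ASV controls the README describes.

```python
# Hypothetical sketch of the re-run controls described in the README above:
# a benchmark whose body can only run once per setup() call, so repetition
# must happen between setup() calls rather than within them.


class TimeRealisation:
    # Run the benchmark body exactly once per setup() call...
    number = 1
    # ...and recover statistical power by repeating setup()+run instead.
    repeat = 20
    # Skip ASV's warm-up runs, which would also trigger the one-shot work
    # before the timed run.
    warmup_time = 0

    def setup(self):
        # Stand-in for creating a lazy object (e.g. a lazy Iris Cube's data).
        self.data = iter(range(1000))

    def time_realise(self):
        # Consuming the iterator mirrors realising lazy data: it only
        # happens once, hence number = 1 above.
        list(self.data)
```

ASV discovers `time_*` methods by naming convention, so no ASV import is needed in the benchmark module itself.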

benchmarks/asv.conf.json

Lines changed: 21 additions & 9 deletions
```diff
@@ -1,15 +1,27 @@
 {
     "version": 1,
     "project": "esmf_regrid",
-    "repo": "..",
-    "environment_type": "nox-conda",
-    "pythons": [],
-    "branches": ["main"],
-    "benchmark_dir": "benchmarks",
-    "env_dir": ".asv-env",
-    "results_dir": ".asv-results",
-    "html_dir": ".asv-html",
     "project_url": "https://github.com/SciTools-incubator/iris-esmf-regrid",
+    "repo": "..",
+    "environment_type": "conda-delegated",
     "show_commit_url": "https://github.com/SciTools-incubator/iris-esmf-regrid/commit/",
-    "plugins": [".nox_asv_plugin"],
+    "branches": ["upstream/main"],
+
+    "benchmark_dir": "./benchmarks",
+    "env_dir": ".asv/env",
+    "results_dir": ".asv/results",
+    "html_dir": ".asv/html",
+    "plugins": [".asv_delegated_conda"],
+
+    // The command(s) that create/update an environment correctly for the
+    // checked-out commit.
+    // Interpreted the same as build_command, with following exceptions:
+    // * No build-time environment variables.
+    // * Is run in the same environment as the ASV install itself.
+    "delegated_env_commands": [
+        "PY_VER=3.10 nox --envdir={conf_dir}/.asv/env/nox01 --session=tests --install-only --no-error-on-external-run --verbose"
+    ],
+    // The parent directory of the above environment.
+    // The most recently modified environment in the directory will be used.
+    "delegated_env_parent": "{conf_dir}/.asv/env/nox01"
 }
```
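The config comment above says the most recently modified environment under `delegated_env_parent` is the one used. A minimal sketch of that selection logic (the function name `latest_env` is illustrative, not from the repository):

```python
# Sketch of "the most recently modified environment in the directory will
# be used" from the asv.conf.json comment above: pick the child of the
# parent directory with the newest modification time.
from os.path import getmtime
from pathlib import Path


def latest_env(env_parent: Path) -> Path:
    """Return the most recently modified entry under ``env_parent``."""
    candidates = sorted(env_parent.glob("*"), key=getmtime, reverse=True)
    if not candidates:
        raise FileNotFoundError(f"No environments found under: {env_parent}")
    return candidates[0]
```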

benchmarks/asv_delegated_conda.py

Lines changed: 193 additions & 0 deletions
New file:

```python
"""
ASV plug-in providing an alternative :class:`asv.plugins.conda.Conda`
subclass that manages the Conda environment via custom user scripts.

"""

from os import environ
from os.path import getmtime
from pathlib import Path
from shutil import copy2, copytree, rmtree
from tempfile import TemporaryDirectory

from asv import util as asv_util
from asv.config import Config
from asv.console import log
from asv.plugins.conda import Conda
from asv.repo import Repo


class CondaDelegated(Conda):
    """
    Manage a Conda environment using custom user scripts, run at each commit.

    Ignores user input variations - ``matrix`` / ``pythons`` /
    ``conda_environment_file``, since environment is being managed outside ASV.

    Original environment creation behaviour is inherited, but upon checking out
    a commit the custom script(s) are run and the original environment is
    replaced with a symlink to the custom environment. This arrangement is then
    re-used in subsequent runs.

    """

    tool_name = "conda-delegated"

    def __init__(
        self,
        conf: Config,
        python: str,
        requirements: dict,
        tagged_env_vars: dict,
    ) -> None:
        """
        Parameters
        ----------
        conf : Config instance

        python : str
            Version of Python. Must be of the form "MAJOR.MINOR".

        requirements : dict
            Dictionary mapping a PyPI package name to a version
            identifier string.

        tagged_env_vars : dict
            Environment variables, tagged for build vs. non-build

        """
        ignored = ["`python`"]
        if requirements:
            ignored.append("`requirements`")
        if tagged_env_vars:
            ignored.append("`tagged_env_vars`")
        if conf.conda_environment_file:
            ignored.append("`conda_environment_file`")
        message = (
            f"Ignoring ASV setting(s): {', '.join(ignored)}. Benchmark "
            "environment management is delegated to third party script(s)."
        )
        log.warning(message)
        requirements = {}
        tagged_env_vars = {}
        conf.conda_environment_file = None

        super().__init__(conf, python, requirements, tagged_env_vars)
        self._update_info()

        self._env_commands = self._interpolate_commands(conf.delegated_env_commands)
        # Again using _interpolate_commands to get env parent path - allows use
        # of the same ASV env variables.
        env_parent_interpolated = self._interpolate_commands(conf.delegated_env_parent)
        # Returns list of tuples, we just want the first.
        env_parent_first = env_parent_interpolated[0]
        # The 'command' is the first item in the returned tuple.
        env_parent_string = " ".join(env_parent_first[0])
        self._delegated_env_parent = Path(env_parent_string).resolve()

    @property
    def name(self):
        """Get a name to uniquely identify this environment."""
        return asv_util.sanitize_filename(self.tool_name)

    def _update_info(self) -> None:
        """Make sure class properties reflect the actual environment being used."""
        # Follow symlink if it has been created.
        actual_path = Path(self._path).resolve()
        self._path = str(actual_path)

        # Get custom environment's Python version if it exists yet.
        try:
            get_version = (
                "from sys import version_info; "
                "print(f'{version_info.major}.{version_info.minor}')"
            )
            actual_python = self.run(["-c", get_version])
            self._python = actual_python
        except OSError:
            pass

    def _prep_env(self) -> None:
        """Run the custom environment script(s) and switch to using that environment."""
        message = f"Running delegated environment management for: {self.name}"
        log.info(message)
        env_path = Path(self._path)

        def copy_asv_files(src_parent: Path, dst_parent: Path) -> None:
            """For copying between self._path and a temporary cache."""
            asv_files = list(src_parent.glob("asv*"))
            # build_root_path.name usually == "project" .
            asv_files += [src_parent / Path(self._build_root).name]
            for src_path in asv_files:
                dst_path = dst_parent / src_path.name
                if not dst_path.exists():
                    # Only caching in case the environment has been rebuilt.
                    # If the dst_path already exists: rebuilding hasn't
                    # happened. Also a non-issue when copying in the reverse
                    # direction because the cache dir is temporary.
                    if src_path.is_dir():
                        func = copytree
                    else:
                        func = copy2
                    func(src_path, dst_path)

        with TemporaryDirectory(prefix="delegated_asv_cache_") as asv_cache:
            asv_cache_path = Path(asv_cache)
            # Cache all of ASV's files as delegated command may remove and
            # re-build the environment.
            copy_asv_files(env_path.resolve(), asv_cache_path)

            # Adapt the build_dir to the cache location.
            build_root_path = Path(self._build_root)
            build_dir_original = build_root_path / self._repo_subdir
            build_dir_subpath = build_dir_original.relative_to(build_root_path.parent)
            build_dir = asv_cache_path / build_dir_subpath

            # Run the script(s) for delegated environment creation/updating.
            # (An adaptation of self._interpolate_and_run_commands).
            for command, env, return_codes, cwd in self._env_commands:
                local_envs = dict(environ)
                local_envs.update(env)
                if cwd is None:
                    cwd = str(build_dir)
                _ = asv_util.check_output(
                    command,
                    timeout=self._install_timeout,
                    cwd=cwd,
                    env=local_envs,
                    valid_return_codes=return_codes,
                )

            # Replace the env that ASV created with a symlink to the env
            # created/updated by the custom script.
            delegated_env_path = sorted(
                self._delegated_env_parent.glob("*"),
                key=getmtime,
                reverse=True,
            )[0]
            if env_path.resolve() != delegated_env_path:
                try:
                    env_path.unlink(missing_ok=True)
                except IsADirectoryError:
                    rmtree(env_path)
                env_path.symlink_to(delegated_env_path, target_is_directory=True)

            # Check that environment exists.
            try:
                env_path.resolve(strict=True)
            except FileNotFoundError:
                message = f"Path does not resolve to environment: {env_path}"
                log.error(message)
                raise RuntimeError(message)

            # Restore ASV's files from the cache (if necessary).
            copy_asv_files(asv_cache_path, env_path.resolve())

        # Record new environment information in properties.
        self._update_info()

    def checkout_project(self, repo: Repo, commit_hash: str) -> None:
        """Check out the working tree of the project at given commit hash."""
        super().checkout_project(repo, commit_hash)
        self._prep_env()
        log.info(f"Environment {self.name} updated to spec at {commit_hash[:8]}")
```
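The core trick in `_prep_env` is the symlink swap: ASV's own environment directory is replaced by a symlink to the environment the delegated (Nox) command built. A self-contained sketch of that step, with hypothetical names and a portable `is_dir()` check in place of the plugin's `unlink`/`IsADirectoryError` handling:

```python
# Isolated sketch (hypothetical function/paths) of the symlink swap in
# _prep_env above: replace ASV's env directory with a symlink to the
# environment that the delegated command actually built.
from pathlib import Path
from shutil import rmtree


def point_env_at(env_path: Path, delegated_env_path: Path) -> None:
    """Replace ``env_path`` with a symlink to ``delegated_env_path``."""
    if env_path.exists() and env_path.resolve() == delegated_env_path.resolve():
        # Already pointing at the delegated environment - nothing to do.
        return
    if env_path.is_dir() and not env_path.is_symlink():
        # A real directory (e.g. the env ASV created) must be removed whole.
        rmtree(env_path)
    else:
        # A file, stale symlink, or nothing at all.
        env_path.unlink(missing_ok=True)
    env_path.symlink_to(delegated_env_path, target_is_directory=True)
```

Because the symlink is re-pointed rather than the environment rebuilt, subsequent runs reuse the delegated environment at no extra cost.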
