Description
Proposer's name
Lars Bärring
Date
2021-05-25
Background
Over the years, the prospects for using the CF Conventions to describe various types of derived statistics (aka climate indices or climate indicators) have been discussed repeatedly in CF email list threads, following the extensive conversation back in 2006-2007 (cf. relevant starting point). Since then, the concept of climate indices/indicators has evolved substantially. The many CF email list threads are a sign of a recurring wish to express these new concepts using the CF Conventions. However, the conversations often spread out into discussions of many different aspects, with few concrete conclusions regarding general guidance on how to apply the CF Conventions. In this issue I will try to collect some of the ideas and suggestions from several of these email threads.
As a result of the initial conversation in 2006-2007 the following two groups of standard names were introduced:
- number_of_days_with_X_above|below_threshold (canonical unit: 1)
- spell_length_of_days_with_X_above|below_threshold (canonical unit: day (sic))
While these two groups may seem rather disparate, connected only in that they both employ thresholds, they are in fact more closely related than that. This will become clearer in the following discussion of generalizations and extensions.
Suggested generalizations/changes and extensions
number_of_days_with_X_above|below_threshold (deprecation)
- Move the temporal resolution ("days") out of the standard name to make it 'frequency agnostic'. The use case for this change comes from agricultural applications, where it is common to count the number of hours above a threshold. This change was most recently suggested during a virtual workshop last week (late May 2021).
- Distinguish between strict comparisons (i.e. < and >) and non-strict comparisons (i.e. ≤ and ≥); see issue #31, "New standard names for non-strict comparison with threshold", for details.
These two suggestions point towards standard names following the pattern number_of_occurrences_with_X_strictly_above|below_threshold or number_of_occurrences_with_X_at_or_above|below_threshold.
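To make the strict/non-strict distinction concrete, here is a minimal Python sketch. The helper count_occurrences and the sample values are hypothetical illustrations, not part of the CF Conventions or of any CF library; the sketch only shows that values lying exactly on the threshold are what separates the four comparison variants.

```python
import operator

# Map (above, strict) to the comparison implied by each standard-name variant.
_OPS = {
    (True, True): operator.gt,    # X_strictly_above_threshold:  X > t
    (True, False): operator.ge,   # X_at_or_above_threshold:     X >= t
    (False, True): operator.lt,   # X_strictly_below_threshold:  X < t
    (False, False): operator.le,  # X_at_or_below_threshold:     X <= t
}


def count_occurrences(values, threshold, above=True, strict=True):
    """Hypothetical helper: number of samples satisfying the chosen comparison."""
    compare = _OPS[(above, strict)]
    return sum(1 for v in values if compare(v, threshold))


# Illustrative daily maximum temperatures (degC); two values equal the threshold.
tasmax = [24.0, 25.0, 26.3, 25.0, 23.9]
print(count_occurrences(tasmax, 25.0))                             # 1
print(count_occurrences(tasmax, 25.0, strict=False))               # 3
print(count_occurrences(tasmax, 25.0, above=False))                # 2
print(count_occurrences(tasmax, 25.0, above=False, strict=False))  # 4
```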
However, from a user perspective there is still a problem with these constructs: the canonical unit is 1 (and not day or hour). While 1 is semantically consistent with the phrase "number of ...", users are confused when confronted with this unit in automatically labeled graphs or other output. This was previously touched upon in this email list conversation, and recently resurfaced in an off-line conversation. Hence the following suggestion:
- Replace the "number_of_" phrase of the standard name with a phrase that results in the canonical unit second, e.g. total_duration_. A "duration" is clearly associated with a time unit, and "total" indicates that several separate events may be added together. A data variable with such a standard name would normally have the unit day or hour, etc., according to the context and the resolution of the input data. But during further processing this may (accidentally) be changed to any other unit of duration (e.g. the canonical unit second). The temporal resolution, i.e. the unit used for discretising the duration, must therefore be recorded in cell_methods as (interval: T). This 'discretisation unit' is what essentially transforms the counting operation into a summation.
Based on this, I would like to suggest that five currently existing standard names (v.77) be deprecated in favour of standard names following the patterns
- total_duration_of_X_strictly_above|below_threshold, canonical unit second, and
- total_duration_of_X_at_or_above|below_threshold, canonical unit second,
or alternatively
- total_duration_of_intervals_with_X_strictly_above|below_threshold, canonical unit second, and
- total_duration_of_intervals_with_X_at_or_above|below_threshold, canonical unit second.
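The argument that the discretisation unit turns counting into summation can be sketched in Python as below. This is a minimal illustration under stated assumptions: each sample is taken to represent one interval of length step (the value that would be recorded as (interval: T)), and total_duration is a hypothetical helper, not an existing CF or library function. The result is a datetime.timedelta, so it can be expressed in the canonical unit second or in hours without changing the quantity.

```python
import operator
from datetime import timedelta

# Map (above, strict) to the comparison implied by each standard-name variant.
_OPS = {
    (True, True): operator.gt,    # strictly_above: X > t
    (True, False): operator.ge,   # at_or_above:    X >= t
    (False, True): operator.lt,   # strictly_below: X < t
    (False, False): operator.le,  # at_or_below:    X <= t
}


def total_duration(values, threshold, step, above=True, strict=True):
    """Hypothetical helper: total duration of samples on the chosen side of the threshold.

    Each qualifying sample contributes one interval of length `step`
    (assumed to be the "interval: T" of cell_methods), so the count of
    occurrences becomes a sum of durations.
    """
    compare = _OPS[(above, strict)]
    return sum((step for v in values if compare(v, threshold)), timedelta(0))


# Illustrative hourly temperatures (degC), e.g. for an agricultural index.
temps = [4.0, 5.0, 6.5, 7.0, 3.0]
hour = timedelta(hours=1)

strictly_above = total_duration(temps, 5.0, hour)            # 2 qualifying samples
at_or_above = total_duration(temps, 5.0, hour, strict=False)  # 3 qualifying samples
print(strictly_above.total_seconds())     # 7200.0  (canonical unit: second)
print(strictly_above / timedelta(hours=1))  # 2.0   (same quantity in hours)
print(at_or_above.total_seconds())        # 10800.0
```

Recovering the original count of 2 from 7200 s is only possible when it is known that step was one hour, which is the information the proposal places in cell_methods.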
first|last_occurrence_of_X_.... or first|last_interval_with_X_.... (new)
Related to summing the duration above/below some threshold, there is a range of use cases for recording the first or last date/time (in the year, season, month, day, ...) when the threshold was exceeded. With reference to the original standard names number_of_days_with_X_..., the date/time would typically be recorded as day_of_year or similar; cf. this conversation, which as far as I can judge did not arrive at a conclusion or recommendation with respect to the CF Conventions. A related earlier thread focuses more on the reference time, which is an important aspect of what is discussed here. The climate index/indicator data is calculated per period (year, season or month), where this period is defined by the bounds of the time coordinate of the data variable. Framed this way, the date/time of the first/last occurrence is a duration since the time specified by the lower bound of the corresponding time coordinate. As such, the canonical unit is second (in practice it might be day or hour). In the context of climate indices/indicators, the lower bound of the time coordinate is a natural 'reference time', which should be stated in the explanation of the standard name. As suggested in the previous point, the temporal resolution must be recorded in cell_methods as (interval: T).
Based on this, I would like to suggest the following new standard name patterns:
- first|last_occurrence_of_X_strictly_above|below_threshold, canonical unit second, and
- first|last_occurrence_of_X_at_or_above|below_threshold, canonical unit second,
or alternatively
- first|last_interval_of_X_strictly_above|below_threshold, canonical unit second, and
- first|last_interval_of_X_at_or_above|below_threshold, canonical unit second.
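The framing of first/last occurrence as a duration since the lower bound of the time coordinate can be sketched as follows. The assumptions are mine, not part of the proposal text: sample i covers the interval starting at lower_bound + i * step, the reported offset is the start of the qualifying interval, and None stands for what would be a missing value when the threshold is never crossed. first_last_occurrence is a hypothetical helper.

```python
import operator
from datetime import datetime, timedelta

# Map (above, strict) to the comparison implied by each standard-name variant.
_OPS = {
    (True, True): operator.gt,
    (True, False): operator.ge,
    (False, True): operator.lt,
    (False, False): operator.le,
}


def first_last_occurrence(values, threshold, step, above=True, strict=True):
    """Hypothetical helper: (first, last) offsets since the period's lower bound.

    Sample i is assumed to start at i * step after the lower bound of the
    time coordinate. Returns (None, None) if no sample qualifies.
    """
    compare = _OPS[(above, strict)]
    hits = [i for i, v in enumerate(values) if compare(v, threshold)]
    if not hits:
        return None, None
    return hits[0] * step, hits[-1] * step


# Illustrative daily minimum temperatures (degC) at the start of a yearly period.
lower_bound = datetime(2021, 1, 1)  # lower bound of the time coordinate
day = timedelta(days=1)
tmin = [2.0, 1.0, -0.5, 3.0, -1.2, 0.0, 4.0]

first, last = first_last_occurrence(tmin, 0.0, day, above=False)
print(first.total_seconds())  # 172800.0 (2 days, canonical unit: second)
print(lower_bound + first)    # 2021-01-03 00:00:00
print(last / day)             # 4.0
_, last_non_strict = first_last_occurrence(tmin, 0.0, day, above=False, strict=False)
print(last_non_strict / day)  # 5.0
```

With this zero-based convention an offset of 2 days corresponds to day_of_year 3; an ambiguity of this kind is one reason a precisely stated reference time in the standard name explanation is useful.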
spell_length_of_days_with_X_above|below_threshold (deprecation)
A spell is a contiguous period during which X is above|below the threshold (such as a wet/dry spell or a heat/cold wave). In the case of climate indices it is typically the longest spell during a period (year, season, month), although one could of course think of other methods, like minimum or mean; the method is specified in the cell_methods attribute.
- Move the temporal resolution ("days") out of the standard name to make it 'frequency agnostic'. The use case for this change is high-resolution precipitation data, where it is of interest to analyse shorter spells of high-intensity precipitation.
- Change the canonical unit to second. A spell length is by definition a duration, and irrespective of whether the standard name is changed as suggested or not, the canonical unit for a duration is second.
As in the previous two points, the temporal resolution must be recorded in cell_methods as (interval: T).
Based on this, I would like to suggest that the four currently existing standard names (v.77) following the pattern spell_length_of_days_with_X... be deprecated in favour of standard names following the patterns
- spell_length_of_X_strictly_above|below_threshold, canonical unit second, and
- spell_length_of_X_at_or_above|below_threshold, canonical unit second.
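The spell definition above, including the choice of statistic recorded in cell_methods, can be illustrated with a minimal sketch. spell_lengths is a hypothetical helper; each sample is assumed to represent one interval of length step, and spells that cross the period boundaries are not treated specially here.

```python
import operator
from datetime import timedelta

# Map (above, strict) to the comparison implied by each standard-name variant.
_OPS = {
    (True, True): operator.gt,
    (True, False): operator.ge,
    (False, True): operator.lt,
    (False, False): operator.le,
}


def spell_lengths(values, threshold, step, above=True, strict=True):
    """Hypothetical helper: durations of all maximal runs of qualifying samples."""
    compare = _OPS[(above, strict)]
    spells, run = [], 0
    for v in values:
        if compare(v, threshold):
            run += 1
        elif run:
            spells.append(run * step)
            run = 0
    if run:  # close a spell that lasts until the end of the period
        spells.append(run * step)
    return spells


# Illustrative daily precipitation (mm); dry days are those strictly below 1 mm.
day = timedelta(days=1)
pr = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]

dry = spell_lengths(pr, 1.0, day, above=False)
longest = max(dry, default=timedelta(0))  # statistic of a "maximum" cell method
mean = sum(dry, timedelta(0)) / len(dry)  # statistic of a "mean" cell method (dry is non-empty here)
print(longest.total_seconds())  # 432000.0 (5 days)
print(mean)                     # 3 days, 12:00:00
```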
beginning|end_of_spell_with_X_.... (new)
Analogous to the second point, there are use cases for analysing when during a period the spell begins or ends. The technical details given under point 2 apply here as well, so I move directly to suggesting these new standard name patterns:
- beginning|end_of_spell_with_X_strictly_above|below_threshold, canonical unit second, and
- beginning|end_of_spell_with_X_at_or_above|below_threshold, canonical unit second.
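Following the reference-time framing of point 2, the beginning and end of a spell can be expressed as durations since the lower bound of the period. This minimal sketch rests on assumptions that the proposal does not settle: the longest spell is the one reported (ties resolved in favour of the earliest), the beginning is the start of the first qualifying interval, and the end is the end of the last qualifying interval, so that end minus beginning equals the spell length. longest_spell_bounds is a hypothetical helper.

```python
import operator
from datetime import timedelta

# Map (above, strict) to the comparison implied by each standard-name variant.
_OPS = {
    (True, True): operator.gt,
    (True, False): operator.ge,
    (False, True): operator.lt,
    (False, False): operator.le,
}


def longest_spell_bounds(values, threshold, step, above=True, strict=True):
    """Hypothetical helper: (beginning, end) of the longest spell since the lower bound.

    Assumes sample i covers [i * step, (i + 1) * step) after the lower bound.
    Returns (None, None) if there is no spell in the period.
    """
    compare = _OPS[(above, strict)]
    best_start, best_len = None, 0
    run_start = None
    for i, v in enumerate(values):
        if compare(v, threshold):
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:  # strict ">" keeps the earliest spell on ties
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    if best_start is None:
        return None, None
    return best_start * step, (best_start + best_len) * step


# Illustrative daily precipitation (mm); dry days are those strictly below 1 mm.
day = timedelta(days=1)
pr = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]

beginning, end = longest_spell_bounds(pr, 1.0, day, above=False)
print(beginning / day, end / day)  # 3.0 8.0
print(end - beginning)             # 5 days, 0:00:00
```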
Once we have discussed the standard name patterns suggested here and reached consensus (hopefully we will), I will look into the existing standard names and use cases to suggest specific standard names and explanations/definitions. These explanations will contain technical details regarding cell_methods, how to specify the temporal resolution, and the relationship between the unit used for the duration and the reference time. In all four points above I suggest distinguishing between strict and non-strict comparisons, as well as including both "above" and "below". However, we should not add specific standard names until there is a concrete use case for them.
Finally, I should mention that there are two other groups of climate indices/indicators that share some aspects of those presented here. However, they are sufficiently different (and more complex) in their technical details that I do not include them here. Instead they will be covered in separate issues (later), but I mention them here for reference. The first group is in some sense similar to those in point 1, with two important differences: the unit is "fraction_of_year", and the threshold is a spatially varying threshold calculated as a percentile value based on a reference period. The second group is the count of all days belonging to spells of at least a certain duration, where the spell is based on a percentile threshold calculated in the same way as for the previous group.
Ping (previous conversations) @huard, @aulemahal, @zklaus, @pagecp, @japamment, @martinjuckes, @davidhassell