Metrics


Base

class weatherbenchX.metrics.base.Statistic

Abstract base class for statistics.

Statistics are functions of a pair of predictions/targets chunks, which are intended to be aggregated by taking a (potentially weighted) mean over multiple prediction/target pairs, and then used in the computation of a Metric.

A Statistic can be used in two ways:

* It can be used directly as a Metric, since it implements the Metric interface itself by passing through the mean of the statistic’s values.
* One or more statistics can be wrapped as a Metric which performs some additional computation (via values_from_mean_statistics) on the mean statistics.

The incoming predictions/targets chunks can either be a dictionary of DataArrays or a Dataset.

For univariate metrics, a PerVariableStatistic should be implemented. Multivariate metrics have access to all variables. The output should also be a Mapping from str to xr.DataArray; in other words, each output DataArray must be associated with a variable name.

Statistics are required to assign their own unique_name, which is used to deduplicate the computation of statistics that are used by multiple metrics. Any additional parameters of the statistic which affect the result of the computation should be captured in self.unique_name.
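To make the deduplication concrete, here is a minimal, self-contained sketch in plain Python and NumPy. The classes SquaredErrorSketch and ExceedanceSketch, their compute method, and the helper compute_unique_statistics are hypothetical stand-ins written for illustration; they are not part of weatherbenchX. The sketch only shows the idea: identical statistics requested by several metrics are computed once, while a parameter that changes the result (here, threshold) is folded into unique_name so that different settings are never merged.

```python
import numpy as np


class SquaredErrorSketch:
    """Hypothetical stand-in for a Statistic; not the weatherbenchX class."""

    unique_name = "SquaredError"

    def compute(self, predictions, targets):
        return (predictions - targets) ** 2


class ExceedanceSketch:
    """Hypothetical parameterized statistic: 1.0 where predictions > threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        # The threshold changes the result, so it must be part of the name.
        self.unique_name = f"Exceedance_threshold={threshold}"

    def compute(self, predictions, targets):
        return (predictions > self.threshold).astype(float)


def compute_unique_statistics(metric_to_statistics, predictions, targets):
    """Compute each distinct statistic once, keyed by its unique_name."""
    results = {}
    for statistics in metric_to_statistics.values():
        for statistic in statistics:
            if statistic.unique_name not in results:
                results[statistic.unique_name] = statistic.compute(predictions, targets)
    return results


metric_to_statistics = {
    "MSE": [SquaredErrorSketch()],
    "RMSE": [SquaredErrorSketch()],  # needs the same statistic as MSE
    "Exceedance0": [ExceedanceSketch(0.0)],
    "Exceedance1": [ExceedanceSketch(1.0)],  # different parameter, different name
}
predictions = np.array([0.5, 1.5, 2.5])
targets = np.array([1.0, 1.0, 1.0])
results = compute_unique_statistics(metric_to_statistics, predictions, targets)
```

Four metric entries reduce to three computations here, because MSE and RMSE share the SquaredError statistic.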

Statistics should preserve the dimensions that are (a) required to compute binnings or weights and (b) averaged over when the (weighted) mean is computed. These will typically be the time dimensions (if chunking is done in time) and/or the spatial/observation dimensions (if these are needed for binning or weighting). Other dimensions can be reduced.

class weatherbenchX.metrics.base.PerVariableStatistic

Abstract base class for statistics that are computed per variable.

The statistic will be computed independently for each variable that is present in both predictions and targets.

class weatherbenchX.metrics.base.Metric

Abstract base class for metrics.

A Metric is defined by specifying:

One or more Statistics, which are functions of prediction/target pairs. These are specified by implementing the statistics property, but are implemented separately from the Metric so that they can be reused across multiple Metrics.
A function to compute the metric’s final value from (weighted) means of the statistics, computed in aggregate over multiple prediction/target pairs. This is specified by implementing values_from_mean_statistics.

As an example, the RMSE metric is defined by specifying the SquaredError statistic, which returns squared errors of prediction/target pairs, and a function which takes the square root of the mean of the SquaredError statistic.

The form of weighted mean(s) used to aggregate the statistics is not determined by the Metric and can be chosen independently, for example to achieve different types of disaggregation and weighting. See aggregation.Aggregator for details.
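The two-stage structure (a statistic per chunk, then the metric applied to the aggregated mean) matters because the final function is usually non-linear. The NumPy sketch below uses hypothetical helper names and a toy aggregator that takes one weighted mean over all elements of all chunks, with all weights set to 1; this stands in for, and is much simpler than, aggregation.Aggregator. It shows that RMSE computed from the pooled mean of SquaredError differs from the average of per-chunk RMSEs.

```python
import numpy as np


def squared_error(predictions, targets):
    # Statistic: evaluated separately on each prediction/target chunk.
    return (predictions - targets) ** 2


def rmse_from_mean_statistic(mean_squared_error):
    # Metric: applied once, to the aggregated mean of the statistic.
    return np.sqrt(mean_squared_error)


def weighted_mean_over_chunks(values_per_chunk, weights_per_chunk):
    # Toy aggregator: a single weighted mean over all elements of all chunks.
    numerator = sum(np.sum(v * w) for v, w in zip(values_per_chunk, weights_per_chunk))
    denominator = sum(np.sum(w) for w in weights_per_chunk)
    return numerator / denominator


chunks = [  # (predictions, targets) pairs
    (np.array([1.0, 2.0]), np.array([1.0, 1.0])),
    (np.array([4.0, 0.0]), np.array([1.0, 1.0])),
]
weights = [np.ones(2), np.ones(2)]

statistics = [squared_error(p, t) for p, t in chunks]
rmse = rmse_from_mean_statistic(weighted_mean_over_chunks(statistics, weights))

# Averaging per-chunk RMSEs is not the same thing, because sqrt is non-linear.
per_chunk_rmse = np.mean([np.sqrt(np.mean(s)) for s in statistics])
```

Here the pooled squared errors are [0, 1, 9, 1], so the RMSE is sqrt(2.75), whereas the mean of the two per-chunk RMSEs is (sqrt(0.5) + sqrt(5)) / 2.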

Metrics computed for each variable independently should be implemented by subclassing PerVariableMetric.

class weatherbenchX.metrics.base.PerVariableMetric

Abstract base class for metrics that are computed per variable.

class weatherbenchX.metrics.base.PerVariableStatisticWithClimatology(climatology: Dataset)

Base class for per-variable statistics with climatology.

This class provides a convenient way to compute statistics that are a function of both the prediction/target and the climatology. The climatology is aligned with the prediction/target based on the prediction’s valid_time.

Subclasses must implement the _compute_per_variable_with_aligned_climatology method, which takes the predictions, targets, and aligned climatology as arguments.
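As an illustration of what aligning by valid_time involves, the following NumPy sketch assumes a climatology indexed by (hour, dayofyear), like the one in the SEEPS example further down this page, and computes valid_time = init_time + lead_time. The function names, the 1-based day-of-year convention, and the exact-match hour lookup are assumptions made for this sketch; the library itself works with labeled xarray data, and its handling of leap days is not described here.

```python
import numpy as np


def day_of_year_and_hour(valid_time):
    """Return the 1-based day of year and the hour of day for datetime64 values."""
    valid_time = np.asarray(valid_time, dtype="datetime64[h]")
    day = valid_time.astype("datetime64[D]")
    year_start = valid_time.astype("datetime64[Y]").astype("datetime64[D]")
    dayofyear = (day - year_start).astype(int) + 1
    hour = (valid_time - day.astype("datetime64[h]")).astype(int)
    return dayofyear, hour


def align_climatology(climatology, clim_hours, init_time, lead_time):
    """Pick the climatology value matching each forecast's valid_time.

    climatology has shape (hour, dayofyear); clim_hours lists the hour labels.
    Assumes every valid hour appears exactly in clim_hours (no interpolation).
    """
    valid_time = init_time + lead_time
    dayofyear, hour = day_of_year_and_hour(valid_time)
    hour_index = np.searchsorted(clim_hours, hour)
    return climatology[hour_index, dayofyear - 1]


clim_hours = np.array([0, 6, 12, 18])
climatology = np.arange(4 * 366).reshape(4, 366)  # toy values
init_time = np.datetime64("2020-03-01T00")
lead_time = np.array([6, 24, 42], dtype="timedelta64[h]")
aligned = align_climatology(climatology, clim_hours, init_time, lead_time)

predictions = np.array([500.0, 70.0, 1200.0])
anomaly = predictions - aligned  # e.g. the input to SquaredPredictionAnomaly
```

A subclass would then compute its statistic from predictions, targets and the aligned climatology, which is the role the text assigns to _compute_per_variable_with_aligned_climatology.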

Init.

Parameters:

climatology – The climatology dataset.

Deterministic

Statistics

class weatherbenchX.metrics.deterministic.Error

Error between predictions and targets.

class weatherbenchX.metrics.deterministic.AbsoluteError

Absolute error between predictions and targets.

class weatherbenchX.metrics.deterministic.SquaredError

Squared error between predictions and targets.

class weatherbenchX.metrics.deterministic.PredictionPassthrough(copy_nans_from_targets: bool = False)

Simply returns predictions.

Init.

Parameters:

copy_nans_from_targets – If True, copy any NaNs from the targets to the predictions.

class weatherbenchX.metrics.deterministic.TargetPassthrough(copy_nans_from_predictions: bool = False)

Simply returns targets.

Init.

Parameters:

copy_nans_from_predictions – If True, copy any NaNs from the predictions to the targets.

class weatherbenchX.metrics.deterministic.WindVectorSquaredError(u_name: Sequence[str], v_name: Sequence[str], vector_name: Sequence[str])

Computes the squared error of the wind vector, combining the u and v wind components.

SE = (u_pred - u_target) ** 2 + (v_pred - v_target) ** 2
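A small NumPy sketch of the formula above (the function name is hypothetical). The statistic is the squared length of the prediction-minus-target difference vector, so taking the square root of its mean over many points gives a vector RMSE, analogous to how RMSE is built from SquaredError; we assume, without having checked the source, that WindVectorRMSE follows this pattern.

```python
import numpy as np


def wind_vector_squared_error(u_pred, v_pred, u_target, v_target):
    """Squared length of the (prediction - target) wind difference vector."""
    return (u_pred - u_target) ** 2 + (v_pred - v_target) ** 2


u_pred, v_pred = np.array([3.0, 0.0]), np.array([4.0, 1.0])
u_target, v_target = np.array([0.0, 0.0]), np.array([0.0, 1.0])

se = wind_vector_squared_error(u_pred, v_pred, u_target, v_target)
# The same quantity written as the squared Euclidean norm of the difference vector.
se_norm = np.hypot(u_pred - u_target, v_pred - v_target) ** 2
vector_rmse = np.sqrt(se.mean())
```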

Init.

Parameters:

u_name – Name of the u wind component, e.g. [u_component_of_wind].
v_name – Name of the v wind component, e.g. [v_component_of_wind].
vector_name – Name to give output variable, e.g. [wind].

class weatherbenchX.metrics.deterministic.SquaredPredictionAnomaly(climatology: Dataset)

Computes (predictions - climatology)**2.

Init.

Parameters:

climatology – The climatology dataset.

class weatherbenchX.metrics.deterministic.SquaredTargetAnomaly(climatology: Dataset)

Computes (targets - climatology)**2.

Init.

Parameters:

climatology – The climatology dataset.

class weatherbenchX.metrics.deterministic.AnomalyCovariance(climatology: Dataset)

Computes (predictions - climatology) * (targets - climatology).

Init.

Parameters:

climatology – The climatology dataset.

Metrics

weatherbenchX.metrics.deterministic.Bias

Alias of Error.

weatherbenchX.metrics.deterministic.MAE

Alias of AbsoluteError.

weatherbenchX.metrics.deterministic.MSE

Alias of SquaredError.

class weatherbenchX.metrics.deterministic.RMSE

Root mean squared error.

weatherbenchX.metrics.deterministic.PredictionAverage

Alias of PredictionPassthrough.

weatherbenchX.metrics.deterministic.TargetAverage

Alias of TargetPassthrough.

class weatherbenchX.metrics.deterministic.WindVectorRMSE(u_name: str | list[str], v_name: str | list[str], vector_name: str | list[str])

Computes the vector RMSE of the wind, combining the u and v wind components.

Init.

Args can be a single string or a list, in which case the statistic will be computed separately for the different elements in the list. For example, u_name=[‘u_component_of_wind’, ‘10m_u_component_of_wind_10m’].

Parameters:

u_name – Name of the u wind component, e.g. u_component_of_wind.
v_name – Name of the v wind component, e.g. v_component_of_wind.
vector_name – Name to give output variable, e.g. wind.

class weatherbenchX.metrics.deterministic.ACC(climatology: Dataset)

Anomaly correlation coefficient.
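The anomaly statistics listed earlier (AnomalyCovariance, SquaredPredictionAnomaly, SquaredTargetAnomaly) suggest how ACC can be formed from mean statistics. The NumPy sketch below assumes the usual uncentered form, ACC = mean(covariance) / sqrt(mean(prediction anomaly squared) * mean(target anomaly squared)), with the means taken after aggregation. This is our reading of how those statistics fit together, not a description confirmed by this page, and the function name is hypothetical.

```python
import numpy as np


def acc_from_mean_statistics(mean_covariance, mean_sq_pred_anomaly, mean_sq_target_anomaly):
    """ACC from the means of the three anomaly statistics (uncentered form)."""
    return mean_covariance / np.sqrt(mean_sq_pred_anomaly * mean_sq_target_anomaly)


predictions = np.array([1.0, 2.0, 3.0, 4.0])
targets = np.array([1.5, 1.5, 3.5, 3.5])
climatology = np.array([2.0, 2.0, 3.0, 3.0])

pred_anomaly = predictions - climatology  # [-1, 0, 0, 1]
target_anomaly = targets - climatology    # [-0.5, -0.5, 0.5, 0.5]
acc = acc_from_mean_statistics(
    np.mean(pred_anomaly * target_anomaly),  # AnomalyCovariance
    np.mean(pred_anomaly ** 2),              # SquaredPredictionAnomaly
    np.mean(target_anomaly ** 2),            # SquaredTargetAnomaly
)
```

Under this assumption ACC is the cosine similarity between the prediction and target anomaly vectors, here 1/sqrt(2).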

class weatherbenchX.metrics.deterministic.PredictionActivity(climatology: Dataset)

Activity in predictions, defined as the standard deviation of the prediction anomalies.

This is used e.g. by ECMWF: https://arxiv.org/abs/2307.10128

Probabilistic

Statistics

class weatherbenchX.metrics.probabilistic.CRPSSkill(ensemble_dim: str = 'number', skipna_ensemble: bool = False)

The skill measure associated with CRPS, E|X - Y|.

class weatherbenchX.metrics.probabilistic.CRPSSpread(ensemble_dim: str = 'number', use_sort: bool = False, fair: bool = True, which: str = 'predictions', skipna_ensemble: bool = False)

Sample-based estimate of the spread measure used in CRPS, E|X - X'|.

(This is also referred to in places as Mean Absolute Difference.)

See the docstring for CRPSEnsemble for more details on what ‘fair’ means and the two different options (use_sort=True vs False) for computing the estimate.

class weatherbenchX.metrics.probabilistic.EnsembleVariance(ensemble_dim: str = 'number', skipna_ensemble: bool = False)

Computes the mean variance along the ensemble dimension.

This uses the standard unbiased estimator of variance.

class weatherbenchX.metrics.probabilistic.UnbiasedEnsembleMeanSquaredError(ensemble_dim: str = 'number', skipna_ensemble: bool = False)

Computes the unbiased ensemble mean squared error.

Let X be the ensemble mean of predictions. If targets.dims contains self.ensemble_dim, then let Y be the ensemble mean of the targets. Otherwise (the usual case), let Y be the targets.

This class estimates E(X - Y)² with no finite-ensemble bias. This is done by subtracting the sample variance divided by ensemble size. As such, you must have ensemble size > 1 or the result will be NaN.
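A NumPy sketch of this estimator for the usual case where the targets have no ensemble dimension, with the ensemble along axis 0 (the function name and axis convention are assumptions; the library works on a named ensemble_dim). For each case it computes (ensemble mean - target)² - s²/M, where s² is the unbiased sample variance (ddof=1) and M the ensemble size. The Monte Carlo check draws 3-member ensembles from a Gaussian with mean 1 and standard deviation 2 against a fixed target of 0, so the quantity being estimated is (1 - 0)² = 1, while the naive squared error of the ensemble mean averages about 1 + 4/3.

```python
import numpy as np


def unbiased_ensemble_mean_squared_error(ensemble, target):
    """Unbiased estimate of (E[X] - Y)**2 from an ensemble along axis 0."""
    m = ensemble.shape[0]
    ensemble_mean = ensemble.mean(axis=0)
    # ddof=1: unbiased sample variance; it is NaN (with a warning) when m == 1.
    sample_variance = ensemble.var(axis=0, ddof=1)
    return (ensemble_mean - target) ** 2 - sample_variance / m


rng = np.random.default_rng(0)
n_cases, m = 200_000, 3
ensemble = rng.normal(loc=1.0, scale=2.0, size=(m, n_cases))
target = np.zeros(n_cases)

unbiased = unbiased_ensemble_mean_squared_error(ensemble, target).mean()
naive = ((ensemble.mean(axis=0) - target) ** 2).mean()
```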

Metrics

class weatherbenchX.metrics.probabilistic.CRPSEnsemble(ensemble_dim: str = 'number', use_sort: bool = False, fair: bool = True, skipna_ensemble: bool = False)

Continuous ranked probability score for an ensemble prediction.

Given a ground-truth scalar random variable Y and two iid predictions X and X', the Continuous Ranked Probability Score is defined as

CRPS = E|X - Y| - 0.5 * E|X - X'|

where E is mathematical expectation, and | ⋅ | is the absolute value. CRPS has a unique minimum when X is distributed the same as Y.

We implement a ‘fair’ sample-based estimate of CRPS based on 2 or more ensemble members. ‘Fair’ means this is an unbiased estimate of the CRPS attained by the underlying predictive distribution from which the ensemble members are drawn – equivalently, the CRPS attained in the limit of an infinite ensemble.

[Zamo & Naveau, 2018] derive two equivalent ways to compute the spread term in their fair estimator:

1. By averaging absolute differences of all pairs of distinct ensemble members. This is O(M^2) in compute and memory, but easy to parallelize and hence generally cheaper for small-to-medium-sized ensembles. This is CRPS_{Fair} in their paper, and is the default implementation here.

2. By sorting the ensemble members and using their ranks. This is O(M log M) and will be more efficient for sufficiently large ensembles. This is CRPS_{PWM} in their paper. It can be enabled by setting use_sort=True.
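Both routes can be sketched in a few lines of NumPy, with the ensemble along axis 0. The function names are hypothetical, and the sketch omits the library's skipna_ensemble handling. The pairwise version sums |x_i - x_j| over all ordered pairs; the sorted version uses the identity that, for members sorted as x_(1) <= ... <= x_(M), the same sum equals 2 * sum_i (2i - M - 1) * x_(i). The fair estimate divides the pair sum by M(M - 1), the number of distinct ordered pairs, while the conventional estimate (fair=False) divides by M².

```python
import numpy as np


def crps_skill(ensemble, target):
    """E|X - Y|: mean absolute error of the members (ensemble on axis 0)."""
    return np.mean(np.abs(ensemble - target), axis=0)


def crps_spread_pairwise(ensemble, fair=True):
    """E|X - X'| from all ordered pairs of members; O(M^2) time and memory."""
    m = ensemble.shape[0]
    # Intermediate of shape (M, M, ...) holding |x_i - x_j| for every pair.
    pair_sum = np.sum(np.abs(ensemble[:, None] - ensemble[None, :]), axis=(0, 1))
    # The i == j terms are zero, so "fair" divides by the M(M - 1) distinct pairs.
    return pair_sum / (m * (m - 1) if fair else m * m)


def crps_spread_sorted(ensemble, fair=True):
    """Same pair sum from sorted members: 2 * sum_i (2i - M - 1) * x_(i)."""
    m = ensemble.shape[0]
    x = np.sort(ensemble, axis=0)
    ranks = np.arange(1, m + 1).reshape((m,) + (1,) * (ensemble.ndim - 1))
    pair_sum = 2 * np.sum((2 * ranks - m - 1) * x, axis=0)
    return pair_sum / (m * (m - 1) if fair else m * m)


def crps_ensemble(ensemble, target, fair=True, use_sort=False):
    spread = crps_spread_sorted if use_sort else crps_spread_pairwise
    return crps_skill(ensemble, target) - 0.5 * spread(ensemble, fair=fair)


members = np.array([0.0, 1.0])
fair_crps = crps_ensemble(members, 0.0)                       # 0.5 - 0.5 * 1.0 = 0.0
conventional_crps = crps_ensemble(members, 0.0, fair=False)   # 0.5 - 0.5 * 0.5 = 0.25
```

The two-member example shows how much the spread correction matters for small ensembles: the fair and conventional estimates differ by a factor M/(M - 1) = 2 in the spread term.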

References:

[Gneiting & Raftery, 2007], Strictly Proper Scoring Rules, Prediction, and Estimation
[Zamo & Naveau, 2018], Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts.

Init.

Parameters:

ensemble_dim – Name of the ensemble dimension. Default: ‘number’.
use_sort – If True, use the sorted-rank method for computing the fair estimate of CRPS. This may be more efficient for large ensembles, see class docstring for more details. Default: False.
fair – If True, use the fair estimate of CRPS. If False, use the conventional estimate. Default: True.
skipna_ensemble – If True, any NaN values are treated as missing ensemble members. The metric is computed using an ensemble size corresponding to the number of non-NaN values along the ensemble_dim, which may vary by position along any other dims. When fewer than two ensemble members are present along the ensemble_dim, the metric is computed but will be NaN.

class weatherbenchX.metrics.probabilistic.UnbiasedEnsembleMeanRMSE(ensemble_dim: str = 'number', skipna_ensemble: bool = False)

Square root of the unbiased estimate of the ensemble mean MSE.

Init.

Parameters:

ensemble_dim – Name of the ensemble dimension. Default: ‘number’.
skipna_ensemble – If True, NaN values will be ignored along the ensemble dimension. Default: False.

class weatherbenchX.metrics.probabilistic.SpreadSkillRatio(**unused_kwargs)

class weatherbenchX.metrics.probabilistic.UnbiasedSpreadSkillRatio(ensemble_dim: str = 'number', skipna_ensemble: bool = False)

Computes a spread-skill ratio based on the unbiased skill estimator.

Specifically this is the ratio:

sqrt(unbiased estimate of the mean variance of the predictive distribution) / sqrt(unbiased estimate of the MSE of the predictive distribution’s mean)

This has the convenient property that the numerator and denominator are equal in expectation for perfect ensemble forecasts, meaning the ratio should be close to 1 in this perfect case. (Because of the non-linear division operation and square root, the resulting ratio itself is not exactly 1 in expectation in the perfect case, but this bias goes away as we average over more forecasts.)
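A Monte Carlo sketch of this property under stated assumptions: a perfect ensemble is simulated by drawing the target and M = 4 members independently from the same Gaussian for each case, and the ratio is formed from means over all cases before the square roots and the division. The function name is hypothetical. The unbiased ratio comes out close to 1, while the ratio built from the plain MSE of the ensemble mean comes out close to sqrt(M / (M + 1)), about 0.894, which is the finite-ensemble gap that a correction factor would otherwise have to remove.

```python
import numpy as np


def unbiased_spread_skill_ratio(ensemble, target):
    """sqrt(mean unbiased variance) / sqrt(mean unbiased MSE of the ensemble mean)."""
    m = ensemble.shape[0]
    variance = ensemble.var(axis=0, ddof=1)
    unbiased_mse = (ensemble.mean(axis=0) - target) ** 2 - variance / m
    # Average over cases first, then take square roots and divide.
    return np.sqrt(variance.mean()) / np.sqrt(unbiased_mse.mean())


rng = np.random.default_rng(0)
n_cases, m, sigma = 100_000, 4, 0.5
centers = rng.normal(size=n_cases)  # case-dependent mean of the true distribution
target = centers + sigma * rng.normal(size=n_cases)
ensemble = centers + sigma * rng.normal(size=(m, n_cases))

ratio = unbiased_spread_skill_ratio(ensemble, target)
naive_ratio = np.sqrt(ensemble.var(axis=0, ddof=1).mean()) / np.sqrt(
    ((ensemble.mean(axis=0) - target) ** 2).mean()
)
```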

Another way to achieve this property is to apply a correction factor to a spread-skill ratio computed using a more standard skill estimator (the MSE of the ensemble mean, which is biased). This corrected spread-skill relationship is described in [1]. Here we’ve decided to standardize on using the unbiased MSE estimator instead, because it doesn’t require a correction factor, reuses our existing implementation of the unbiased MSE estimator, and makes it easier to support the case of variable ensemble sizes (via skipna_ensemble) correctly.

[1] Fortin, V. et al. Why Should Ensemble Spread Match the RMSE of the Ensemble Mean? J. Hydrometeorol. 15, 1708-1713 (2014).

Init.

Parameters:

ensemble_dim – Name of the ensemble dimension. Default: ‘number’.
skipna_ensemble – If True, NaN values will be ignored along the ensemble dimension. Default: False.

Categorical

Statistics

class weatherbenchX.metrics.categorical.TruePositives

True positives from binary predictions and targets.

class weatherbenchX.metrics.categorical.TrueNegatives

True negatives from binary predictions and targets.

class weatherbenchX.metrics.categorical.FalsePositives

False positives from binary predictions and targets.

class weatherbenchX.metrics.categorical.FalseNegatives

False negatives from binary predictions and targets.

Metrics

class weatherbenchX.metrics.categorical.CSI

Critical Success Index.

Also called Threat Score (TS).

CSI = TP / (TP + FP + FN).

class weatherbenchX.metrics.categorical.Accuracy

Accuracy.

ACC = (TP + TN) / (TP + FP + FN + TN).

class weatherbenchX.metrics.categorical.Recall

Also called True Positive Rate (TPR) or Sensitivity.

Recall = TP / (TP + FN).

class weatherbenchX.metrics.categorical.Precision

Also called Positive Predictive Value (PPV).

Precision = TP / (TP + FP).

class weatherbenchX.metrics.categorical.F1Score

F1 score.

F1 = 2 * Precision * Recall / (Precision + Recall) = 2 * TP / (2 * TP + FP + FN).

class weatherbenchX.metrics.categorical.FrequencyBias

Frequency bias.

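FB = PP / P = (TP + FP) / (TP + FN), where PP = TP + FP is the number of predicted positives and P = TP + FN is the number of actual positives.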
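All of the scores above are ratios of contingency-table counts, so they can be computed from aggregated TP, TN, FP and FN without revisiting the data. The NumPy sketch below (hypothetical helper names) sums the counts over binary inputs and then applies the formulas from this section. Because every score is a ratio, using the means of the per-element indicators instead of the sums gives the same result, which is consistent with aggregating the count statistics by a mean before the metric is computed.

```python
import numpy as np


def contingency_counts(predictions, targets):
    """Count TP, TN, FP and FN over binary (0/1 or boolean) arrays."""
    p = np.asarray(predictions, dtype=bool)
    t = np.asarray(targets, dtype=bool)
    return {
        "TP": np.sum(p & t),
        "TN": np.sum(~p & ~t),
        "FP": np.sum(p & ~t),
        "FN": np.sum(~p & t),
    }


def categorical_scores(counts):
    tp, tn, fp, fn = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "CSI": tp / (tp + fp + fn),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
        "Recall": recall,
        "Precision": precision,
        "F1": 2 * precision * recall / (precision + recall),
        "FrequencyBias": (tp + fp) / (tp + fn),
    }


predictions = np.array([1, 1, 0, 0, 1, 0])
targets = np.array([1, 0, 1, 0, 1, 0])
counts = contingency_counts(predictions, targets)  # TP=2, TN=2, FP=1, FN=1
scores = categorical_scores(counts)
```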

class weatherbenchX.metrics.categorical.SEEPS(variables: Sequence[str], climatology: Dataset, dry_threshold_mm: float | Sequence[float] = 0.25, min_p1: float | Sequence[float] = 0.1, max_p1: float | Sequence[float] = 0.85)

Computes Stable Equitable Error in Probability Space.

Definition in Rodwell et al. (2010): https://www.ecmwf.int/en/elibrary/76205-new-equitable-score-suitable-verifying-precipitation-nwp

Important: In most cases, the statistic will contain NaNs because of the masking of high and low p1 values. For this reason, a mask coordinate will be added to the resulting statistic to be used in combination with masked=True in the aggregator. If a mask already exists in either the predictions or targets, it will be combined with the p1 mask.

Init.

Parameters:

variables – List of precipitation variables to compute SEEPS for.
climatology – Climatology containing *_seeps_dry_fraction and *_seeps_threshold for each of the precipitation variables with dimensions dayofyear and hour, as well as latitude and longitude corresponding to the predictions/targets coordinates, see example below.
dry_threshold_mm – Values smaller than or equal to this threshold are considered dry. Unit: mm. Can be a list with one value per variable, in which case it must have the same length as variables. Default: 0.25
min_p1 – Mask out p1 values below this threshold. Can be a list with one value per variable. Default: 0.1
max_p1 – Mask out p1 values above this threshold. Can be a list with one value per variable. Default: 0.85

Example

>>> climatology
<xarray.Dataset> Size: 24MB
Dimensions:                                     (hour: 4, dayofyear: 366,
                                                longitude: 64, latitude: 32)
Coordinates:
  * dayofyear                                   (dayofyear) int64 3kB 1 ... 366
  * hour                                        (hour) int64 32B 0 6 12 18
  * latitude                                    (latitude) float64 256B -87.1...
  * longitude                                   (longitude) float64 512B 0.0 ...
Data variables:
    total_precipitation_6hr_seeps_dry_fraction  (hour, dayofyear, longitude, latitude) ...
    total_precipitation_6hr_seeps_threshold     (hour, dayofyear, longitude, latitude) ...

Spatial

Statistics

class weatherbenchX.metrics.spatial.SquaredFractionsError(neighborhood_size_in_pixels: int | Iterable[int], wrap_longitude: bool = False, combine_mask: bool = False)

Numerator of the FSS.

class weatherbenchX.metrics.spatial.SquaredPredictionFraction(neighborhood_size_in_pixels: int | Iterable[int], wrap_longitude: bool = False, combine_mask: bool = False)

One part of the denominator of the FSS.

class weatherbenchX.metrics.spatial.SquaredTargetFraction(neighborhood_size_in_pixels: int | Iterable[int], wrap_longitude: bool = False, combine_mask: bool = False)

One part of the denominator of the FSS.

Metrics

class weatherbenchX.metrics.spatial.FSS(neighborhood_size_in_pixels: int | Iterable[int], wrap_longitude: bool = False, combine_mask: bool = False)

Implementation of the Fractions Skill Score (FSS).

Assumes the input data is already binary. The FSS is defined by a square neighborhood size in pixels. On a lat-lon grid this can lead to distorted neighborhoods towards the poles.

Original paper: Roberts and Lean, 2008. https://doi.org/10.1175/2007MWR2123.1

More recent overview paper, including a discussion of how to compute the FSS over multiple forecasts: https://journals.ametsoc.org/view/journals/mwre/149/10/MWR-D-18-0106.1.xml

Note that if there is no rain in the aggregated targets and predictions, the FSS is undefined (NaN).

neighborhood_size_in_pixels

The size of the neighborhood to use for averaging in pixels. Must be odd. Can be an integer or a list, in which case the result will have an additional dimension ‘neighborhood_size’.

Type: int | Iterable[int]

wrap_longitude

If True, averaging operation wraps around longitude. Default: False.

Type: bool
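A NumPy sketch of the score built from the three spatial statistics, under assumptions that are ours rather than the library's: fields are 2D (latitude, longitude) arrays, the neighborhood mean treats points outside the domain as zeros, and wrap_longitude switches the longitude padding to periodic. Neighborhood fractions P and T are computed per field; the squared fractions error and the two squared fractions are averaged over all fields and pixels before the ratio is formed, so FSS = 1 - mean((P - T)²) / (mean(P²) + mean(T²)). When both inputs contain no events the ratio is 0/0 and the result is NaN, matching the note on undefined FSS. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_fraction(field, size, wrap_longitude=False):
    """Fraction of 1s in the size x size window around each (lat, lon) pixel."""
    if size % 2 != 1:
        raise ValueError("neighborhood size must be odd")
    half = size // 2
    field = np.asarray(field, dtype=float)
    # Assumption: zeros outside the domain in latitude; longitude is padded
    # periodically only when wrap_longitude is True.
    padded = np.pad(field, ((half, half), (0, 0)), mode="constant")
    padded = np.pad(padded, ((0, 0), (half, half)),
                    mode="wrap" if wrap_longitude else "constant")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def fss(predictions, targets, size, wrap_longitude=False):
    """FSS aggregated over a stack of binary (case, lat, lon) fields."""
    pf = np.stack([neighborhood_fraction(p, size, wrap_longitude) for p in predictions])
    tf = np.stack([neighborhood_fraction(t, size, wrap_longitude) for t in targets])
    squared_fractions_error = np.mean((pf - tf) ** 2)  # numerator statistic
    reference = np.mean(pf ** 2) + np.mean(tf ** 2)    # denominator statistics
    return 1.0 - squared_fractions_error / reference


pred = np.zeros((1, 5, 5)); pred[0, 1, 1] = 1.0
targ = np.zeros((1, 5, 5)); targ[0, 2, 2] = 1.0
fss_pixel = fss(pred, targ, 1)  # 0.0: the single events do not overlap
fss_3x3 = fss(pred, targ, 3)    # 4/9: the 3x3 neighborhoods overlap on 4 pixels
```

The example illustrates the usual motivation for the FSS: a displacement of one pixel scores 0 at the pixel scale but receives partial credit once the neighborhood is large enough to overlap.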

Wrappers

class weatherbenchX.metrics.wrappers.InputTransform(which)

Base class for input transformations.

Init.

Parameters:

which – Which input to apply the wrapper to. Must be one of ‘predictions’, ‘targets’, or ‘both’.

class weatherbenchX.metrics.wrappers.EnsembleMean(which: str, ensemble_dim='number', skipna=False, skip_if_ensemble_dim_missing: bool = False)

Compute ensemble mean.

Init.

Parameters:

which – Which input to apply the wrapper to. Must be one of ‘predictions’, ‘targets’, or ‘both’.
ensemble_dim – Name of ensemble dimension. Default: ‘number’.
skipna – If True, skip NaNs in the ensemble mean. Default: False.
skip_if_ensemble_dim_missing – If True, skip the ensemble mean if the ensemble dimension is missing. Default: False.

class weatherbenchX.metrics.wrappers.ContinuousToBinary(which: str, threshold_value: float | Iterable[float] | DataArray | Dataset, threshold_dim: str, unique_name_suffix: str | None = None)

Converts a continuous input to a binary one.

Applies x > threshold for all thresholds and concatenates the results along a new dimension named threshold_dim.
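In plain NumPy the transform amounts to the sketch below. The function name is hypothetical, and the new threshold dimension is simply a leading axis here, whereas the wrapper gives it the name threshold_dim. Note the strict inequality: values equal to a threshold map to False. In this sketch NaN inputs also map to False; whether the library preserves NaNs is not stated on this page, so that detail should not be relied on.

```python
import numpy as np


def continuous_to_binary(x, thresholds):
    """Return x > t for each threshold t, stacked along a new leading axis."""
    x = np.asarray(x, dtype=float)
    thresholds = np.atleast_1d(np.asarray(thresholds, dtype=float))
    # Shape (n_thresholds, 1, ..., 1) so each threshold broadcasts over all of x.
    t = thresholds.reshape((-1,) + (1,) * x.ndim)
    return x[np.newaxis] > t


precip = np.array([0.0, 0.5, 1.0, 2.0])
binary = continuous_to_binary(precip, [0.5, 1.0])  # shape (2, 4)
```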

Init.

Parameters:

which – Which input to apply the wrapper to. Must be one of ‘predictions’, ‘targets’, or ‘both’.
threshold_value – Threshold value, iterable of values, xarray.DataArray or xarray.Dataset.
threshold_dim – Name of dimension to use for threshold values.
unique_name_suffix – Suffix to add to the unique name. If threshold_value is an xarray.DataArray or xarray.Dataset, this must be provided, and it must be unique across all the threshold_value settings you intend to use within a set of Metrics that are computed together.

class weatherbenchX.metrics.wrappers.WrappedStatistic(statistic: Statistic, transform: InputTransform)

Wraps a statistic with an input transform.

Also adds a suffix to the unique name. TODO(srasp): Add option for multiple transforms.

Init.

Parameters:

statistic – Statistic object to wrap.
transform – Transform to apply to inputs.

class weatherbenchX.metrics.wrappers.WrappedMetric(metric: Metric, transforms: list[InputTransform], unique_name_suffix: str | None = None)

Wraps all statistics of a metric with input transforms.

Init.

Parameters:

metric – Metric to wrap.
transforms – List of input transforms to apply. The transforms are applied in list order, i.e. transforms [f, g, h] transform x as h(g(f(x))).
unique_name_suffix – Optional suffix to use for uniquely naming all associated statistics. By default, this is constructed automatically from the transforms, which may be overly verbose.
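The ordering rule for transforms is ordinary function composition. The sketch below uses two hypothetical stand-in functions (not the weatherbenchX EnsembleMean and ContinuousToBinary classes) to show why the order matters: averaging the members and then thresholding yields a binary ensemble-mean forecast, whereas thresholding and then averaging yields the fraction of members above the threshold.

```python
import numpy as np
from functools import reduce


def apply_in_order(x, transforms):
    """Apply transforms in list order: [f, g, h] gives h(g(f(x)))."""
    return reduce(lambda value, transform: transform(value), transforms, x)


def ensemble_mean(x):  # hypothetical stand-in for an ensemble-mean transform
    return np.mean(x, axis=0)


def exceeds_one(x):  # hypothetical stand-in for a threshold transform
    return (x > 1.0).astype(float)


members = np.array([0.0, 3.0])
mean_then_binary = apply_in_order(members, [ensemble_mean, exceeds_one])  # 1.0
binary_then_mean = apply_in_order(members, [exceeds_one, ensemble_mean])  # 0.5
```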