Handle forecast latency
Some forecasts are only released some time after their nominal init_time. For example, HRES forecasts typically become available just under 6h after the nominal init_time, so the 06UTC run is released just before 12UTC. As a result, lead times shorter than 6h in the prediction files are never available in real time: by the time the forecast is released, their valid times are already in the past. (The valid time is the time the forecast is made for, i.e. init_time + lead_time; a forecast initialized at 00UTC with a lead time of 6h has a valid time of 06UTC.)
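The timing arithmetic can be checked with a small standalone sketch. It uses plain `datetime` objects rather than WeatherBench-X, and the constant 6h latency and the helper names `valid_time`, `release_time` and `is_in_past_at_release` are illustrative assumptions, not part of the library.

```python
from datetime import datetime, timedelta

# Assumption for illustration: a constant 6h latency, roughly matching HRES.
LATENCY = timedelta(hours=6)


def valid_time(init_time: datetime, lead_time: timedelta) -> datetime:
    """The time the forecast is made for."""
    return init_time + lead_time


def release_time(init_time: datetime, latency: timedelta = LATENCY) -> datetime:
    """Earliest time at which the run with this nominal init_time is available."""
    return init_time + latency


def is_in_past_at_release(
    init_time: datetime, lead_time: timedelta, latency: timedelta = LATENCY
) -> bool:
    """True if the valid time has already passed when the run is released."""
    return valid_time(init_time, lead_time) < release_time(init_time, latency)


init = datetime(2020, 1, 1, 6)  # the 06UTC run
for hours in (0, 3, 6, 12):
    lead = timedelta(hours=hours)
    print(hours, valid_time(init, lead), is_in_past_at_release(init, lead))
```

With a 6h latency, exactly the lead times shorter than 6h come out as already in the past at release, which are the lead times an operational evaluation cannot use.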
To restrict the evaluation to the “operational” setting, in which only forecasts that are actually available are evaluated, we can use latency wrappers.
For a given “query” init_time, a latency wrapper looks up the most recently available forecast and reads the lead_time on file that gives the same valid time as the “query” init_time plus the “query” lead_time.
Example: For a query init_time of 21UTC and a query lead_time of 3h (valid_time 00UTC the next day), the most recently available HRES forecast is the 12UTC run, which is read at a lead_time of 12h.
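The lookup described above can be sketched in plain Python. The helper `most_recent_available` is hypothetical and only illustrates the idea; it is not the implementation of `XarrayConstantLatencyWrapper`, and it assumes a run counts as available once `nominal_init_time + latency <= query_init_time` (the library's exact boundary convention may differ).

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def most_recent_available(
    query_init_time: datetime,
    query_lead_time: timedelta,
    nominal_init_times: Sequence[datetime],
    latency: timedelta,
) -> Tuple[datetime, timedelta]:
    """Return the (init_time, lead_time) on file matching the query valid time."""
    # Assumption: a run is available once it has been released.
    available = [t for t in nominal_init_times if t + latency <= query_init_time]
    if not available:
        raise ValueError('No forecast is available at this query init_time.')
    file_init_time = max(available)
    # Keep the valid time fixed and recompute the lead time for the older run.
    query_valid_time = query_init_time + query_lead_time
    file_lead_time = query_valid_time - file_init_time
    return file_init_time, file_lead_time


# 00/12UTC runs on 2020-01-01, as in a 00/12UTC HRES dataset.
runs = [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 12)]
init, lead = most_recent_available(
    datetime(2020, 1, 1, 21), timedelta(hours=3), runs, timedelta(hours=6)
)
print(init, lead)  # 2020-01-01 12:00:00 12:00:00
```

Because the query valid time is held fixed, the lead time on file grows by however much older the selected run is than the query init_time.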
# IMPORTANT: If you are running this on Colab, uncomment the two lines below to access the cloud datasets.
# from google.colab import auth
# auth.authenticate_user()
import numpy as np
from weatherbenchX.data_loaders import xarray_loaders
from weatherbenchX.data_loaders import latency_wrappers
prediction_path = 'gs://weatherbench2/datasets/hres/2016-2022-0012-64x32_equiangular_conservative.zarr'
variables = ['2m_temperature', 'geopotential']
prediction_data_loader = xarray_loaders.PredictionsFromXarray(
    path=prediction_path,
    variables=variables,
)
# Query a forecast initialized at 21UTC with a 3h lead time (valid at 00UTC on 2020-01-02).
init_times = np.array(['2020-01-01T21'], dtype='datetime64[ns]')
lead_times = np.array([3], dtype='timedelta64[h]').astype('timedelta64[ns]')
# Wrap the loader so that each query is served by the most recent run
# available at the query init_time, given a constant 6h latency.
prediction_data_loader = latency_wrappers.XarrayConstantLatencyWrapper(
    prediction_data_loader, latency=np.timedelta64(6, 'h')
)
prediction_chunk = prediction_data_loader.load_chunk(init_times, lead_times)
2020-01-01T12:00:00.000000000 [12]
prediction_chunk
<xarray.Dataset> Size: 116kB
Dimensions: (latitude: 32, longitude: 64, init_time: 1, lead_time: 1,
level: 13)
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 -75.94 ... 81.56 87.19
* longitude (longitude) float64 512B 0.0 5.625 11.25 ... 348.8 354.4
* init_time (init_time) datetime64[ns] 8B 2020-01-01T21:00:00
* lead_time (lead_time) timedelta64[ns] 8B 03:00:00
* level (level) int32 52B 50 100 150 200 250 ... 700 850 925 1000
Data variables:
2m_temperature (init_time, lead_time, longitude, latitude) float32 8kB 2...
geopotential (init_time, lead_time, level, longitude, latitude) float32 106kB ...
Attributes:
long_name: 2 metre temperature
short_name: t2m
standard_name: unknown
units: K
As you can see from the printed time stamps above, the data were read from the 12UTC run at a lead time of 12h, but the returned init_time and lead_time coordinates are the query times (21UTC and 3h).
Note that for Zarr files, the available nominal init_times are directly read from the Xarray coordinate. For other, non-Xarray data loaders, use ConstantLatencyWrapper and explicitly specify the available nominal init_times.
In some cases, forecast runs are split across datasets, e.g. the 00/12UTC runs in one file and the 06/18UTC runs in another. MultipleConstantLatencyWrapper combines several latency wrappers and picks the most recently available forecast across all of them.
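Picking the most recent run across several files follows the same logic, applied to the union of all available init_times. The sketch below is hypothetical and does not use the `MultipleConstantLatencyWrapper` API; the helper `most_recent_across_sources`, the source names `'0012'` and `'0618'` and the per-source latencies are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Sequence, Tuple

# Hypothetical sources: name -> (nominal init_times on file, latency of that source).
Sources = Dict[str, Tuple[Sequence[datetime], timedelta]]


def most_recent_across_sources(
    query_init_time: datetime,
    query_lead_time: timedelta,
    sources: Sources,
) -> Tuple[str, datetime, timedelta]:
    """Return (source name, init_time on file, lead_time on file) for a query."""
    candidates = [
        (init_time, name)
        for name, (init_times, latency) in sources.items()
        for init_time in init_times
        # Assumption: a run is available once it has been released.
        if init_time + latency <= query_init_time
    ]
    if not candidates:
        raise ValueError('No forecast is available in any source.')
    # The latest nominal init_time wins, regardless of which file it comes from.
    file_init_time, name = max(candidates)
    file_lead_time = query_init_time + query_lead_time - file_init_time
    return name, file_init_time, file_lead_time


sources = {
    '0012': ([datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 12),
              datetime(2020, 1, 2, 0)], timedelta(hours=6)),
    '0618': ([datetime(2020, 1, 1, 6), datetime(2020, 1, 1, 18)],
             timedelta(hours=6)),
}
print(most_recent_across_sources(datetime(2020, 1, 2, 1), timedelta(hours=3), sources))
```

For a query at 01UTC on 2020-01-02, the 00UTC run of that day has not been released yet, so the 18UTC run from the 06/18UTC file is selected and read at a lead time of 10h.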