Reproducing the WeatherBench scores

This is a guide on how to reproduce the scores and figures on the official WeatherBench website.

Running the evaluation script

The main evaluation script is located here: https://github.com/google-research/weatherbench2/blob/main/public_benchmark/run_benchmark_evaluation.py

The script works in combination with a config file that defines the data loader settings for a given model/ground truth, year and resolution: https://github.com/google-research/weatherbench2/blob/main/public_benchmark/public_configs.py

This config file uses data on the public WeatherBench bucket, see Data Guide.

Here is an example of how to run the script locally:

python run_benchmark_evaluation.py \
  --config=public_configs \
  --prediction=hres \
  --target=era5 \
  --resolution=64x32 \
  --year=2020 \
  --time_start=2020-01-01 \
  --time_stop=2020-01-01T12 \
  --lead_time_start=0 \
  --lead_time_stop=12 \
  --lead_time_frequency=6 \
  --output_dir=./results/ \
  --runner=DirectRunner

This will only work for small data. To run a full evaluation, use Dataflow:

export BUCKET=my-bucket
export PROJECT=my-project
export REGION=us-central1

python run_benchmark_evaluation.py \
  --config=public_configs \
  --prediction=hres \
  --target=era5 \
  --resolution=64x32 \
  --year=2020 \
  --time_start=2020-01-01 \
  --time_stop=2020-01-01T12 \
  --lead_time_start=0 \
  --lead_time_stop=12 \
  --lead_time_frequency=6 \
  --output_dir=gs://$BUCKET/tmp/ \
  --runner=DataflowRunner \
  -- \
  --project=$PROJECT \
  --region=$REGION \
  --temp_location=gs://$BUCKET/tmp/ \
  --setup_file=../setup.py \
  --job_name=wbx-evaluation

The precomputed results can be found here: gs://weatherbench2/benchmark_results.

Combining the results

For further use, e.g. to produce the scorecards or interactive graphics, we combine the results into a single file. This is done with this script: https://github.com/google-research/weatherbench2/blob/main/public_benchmark/combine_results.py

Deterministic and probabilistic results are processed separately. The script runs locally and can take a few minutes.

python combine_results.py \
    --input_dir=gs://weatherbench2/benchmark_results \
    --output_dir=./ \
    --mode=deterministic
# or --mode=probabilistic

Plot the scorecards

To plot the scorecards, follow this notebook: https://github.com/google-research/weatherbench2/blob/main/public_benchmark/WB_X_Website_Scorecard.ipynb

Interactive graphics

The code for the interactive graphics (Deterministic and Probabilistic tabs on the website) can be found here: https://github.com/google-research/weatherbench2/blob/main/public_benchmark/apps

See the README for a brief guide how to run the apps.