Simple Python scripts for exploring the file structure of numerical weather predictions on cloud object storage.
The main use-case is to help generate machine-readable descriptions of NWP datasets for hypergrib.
- Only consider the most recent version of GEFS (starting at `2020-09-23T12`). See `hypergrib::datasets::gefs::version` for notes about the structure of the object keys.
- Only consider a small subset of coord labels where there are no missing coord combinations.
- Write the YAML primarily for consumption in Rust (later we may also want to load it into Python, but it'll be easier to modify the YAML in Python).
- Then write a minimal Rust `hypergrib` that loads the mostly hand-written YAML and pumps glorious GRIB data into `xarray`!
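As a concrete illustration of exploring the object key structure, here's a sketch of pulling coordinate labels out of a single GEFS object key. The example key and the regex group names are assumptions based on the post-2020-09-23 layout; `hypergrib::datasets::gefs::version` remains the authoritative reference:

```python
import re

# Hypothetical example key in the post-2020-09-23 GEFS layout on AWS
# (bucket noaa-gefs-pds). The exact layout is documented in
# hypergrib::datasets::gefs::version; this key is for illustration only.
KEY = "gefs.20210101/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f003"

# Regex capturing the coordinates encoded in the key: init date & hour,
# ensemble member, product, horizontal resolution, and forecast step.
KEY_RE = re.compile(
    r"gefs\.(?P<date>\d{8})/(?P<hour>\d{2})/atmos/"
    r"(?P<subdir>\w+)/"
    r"(?P<member>ge\w+)\.t(?P<cycle>\d{2})z\."
    r"(?P<product>\w+)\.(?P<resolution>\dp\d{2})\.f(?P<step>\d{3})"
)

match = KEY_RE.match(KEY)
assert match is not None, f"key did not match expected layout: {KEY!r}"
coords = match.groupdict()
print(coords["member"], coords["step"])  # gep01 003
```

Scanning a listing of keys with a regex like this is one cheap way to enumerate the init times, ensemble members, and steps actually present in the bucket.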
The ultimate goal is to output YAML which roughly conforms to the design sketch here.
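For illustration only, a hypothetical fragment of what such YAML might look like; the real schema is whatever the linked design sketch specifies, and every key name below is an assumption:

```yaml
# Hypothetical sketch only — not the actual schema from the design sketch.
dims: [init_time, ensemble_member, step, parameter, vertical_level]
coords:
  init_time:
    start: "2020-09-23T12:00"
    frequency: "6h"
  ensemble_member: [gec00, gep01, gep02]
  step: ["0h", "3h", "6h"]
  parameter: [HGT, TMP]
  vertical_level: ["500 mb", "850 mb"]
```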
That breaks down into these sub-tasks:
- Ensemble members and vertical levels (see issues #3 and #4).
- Get a list of parameters & vertical levels by reading the contents of a sample of `.idx` files (see issue #2). Start simple: we don't need an exhaustive list of parameters for the MVP.
- Get a list of horizontal spatial coordinates by reading a sample of GRIB files (see issue #1).
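A minimal sketch of that `.idx`-scanning sub-task, assuming the wgrib2-style colon-separated inventory format; the sample lines below are made up for illustration:

```python
# Sketch: collect the set of (parameter, vertical level) pairs from the
# contents of one .idx file. Lines follow the wgrib2 inventory format:
# msg_num:byte_offset:d=<init>:<param>:<level>:<step>:<member>
# These sample lines are hypothetical, not copied from a real GEFS file.
SAMPLE_IDX = """\
1:0:d=2021010100:HGT:10 mb:3 hour fcst:ENS=+1
2:48163:d=2021010100:TMP:500 mb:3 hour fcst:ENS=+1
3:101029:d=2021010100:TMP:850 mb:3 hour fcst:ENS=+1
"""

def params_and_levels(idx_text: str) -> set[tuple[str, str]]:
    """Return the unique (parameter, vertical level) pairs in one .idx file."""
    pairs = set()
    for line in idx_text.splitlines():
        fields = line.split(":")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pairs.add((fields[3], fields[4]))
    return pairs

print(sorted(params_and_levels(SAMPLE_IDX)))
# [('HGT', '10 mb'), ('TMP', '500 mb'), ('TMP', '850 mb')]
```

Running this over a sample of `.idx` files (one per init time, say) gives the parameter and level coordinate labels without ever touching the GRIB files themselves.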
Beyond the MVP:
- Record if/when the number of ensemble members and/or steps changes.
- Decode the parameter abbreviation string and the string summarising the vertical level using the `grib_tables` sub-crate (so the user gets more information about what these mean, and so the levels can be put into order).
- Record the dimension names, array shape, and coordinate labels in a JSON file. Record the decoded GRIB parameter names and GRIB vertical levels so the end-user doesn't need to use `grib_tables` (maybe have a mapping from each abbreviation string used in the dataset to the full GRIB ProductTemplate). Also record when the coordinates change. Changes in horizontal resolution probably have to be loaded as different xarray datasets (see JackKelly/hypergrib#15 and JackKelly/hypergrib#17).
- Also decode `.idx` parameter strings like this (from HRRR): `var discipline=0 center=7 local_table=1 parmcat=16 parm=201`.
- Open other GRIB datasets. (If we have to parse the step from the body of `.idx` files then consider using `nom`.)
- Optimise the extraction of the horizontal spatial coords from the GRIBs by only loading the relevant sections of each GRIB (using the `.idx` files). This optimisation isn't urgent, though: users will never have to run this step.
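A minimal sketch of decoding those HRRR-style `var …` strings into numeric codes. The function name and the dict representation are our own; the actual lookup of those codes against the GRIB tables (`grib_tables`) is deliberately left out:

```python
# Example string copied from the HRRR .idx format quoted above.
RAW = "var discipline=0 center=7 local_table=1 parmcat=16 parm=201"

def decode_var_string(raw: str) -> dict[str, int]:
    """Parse 'var key=int key=int ...' into a dict of numeric GRIB codes."""
    head, *pairs = raw.split()
    if head != "var":
        raise ValueError(f"expected string to start with 'var': {raw!r}")
    return {key: int(value) for key, value in (p.split("=") for p in pairs)}

print(decode_var_string(RAW))
# {'discipline': 0, 'center': 7, 'local_table': 1, 'parmcat': 16, 'parm': 201}
```

The resulting (discipline, parmcat, parm) triple is what would be looked up in the GRIB product tables to recover a human-readable parameter name.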