Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing a new input dataset #18

Open
matteodefelice opened this issue May 15, 2019 · 9 comments
Open

Introducing a new input dataset #18

matteodefelice opened this issue May 15, 2019 · 9 comments

Comments

@matteodefelice
Copy link

I would like to use atlite with the EFAS data recently released on the Copernicus Data Store. Do you have a tutorial or some docs on how to "add" another input dataset?

@euronion
Copy link
Collaborator

No. As far as I know there is no tutorial for setting up new data sources at the moment.

The ERA-5 data is also downloaded from the CDS, see datasets/era5.py. You should be able to use it as a starting point. I expect the structure for the request and postprocessing to be similar.

Important side note:
@coroa Has been working on a new data backend. This will also affect the way the data importers work. You might want to check with him first to see what will change in the upcoming version.

@coroa
Copy link
Member

coroa commented May 15, 2019 via email

@coroa
Copy link
Member

coroa commented May 15, 2019

I put a bit more thought into locating the correct grid cells. Let me illustrate the problem and what I assume the solution is (and I think it would be great to integrate that into atlite).

Consider the following plot of the river Saale in Thuringia and its basin, which flows into the Elbe (not shown):
GWK56_meanfriv
Red lines are the river itself (solid is the river network of Saale up to level 3, dashed its tributaries on level 4, where levels are taken from the German Gewässerkennzahl system). In Orange, I added the delineations of the catchment basins as provided by the Hydrobasins dataset (in level 7, i think, but would have to check). In the background I show the mean river flow from a dataset produced by the HD-model maintained by the Helmholtz Zentrum Geesthacht, which is similar to LISFLOOD, albeit simpler. The colored dots show GRDC stations for which measured daily river flow data can be requested from the BfG (a German institution).

The problem is now that for example for the GRDC station marked in orange the closest river flow grid cell holds the river flow through the main river stem coming from the South, while the station only sits at a tiny tributary entering from the right. So as a correction, in this case one would have to take the river flow immediately to the East.

From the EFAS quote above and a discussion I had with HZG earlier, I gather that the preferred solution to this conundrum is to publish the catchment basin area for each grid cell alongside the grid.

To locate the best grid cell for a particular latitude, longitude position, you then compute the total area of the upstream catchment basins (based on hydrobasins for example) and find the grid cell close to your location that best represents the upstream area.

@matteodefelice
Copy link
Author

matteodefelice commented May 16, 2019

To locate the best grid cell for a particular latitude, longitude position, you then compute the total area of the upstream catchment basins (based on hydrobasins for example) and find the grid cell close to your location that best represents the upstream area.

I am not sure to have understood correctly (I am not an expert for hydrological data). What does it mean for a grid cell to represent the upstream area?

@coroa
Copy link
Member

coroa commented May 16, 2019

Neither am I, but what I mean is:

Look through all neighbouring grid cells (probably only direct neighbours) and use the one where the hydrobasins estimated upstream area is similar to the one provided by the EFAS grid.

@coroa
Copy link
Member

coroa commented May 16, 2019

Estimation of upstream area using hydrobasins would work something like:

gpd.sjoin(powerplants, hydrobasins[['UP_AREA', 'geometry']], how='left', op='within').rename(columns={'UP_AREA': 'estim_area'})

A test with the reported upstream areas for the German stations in the GRDC stations catalogue finds a relatively good agreement for rivers with high flow volumes:

import os
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

HYDRO_BASINS_DATA_PATH = os.path.join(
    'hybas_eu_lev01-12_v1c',
    'hybas_eu_lev12_v1c.shp'
)

GRDC_PATH = 'grdc_stations.xlsx'

hydrobasins = gpd.read_file(HYDRO_BASINS_DATA_PATH)

grdc = pd.read_excel(GRDC_PATH, na_values=[-999, 'n.a.'])
grdc_de = grdc.loc[grdc.country == 'DE']

grdc_de = gpd.GeoDataFrame(grdc_de, geometry=grdc_de[['long', 'lat']].apply(Point, axis=1), crs=dict(init='epsg:4326'))
grdc_de =  gpd.sjoin(grdc_de, hydrobasins[['UP_AREA', 'geometry']], how='left', op='within').rename(columns={'UP_AREA': 'estim_area'})

pd.DataFrame(grdc_de).plot.scatter(x='area', y='estim_area', s=grdc_de.r_volume_yr)
plt.xscale('log')
plt.yscale('log')

grdc_estim_upstream_area

@coroa
Copy link
Member

coroa commented May 16, 2019

Anyway, since the link to upArea.nc the EFAS available data page is still dead, it makes sense to reach out to them, anyway. Would you mind doing that, Matteo?

@matteodefelice
Copy link
Author

Sure! I will let you know once I get a reply.

@coroa
Copy link
Member

coroa commented May 16, 2019

It seems, your email was received, since the wiki page has just now been updated with a note, that the upstream area is provided as an extra variable in the netcdf files:

In [5]: ds.upArea                                                                                                      
Out[5]: 
<xarray.DataArray 'upArea' (y: 950, x: 1000)>
[950000 values with dtype=float32]
Coordinates:
  * y           (y) float64 5.498e+06 5.492e+06 ... 7.575e+05 7.525e+05
  * x           (x) float64 2.502e+06 2.508e+06 ... 7.492e+06 7.498e+06
    time        datetime64[ns] ...
    step        timedelta64[ns] ...
    surface     int64 ...
    latitude    (y, x) float32 ...
    longitude   (y, x) float32 ...
    valid_time  datetime64[ns] ...
Attributes:
    grid_mapping:    lambert_azimuthal_equal_area
    long_name:       upstream_area
    standard_name:   upstream
    units:           M2
    esri_pe_string:  PROJCS["ETRS_1989_LAEA",GEOGCS["GCS_ETRS_1989",DATUM["D_...
    cell_methods:    time: mean

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants