Add heuristic for ERA5 download chunk sizes #252

Open · 1 task

euronion opened this issue Sep 6, 2022 · 3 comments

@euronion
Collaborator

euronion commented Sep 6, 2022

  • Think about a heuristic to download in smaller or larger chunks depending on the geographical scope of the data to be downloaded

ERA5 cutouts are currently downloaded as time=yearly slices (time=monthly slices after #236) to avoid requesting overly large pieces of data from the ERA5 backend. Monthly retrieval could, in theory, slow down cutout preparation. We could employ a heuristic that checks the request size and then decides, based on that size, whether to use monthly or yearly retrieval.
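
For illustration, such a size-based switch might look roughly like this (a sketch only; `MAX_FIELDS` and the function name are placeholders, not existing atlite code):

```python
# Sketch of a size-based switch between monthly and yearly retrieval;
# MAX_FIELDS is an assumed backend limit (see the comments below) and
# neither name exists in atlite.

MAX_FIELDS = 120_000

def choose_time_chunking(n_timesteps_per_year, n_variables):
    """Use yearly slices when a full year fits into one request."""
    if n_timesteps_per_year * n_variables <= MAX_FIELDS:
        return "yearly"
    return "monthly"

# e.g. one hourly year (8760 steps) with 15 variables gives 131,400
# fields, which would exceed the assumed limit and select "monthly".
```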

See the discussion here: #236 (comment)

@johhartm

@euronion I stumbled upon the same issue and adapted the timeframe to optimise for my use case (a small cutout but a long timeframe). I added a heuristic to make the requests as large as possible while staying within the 120,000-fields limit. However, I don't know how to account for the size limit with cutouts covering large areas. If someone could help me with this information, I might be able to implement this feature.

@euronion
Collaborator Author

Hi @johhartm,
Thanks for taking the initiative. I would assume that estimating the number of fields via

resolution * latitude range * longitude range * number of time steps * number of variables within the request

should be good for a heuristic.
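
For illustration, that estimate translates into something like the following (a sketch; the function name and signature are made up here, and 0.25° is ERA5's native grid spacing):

```python
# Direct translation of the estimate above; 0.25 degrees is ERA5's
# native grid spacing, everything else (names, signature) is made up.

def estimate_fields(lat_range, lon_range, n_timesteps, n_variables,
                    resolution=0.25):
    """Grid points x time steps x variables for one request."""
    n_lat = int(lat_range / resolution) + 1
    n_lon = int(lon_range / resolution) + 1
    return n_lat * n_lon * n_timesteps * n_variables
```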

Where did you get the 120,000-fields number from? This is the first time I have heard a concrete number, and it seems a bit small, but that might depend on how a "field" is defined.

@johhartm

I got this number from experimenting with larger requests and having them fail with an error message stating that the request was too large and that the maximum request size is 120,000 fields. For me, the heuristic of number of time steps * variables within the request worked, but it only downloaded data for a fairly limited spatial frame. However, I am starting to think that the spatial extent does not affect the "field" count, although it might still have to be taken into account to keep the file size per request from getting too large.
I will test this hypothesis with some larger cutouts and get back when I have results.
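
Under that hypothesis, the safe request length follows directly (a sketch; the helper name and the interpretation of the limit are assumptions, not a documented CDS interface):

```python
# Splitting a long timeframe under the hypothesis stated above, i.e.
# fields = time steps x variables; the limit and names are assumptions.

def max_timesteps_per_request(n_variables, max_fields=120_000):
    """Longest time slice (in steps) that stays within the field limit."""
    return max_fields // n_variables

# e.g. 10 hourly variables -> 12,000 hours, roughly 16 months per request
```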
