🐛[BUG]: Climate data sources do not set stride for first index #653

sahnimanas · 2024-08-23T05:47:19Z

Version

0.7

On which installation method(s) does this occur?

No response

Describe the issue

The climate data sources (ERA5DaliExternalSource and ClimateDataSourceSpec) both take a stride parameter for reading data at a coarser time step than the one stored in the dataset files.
However, the stride is only applied when reading multiple time steps (num_steps > 1).
The first index that is read is independent of the stride. Example: https://github.com/NVIDIA/modulus/blob/main/modulus/datapipes/climate/era5_hdf5.py#L598

My understanding of the stride parameter is that it should subsample the data source for the first index as well.
If that's not the expected usage, this could instead be viewed as a feature request: a quick addition that allows reading coarser-granularity subsets of the same dataset on disk (e.g. reading every 6 hours from a 1-hour dataset).
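
To make the two interpretations concrete, here is a minimal sketch of the on-disk indices each one would read. The function names are hypothetical and this is not the datapipe's code. It assumes `total_steps` counts every time step read for one sample, including the first, which may differ from how the datapipe defines `num_steps`.

```python
def indices_current(in_idx, stride, total_steps):
    """Interpretation matching the behavior described above:
    the first index is used as-is, and stride only spaces the later steps."""
    return [in_idx + k * stride for k in range(total_steps)]


def indices_strided(in_idx, stride, total_steps):
    """Interpretation proposed in this issue:
    the first index is scaled by stride too, so every sample
    lies on the coarser grid (0, stride, 2 * stride, ...)."""
    return [stride * in_idx + k * stride for k in range(total_steps)]


if __name__ == "__main__":
    # Hourly data on disk, stride of 6, three steps per sample.
    for in_idx in range(3):
        print(in_idx,
              indices_current(in_idx, 6, 3),
              indices_strided(in_idx, 6, 3))
```

With hourly data and a stride of 6, consecutive samples start one hour apart under the first interpretation and six hours apart under the second.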

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

mnabian · 2024-10-17T01:28:06Z

@loliverhennigh could you please take a look at this issue?

sahnimanas · 2024-10-17T21:55:09Z

I've looked at this a bit more. If we agree with the interpretation of stride described here, then it should be sufficient to fix it by loading data[self.stride * in_idx] instead of data[in_idx]. That said, one could also reasonably read the current definition of stride as being consistent with the implementation.
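
One consequence worth checking with that change: scaling the first index by the stride also reduces how many start indices fit in a file. A rough sketch of that bookkeeping follows. The function and parameter names are hypothetical, and it assumes `total_steps` counts every step read per sample, including the first. I have not checked how the datapipe currently computes its sample count.

```python
def num_valid_starts(num_on_disk, stride, total_steps, scale_first_index):
    """Count the sample indices whose reads all stay inside one file.

    num_on_disk:        number of time steps stored in the file
    stride:             spacing (in on-disk steps) between the steps read
    total_steps:        steps read per sample, including the first
    scale_first_index:  True for data[stride * in_idx], False for data[in_idx]
    """
    # Distance on disk from the first to the last step of one sample.
    span = (total_steps - 1) * stride
    # Largest on-disk index at which a sample may start.
    last_start = num_on_disk - 1 - span
    if last_start < 0:
        return 0
    if scale_first_index:
        # Starts are restricted to 0, stride, 2 * stride, ...
        return last_start // stride + 1
    # Every on-disk index up to last_start is a valid start.
    return last_start + 1


if __name__ == "__main__":
    # 24 hourly steps, stride 6, two steps per sample.
    print(num_valid_starts(24, 6, 2, scale_first_index=False))
    print(num_valid_starts(24, 6, 2, scale_first_index=True))
```

So any length or bounds logic that assumes one sample per on-disk step would need to be adjusted alongside the indexing change.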

Overall, this behavior may be better left for users to extend on their own. Both of the above interpretations may be relevant to different users, and it's hard for them to confirm which one is implemented without looking at the code. If the existing behavior is not what they want, it's also hard to extend the datapipe due to its lack of modularity. An alternative design has been proposed internally and may be more appropriate.

sahnimanas added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Aug 23, 2024

mnabian assigned loliverhennigh Oct 17, 2024

mnabian removed the "? - Needs Triage" (Need team to review and classify) label Oct 17, 2024
