Consider better packaging of h5 files from exported ROIs #209

emlynjdavies · 2024-08-26T11:44:46Z

Is your feature request related to a problem? Please describe.
Exported ROIs are put into a folder that eventually becomes a very long list of h5 files.

Describe the solution you'd like
Package these into a single file that is easier to transport between discs.

Describe alternatives you've considered
Maybe a tool like xarray.open_mfdataset could be helpful here?

emlynjdavies · 2024-08-26T11:45:25Z

Is this related to #207, @nepstad?

nepstad · 2024-08-26T12:08:37Z

> Is this related to #207, @nepstad?

Not directly, no. #207 concerns the stats netcdf file.

nepstad · 2024-09-17T14:56:41Z

We could supply a "merge ROI files" option to the PyOPIA command line interface. Since the ROIs have different shapes, xarray/netcdf is not ideal, but we could simply drop them into one big hdf5 file.
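To illustrate why ROIs of different shapes fit poorly into a single array-based container, here is a minimal sketch. The ROI shapes and the (height, width, 3) layout are assumptions made up for the example, not taken from PyOPIA output.

```python
import numpy as np

# Two stand-in ROIs with different shapes (assumed height x width x RGB).
rois = [
    np.zeros((12, 9, 3), dtype=np.uint8),
    np.zeros((30, 41, 3), dtype=np.uint8),
]

# Stacking into one array fails because the shapes differ.
try:
    np.stack(rois)
    stackable = True
except ValueError:
    stackable = False

# A single array is only possible after padding every ROI to the largest shape.
max_h = max(r.shape[0] for r in rois)
max_w = max(r.shape[1] for r in rois)
padded = np.zeros((len(rois), max_h, max_w, 3), dtype=np.uint8)
for i, r in enumerate(rois):
    padded[i, : r.shape[0], : r.shape[1]] = r

raw_bytes = sum(r.nbytes for r in rois)
padded_bytes = padded.nbytes
print(stackable, raw_bytes, padded_bytes)
```

Padding works but stores extra bytes and loses the original ROI extents unless they are recorded separately, which is one reason a group-per-image hdf5 layout may be simpler.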

nepstad · 2024-09-17T15:47:16Z

Something like this could be a solution, producing one merged hdf5 file with one group for each processed image, containing all the ROIs as datasets under those groups:

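import os
from glob import glob

import h5py

h5files = sorted(glob('...'))
with h5py.File('silcam_rois_combined.h5', 'w') as f:
    for h5file in h5files:
        with h5py.File(h5file, 'r') as file_in:
            # One group per processed image, named after the source file
            g = f.create_group(os.path.splitext(os.path.basename(h5file))[0])
            for name in file_in:
                # Copy only the datasets whose name contains 'PN' (the ROIs)
                if 'PN' in name:
                    g.create_dataset(name, data=file_in[name][:])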
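For reference, the merged layout described above (one group per processed image, one dataset per ROI) could be read back roughly as follows. This is only a sketch: the file is filled with stand-in data, and the group names, dataset names, and the helpers `load_rois` and `count_rois` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in for the merged file (hypothetical names and shapes).
combined_path = os.path.join(tempfile.mkdtemp(), "silcam_rois_combined.h5")
with h5py.File(combined_path, "w") as f:
    g = f.create_group("image_0001")
    g.create_dataset("PN0", data=np.zeros((12, 9, 3), dtype=np.uint8))
    g.create_dataset("PN1", data=np.zeros((30, 41, 3), dtype=np.uint8))
    g = f.create_group("image_0002")
    g.create_dataset("PN0", data=np.ones((5, 5, 3), dtype=np.uint8))


def load_rois(path, image_name):
    """Return {roi_name: array} for all ROIs of one processed image."""
    with h5py.File(path, "r") as f:
        return {name: dataset[:] for name, dataset in f[image_name].items()}


def count_rois(path):
    """Return {image_name: number_of_rois} for every group in the file."""
    with h5py.File(path, "r") as f:
        return {image_name: len(group) for image_name, group in f.items()}


rois = load_rois(combined_path, "image_0001")
roi_counts = count_rois(combined_path)
print(roi_counts)
```

Because each ROI stays its own dataset, the differing shapes are preserved without padding.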

nepstad · 2024-09-18T10:24:59Z

Another option is to look into a more cloud-friendly storage solution, such as zarr: https://zarr.dev/

emlynjdavies · 2024-09-19T04:46:23Z

If we switched to zarr, we could put the ROIs as a subgroup within the main STATS output - then we could have all output files in the same place.

See zarr groups here

emlynjdavies added the "patch / enhancement" label (improved functionality or patch intended for changes that require bumping only the PATCH number) on Aug 26, 2024

emlynjdavies added the "major new feature" label (changes in high-level pipeline use and/or data output that are not backwards-compatible) and removed the "patch / enhancement" label on Sep 19, 2024
