cube.collapsed fails with multi-dimensional string coordinates #3653

Open
PAGWatson opened this issue Feb 10, 2020 · 7 comments · May be fixed by #4294 or #5955

@PAGWatson

PAGWatson commented Feb 10, 2020

Hi, I have a cube like the one below, with data from multi-member ensemble climate model runs covering different years. The 'Expt ID' coordinate contains the run ID of each ensemble member for each year. I get an error when I run cube.collapsed('year', iris.analysis.MEAN).

I did previously collapse the cube over a 'season' coordinate (since removed), where each season had three time values, so perhaps this issue only arises when an entire dimension is collapsed?

print cube
air_temperature / (K)               (time: 30; Ens member: 15; latitude: 145; longitude: 192)
     Dimension coordinates:
          time                           x              -             -               -
          Ens member                     -              x             -               -
          latitude                       -              -             x               -
          longitude                      -              -             -               x
     Auxiliary coordinates:
          season_year                    x              -             -               -
          year                           x              -             -               -
          Expt ID                        x              x             -               -

cube_mean=cube.collapsed('year',iris.analysis.MEAN)

The full error message is given below. It seems the problem is that iris joins all the strings in 'Expt ID' into one long string (a length-1 coordinate), which then does not match the length of the 'Ens member' dimension.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-43764f0e4a7d> in <module>()
----> 1 cube_seasmean.collapsed('year',iris.analysis.MEAN)

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in collapsed(self, coords, aggregator, **kwargs)
   3253                 local_dims = [coord_dims.index(dim) for dim in
   3254                               dims_to_collapse if dim in coord_dims]
-> 3255                 collapsed_cube.replace_coord(coord.collapsed(local_dims))
   3256 
   3257         untouched_dims = sorted(untouched_dims)

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in replace_coord(self, new_coord)
   1181             self.add_dim_coord(new_coord, dims[0])
   1182         else:
-> 1183             self.add_aux_coord(new_coord, dims)
   1184 
   1185         for factory in self.aux_factories:

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in add_aux_coord(self, coord, data_dims)
    964         if self.coords(coord):  # TODO: just fail on duplicate object
    965             raise ValueError('Duplicate coordinates are not permitted.')
--> 966         self._add_unique_aux_coord(coord, data_dims)
    967 
    968     def _check_multi_dim_metadata(self, metadata, data_dims):

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _add_unique_aux_coord(self, coord, data_dims)
    996 
    997     def _add_unique_aux_coord(self, coord, data_dims):
--> 998         data_dims = self._check_multi_dim_metadata(coord, data_dims)
    999         self._aux_coords_and_dims.append([coord, data_dims])
   1000 

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _check_multi_dim_metadata(self, metadata, data_dims)
    988                     raise ValueError(msg.format(dim, self.shape[dim],
    989                                                 metadata.name(), i,
--> 990                                                 metadata.shape[i]))
    991         elif metadata.shape != (1,):
    992             msg = 'Missing data dimensions for multi-valued {} {!r}'

ValueError: Unequal lengths. Cube dimension 0 => 15; metadata 'Expt ID' dimension 0 => 1.
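Here is a minimal NumPy sketch of the mismatch described above. It assumes, as the error message suggests, that the collapsed 'Expt ID' coordinate ends up as a single joined string. The run IDs and the (3, 2) shape are made up for illustration; the real coordinate is (30, 15).

```python
import numpy as np

# Made-up stand-in for the 'Expt ID' points, shape (time, Ens member).
expt_id = np.array([["run1a", "run2a"],
                    ["run1b", "run2b"],
                    ["run1c", "run2c"]])

# Joining every value gives one string: a coordinate of length 1 ...
joined_all = np.array(["|".join(expt_id.ravel())])
print(joined_all.shape)  # (1,)

# ... but once 'time' is collapsed the cube still has the Ens member
# dimension, so a coordinate mapped onto it would need this shape.
print(expt_id.shape[1:])  # (2,)
```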

A quick workaround is to explicitly join the individual strings into one string, remove the string coordinate, and re-add the joined string as a scalar coordinate. I'm not experienced enough with iris to know whether this is robust, but it works for my case.

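```python
import iris.analysis
import iris.coords

# Replace each multi-dimensional string coord (in ``cube`` above) with a scalar coord of its joined values.
for coord in cube.aux_coords:  # aux_coords is a tuple, so removing coords inside the loop is safe
    if coord.ndim > 1 and coord.dtype.kind in "SU":  # byte or unicode strings
        new_str = "|".join(coord.points.astype(str).ravel())
        new_coord = iris.coords.AuxCoord(new_str, attributes=coord.attributes, long_name=coord.long_name,
                                         standard_name=coord.standard_name, units=coord.units, var_name=coord.var_name)
        cube.remove_coord(coord)
        cube.add_aux_coord(new_coord)

cube_mean = cube.collapsed("year", iris.analysis.MEAN)  # now this works
```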

It would be nice if iris did something like this in cube.collapsed(). Even better would be a method that collapses 'Expt ID' only along the dimension being collapsed, so that the association of 'Expt ID' values with the 'Ens member' dimension is maintained.
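The behaviour requested here, joining 'Expt ID' only along the collapsed dimension, can be sketched in plain NumPy. `join_along_axis` is a hypothetical helper, not an iris function. It assumes unicode (`'U'`) string points, and the run IDs are made up.

```python
import numpy as np


def join_along_axis(points, axis, sep="|"):
    """Join string points along ``axis`` only, keeping the other dimensions.

    Hypothetical helper for illustration; not part of iris.
    """
    # Move the collapsed axis to the end and flatten the remaining axes into rows.
    work = np.moveaxis(np.asarray(points), axis, -1)
    rest_shape = work.shape[:-1]
    work = work.reshape(-1, work.shape[-1])
    # Loop over rows explicitly rather than using np.apply_along_axis.
    joined = [sep.join(row) for row in work]
    # Restore the shape of the dimensions that were not collapsed.
    return np.array(joined).reshape(rest_shape)


expt_id = np.array([["run1a", "run2a"],
                    ["run1b", "run2b"],
                    ["run1c", "run2c"]])  # (time, Ens member)
print(join_along_axis(expt_id, 0).tolist())  # ['run1a|run1b|run1c', 'run2a|run2b|run2c']
```

Collapsing axis 0 (time) leaves one joined string per ensemble member, so the result still maps onto the 'Ens member' dimension.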

@rcomer
Member

rcomer commented Feb 10, 2020

> Even better would be a method that only collapses 'Expt ID' here along the dimension being collapsed, so the association of 'Expt ID' values with the 'Ens member' dimension would be maintained.

The aggregated_by method has string handling that does that. So I would say it’s desirable to have consistent behaviour in collapsed. I also think it should be relatively simple to implement.

@rcomer
Member

rcomer commented Feb 10, 2020

The relevant handling for aggregated_by looks like this:

```python
if coord.points.dtype.kind in "SU":
    if coord.bounds is None:
        new_points = []
        new_bounds = None
        # np.apply_along_axis does not work with str.join, so we
        # need to loop through the array directly. First move axis
        # of interest to trailing dim and flatten the others.
        work_arr = np.moveaxis(coord.points, dim, -1)
        shape = work_arr.shape
        work_shape = (-1, shape[-1])
        new_shape = (len(self),)
        if coord.ndim > 1:
            new_shape += shape[:-1]
        work_arr = work_arr.reshape(work_shape)
        for key_slice in self._slices_by_key.values():
            if isinstance(key_slice, slice):
                indices = key_slice.indices(
                    coord.points.shape[dim]
                )
                key_slice = range(*indices)
            for arr in work_arr:
                new_points.append("|".join(arr.take(key_slice)))
        # Reinstate flattened dimensions. Aggregated dim now leads.
        new_points = np.array(new_points).reshape(new_shape)
        # Move aggregated dimension back to position it started in.
        new_points = np.moveaxis(new_points, 0, dim)
    else:
        msg = (
            "collapsing the bounded string coordinate {0!r}"
            " is not supported".format(coord.name())
        )
        raise ValueError(msg)
```
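The same technique can be pulled out into a standalone sketch for experimenting outside iris. It moves the aggregated dimension to the end, flattens the rest into rows, joins each group, and then restores the shape. `join_by_groups` is hypothetical. Its `groups` argument stands in for `self._slices_by_key.values()` and is assumed to be a list of slices or index lists. Collapsing a whole dimension corresponds to a single group spanning that dimension.

```python
import numpy as np


def join_by_groups(points, dim, groups, sep="|"):
    """Join string points within each group along ``dim`` (hypothetical helper).

    ``groups`` is a list whose entries are slices or sequences of indices.
    """
    points = np.asarray(points)
    # Move the aggregated dim to the end and flatten the others into rows.
    work = np.moveaxis(points, dim, -1)
    shape = work.shape
    work = work.reshape(-1, shape[-1])
    new_points = []
    for key_slice in groups:
        if isinstance(key_slice, slice):
            key_slice = range(*key_slice.indices(points.shape[dim]))
        for row in work:
            new_points.append(sep.join(row.take(list(key_slice))))
    # Groups now lead; restore the flattened dims, then move the group dim back.
    new_points = np.array(new_points).reshape((len(groups),) + shape[:-1])
    return np.moveaxis(new_points, 0, dim)


expt_id = np.array([["run1a", "run2a"],
                    ["run1b", "run2b"],
                    ["run1c", "run2c"]])
print(join_by_groups(expt_id, 0, [slice(0, 2), [2]]).tolist())
```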

@github-actions
Contributor

github-actions bot commented Aug 4, 2021

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

github-actions bot added the Stale label Aug 4, 2021
@rcomer
Member

rcomer commented Aug 4, 2021

I believe this bug is very fixable, it just needs someone to find the time. So I say we leave this issue open.

rcomer removed the Stale label Aug 4, 2021
rcomer self-assigned this Aug 19, 2021
rcomer linked a pull request Aug 24, 2021 that will close this issue
@rcomer
Member

rcomer commented Oct 19, 2022

I have proposed a fix for this at #4294.

Contributor

github-actions bot commented Mar 3, 2024

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

github-actions bot added the Stale label Mar 3, 2024
@rcomer
Member

rcomer commented Mar 6, 2024

I still think we should make this work, but if we don't, we should at least raise a more decipherable error message.
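If the fix is not made, a check along these lines could run before a string coordinate is collapsed and replace the opaque "Unequal lengths" message. This is only a sketch: `check_string_coord_collapse` and its arguments are hypothetical and not part of iris.

```python
import numpy as np


def check_string_coord_collapse(name, points, coord_dims, dims_to_collapse):
    """Raise a readable error if a string coord would be only partly collapsed.

    Hypothetical helper: ``coord_dims`` are the cube dims the coordinate spans,
    ``dims_to_collapse`` are the cube dims being collapsed.
    """
    points = np.asarray(points)
    collapsing = sorted(set(coord_dims) & set(dims_to_collapse))
    kept = sorted(set(coord_dims) - set(dims_to_collapse))
    # Only a string coord that spans both collapsed and kept dims is a problem.
    if points.dtype.kind in "SU" and collapsing and kept:
        raise ValueError(
            f"Cannot collapse string coordinate {name!r} over dimension(s) "
            f"{collapsing} while keeping its other dimension(s) {kept}; "
            "remove the coordinate before collapsing."
        )
```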

rcomer removed the Stale label Mar 6, 2024
rcomer linked a pull request May 15, 2024 that will close this issue