Skip to content
This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

enable time_coord_name to be different from "time" #39

Open
matt-long opened this issue Jan 25, 2019 · 10 comments
Open

enable time_coord_name to be different from "time" #39

matt-long opened this issue Jan 25, 2019 · 10 comments
Assignees

Comments

@matt-long
Copy link
Contributor

We assume throughout that the time coordinate is named "time", but this is overly restrictive.

We should introduce a variable time_coord_name and accept optional user input for this.

If the user does not supply time_coord_name, attempt to infer by looking for coordinate named "time", or checking for unlimited_dim.

Many instance like "time.month" will have to be changed to variables too:
time_dot_month = '.'.join([time_coord_name, "month"])

@andersy005 andersy005 added this to the sprint#1 milestone Jan 25, 2019
@andersy005 andersy005 self-assigned this Feb 1, 2019
@klindsay28
Copy link

Here's an alternative suggestion, instead of relying on unlimited_dim:

For each dimension in the dataset, if it has a corresponding variable, and cf_units.is_time returns True when passed that variable's unit attribute, then set time_coord_name to that dimension.

@matt-long
Copy link
Contributor Author

As discussed in #48 there are a variety of imperfect options for determining "time".

I suggest we check for a single unlimited_dim and use that as the default.

If there is no unlimited_dim or there is more than one unlimited_dim, we should look for a variable named "time". If there's no variable named time, we should abort unless the user has supplied the name of the time coordinate.

We could consider checking units, as @klindsay28 suggest, but that introduces complexity and there might be multiple variables or no variables with cf_units.is_time == True—and they might be the wrong variable to use.

@mnlevy1981
Copy link
Contributor

What about the following order of operations:

  1. If user has specified a time dimension use it, otherwise
  2. If there is only one variable with cf_units.is_time == True use it, otherwise
  3. If there is a variable named time use it, otherwise
  4. If there is only unlimited dimension use it, otherwise
  5. Throw an exception

I think the order of (2) - (4) is up for debate but my thinking is

  • If a dataset is cf compliant, trust it to tell you what dimension is named time
  • If a dataset has a variable named time and a single unlimited dimension (and the two are not the same), I would lean towards trusting the variable name

There are some edge-cases to consider, especially centered around multiple variables with cf_units.is_time == True -- one possibility is

2a. If there are multiple variables with cf_units.is_time == True then
3a. if there is one variable named time and cf_units.is_time == True for it use it, otherwise
4a. If there is a only one unlimited dimension and cf_units.is_time == True for it use it, otherwise
5a. Throw an exception

@kmpaul
Copy link

kmpaul commented Feb 1, 2019

Keep in mind that it is only convention that dictates a single unlimited dimension will be time. The only true way to check if a coordinate is a valid time coordinate is to check the units. And there is no way to pick the “right one” if there are multiple without user input.

@andersy005
Copy link
Contributor

Would this be enough?

def infer_time_coord_name(ds):
    unlimited_dims = ds.encoding.get('unlimited_dims', None)
    if len(unlimited_dims) == 1:
        time_coord_name = list(unlimited_dims)[0]
        return time_coord_name
        
    else:
        raise ValueError("Couldn't infer `time_coord_name` from multiple unlimited dimensions: %s \n\t\t ***** Please specify time_coord_name to use. *****" %unlimited_dims)
        
    
def compute_mon_climatology(dset, time_coord_name=None):
       # Infer the time_coord_name
       if time_coord_name is None: 
           time_coord_name = infer_time_coord_name(dset)
        
        # Do The Computation 

@matt-long
Copy link
Contributor Author

@andersy005 I would prefer to look for a variable named "time" in the triage approach. How about the following?

def infer_time_coord_name(ds):
    if "time" in ds.variables:
       return "time"

    unlimited_dims = ds.encoding.get('unlimited_dims', None)
    if len(unlimited_dims) == 1:
        return list(unlimited_dims)[0]

    raise ValueError("Couldn't infer `time_coord_name` from multiple unlimited dimensions: %s \n\t\t ***** Please specify time_coord_name to use. *****" %unlimited_dims)

@andersy005
Copy link
Contributor

@matt-long, your suggestion makes more sense. I will add it to the PR

@kmpaul
Copy link

kmpaul commented Feb 5, 2019

I have to admit, I grow more bothered by assuming that a single unlimited dimension is time. It is probably almost always true, but the problem is that if we are trying to identify the time coordinate so that we can do operations that can only be done if the coordinate is truly a time coordinate (eg, grouping by month or year, etc) then the only way to know how to do that is from the units and calendar. And if you have the units, then you can use cf-units to detect if it is truly a time coordinate. And that is much more robust.

@klindsay28
Copy link

What if instead of time_coord_name being a string, time_coord_names were a list of strings, of all dimensions whose corresponding coordinate variable has units that return True for cf_units.is_time?

Operations on the existing time would be applied to all time_coord_names.

Are there Dataset operations on time for which this would be problematic?

Are there time operations that require a unique time?

@andersy005
Copy link
Contributor

xref: This issue is partially addressed in #48.

Next step:

  • Make time_coord_name into a list and loop over that list in the computational routines. This will enable addressing multiple time-coordinates within a single file.

@andersy005 andersy005 removed this from the sprint-feb4-feb17 milestone Feb 12, 2019
@andersy005 andersy005 pinned this issue Feb 19, 2019
@andersy005 andersy005 unpinned this issue Apr 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants