This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Add a new function to compute monthly averages #94

Closed
wants to merge 14 commits into from

Conversation

alperaltuntas
Member

@alperaltuntas alperaltuntas commented Mar 13, 2019

Checklist

  • Enable and install pre-commit to ensure style-guides and code checks are followed.
  • Target master for bugfixes and doc changes.
  • Target devel for new features or functionality changes.
  • Include documentation when adding new features.
  • Include new tests or update existing tests when applicable.

Fixes: #55

Adds a new function to compute monthly averages from a given dataset that has more frequent data, e.g., 5 day means.

Summary of what the function does:

  • First, the input dataset gets "grouped by" time_bound months. (The time_bound gets expanded, i.e., gets reshaped to be 1-dimensional for the purpose of grouping. All the DataArrays in the dataset are reshaped accordingly.)
  • Then, a local function "weighted_monthly_mean" is applied to each group: Weights get computed for each chunk (e.g. 5-day) of each group (month), by taking into account how much of the chunk falls within the group.
  • Finally, time and time_bound get corrected for each group (month).
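The weighting step described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: the helper names (overlap_days, weighted_monthly_mean) and the plain "days since start of year" time axis are assumptions made for the example.

```python
def overlap_days(lower, upper, month_start, month_end):
    """Length of the part of [lower, upper) that lies inside [month_start, month_end)."""
    return max(0.0, min(upper, month_end) - max(lower, month_start))


def weighted_monthly_mean(values, bounds, month_start, month_end):
    """Mean of `values`, each weighted by how much of its interval falls in the month."""
    weights = [overlap_days(lo, hi, month_start, month_end) for lo, hi in bounds]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Seven 5-day means; times are days since the start of the year (hypothetical data).
bounds = [(5 * i, 5 * i + 5) for i in range(7)]  # last chunk is [30, 35)
values = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 32.0]

# January is [0, 31): the last chunk contributes only 1 of its 5 days.
january_mean = weighted_monthly_mean(values, bounds, 0, 31)
```

Here the chunk that straddles the January/February boundary gets weight 1 in January and would get weight 4 in February, which is the "how much of the chunk falls within the group" rule.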

Note 1: Parts of the function may be more complicated than necessary, so a code review would be helpful.
Note 2: The existing monthly climatology function (compute_mon_climatology) could be rewritten to call this new function (compute_mon_averages) to compute the averages and then derive the climatology from them, which should be straightforward.
Note 3: This is a draft pull request for now, since I haven't tested it thoroughly. Let me know any issues you notice/encounter.

To test this function, you may run the following on cheyenne:

from glob import glob
import xarray as xr
import esmlab

files = sorted(glob('/glade/scratch/altuntas/archive/g.e20.GIAF.T62_g37.test_d2m.001/ocn/hist/g.e20.GIAF.T62_g37.test_d2m.001.pop.h.0001-1*.nc'))
ds = xr.open_mfdataset(files, decode_times=False, decode_coords=False)
ds_monclim = esmlab.climatology.compute_mon_averages(ds)
ds_monclim.to_netcdf("test.nc")

Contributor

@andersy005 andersy005 left a comment

@alperaltuntas, @matt-long, as a preliminary review comment, I was wondering if it would be worth exploring the usage of resample with cftime index as it's been implemented in pydata/xarray#2593?

I will add more comments for the rest of the PR tomorrow morning.

@matt-long
Contributor

@alperaltuntas, have you looked into @andersy005's suggestion? I think the new resample capability on CFTimeIndex could greatly simplify this type of application.

@alperaltuntas
Member Author

@matt-long, @andersy005, I'll look into it. Thanks.

@matt-long
Contributor

One thing to keep in mind is the ability to correctly handle missing values that vary in time.
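One possible way to handle this, sketched under the assumption that missing values are encoded as NaN: drop the weight of any missing sample and renormalize by the weights that remain, so a value missing in one chunk does not bias or invalidate the whole month. The function name nan_aware_weighted_mean is hypothetical.

```python
import math


def nan_aware_weighted_mean(values, weights):
    """Weighted mean that skips NaN values and renormalizes by the remaining weights."""
    total = 0.0
    weight_sum = 0.0
    for value, weight in zip(values, weights):
        if math.isnan(value):
            continue  # a missing sample contributes neither value nor weight
        total += weight * value
        weight_sum += weight
    # If every sample is missing, the mean is itself missing.
    return total / weight_sum if weight_sum > 0 else math.nan


# The middle chunk is missing at this grid point for this month.
partial = nan_aware_weighted_mean([1.0, math.nan, 3.0], [5.0, 5.0, 1.0])
all_missing = nan_aware_weighted_mean([math.nan, math.nan], [5.0, 5.0])
```

Applied per grid point, this lets the effective weights differ from point to point when the missing-value mask changes over time.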

@andersy005 andersy005 closed this Mar 27, 2019
@andersy005
Contributor

@alperaltuntas, I am sorry, I deleted the devel branch without making sure that there were no open pull requests. When you get time, please open a PR against the master branch.
