Skip to content
This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Least squares polynomial fit with Dask #144

Open
andersy005 opened this issue Jul 16, 2019 · 4 comments
Open

Least squares polynomial fit with Dask #144

andersy005 opened this issue Jul 16, 2019 · 4 comments
Labels
2 - normal Normal priority enhancement New feature or request

Comments

@andersy005
Copy link
Contributor

Per discussion with @matt-long, it would be useful to add a least squares polynomial fit to esmlab:

import xarray as xr
import dask.array as da 
def _order_and_stack(darr, dim):
    dims_stacked = [d for d in darr.dims if d != dim]
    new_dims = [dim, ] + dims_stacked
    if darr.ndim > 2:
        darr_stacked = (darr.transpose(*new_dims).stack(other_dims=dims_stacked).dropna('other_dims'))

    elif darr.ndim == 2:
        darr_stacked = darr_stacked.transpose(*new_dims)

    else:
        darr_stacked = darr

    return darr_stacked

def _unstack(darr):
    if 'other_dims' in darr.dims:
        darr_unstacked = darr.unstack('other_dims')

    else:
        darr_unstacked = darr

    return darr_unstacked

def _polyfit(darr, x, deg, dim):
    dim_chunk = darr.chunks[darr.get_axis_num(dim)][0]
    x = x.chunk(chunks={dim: dim_chunk, 'degree': deg + 1})
    darr_stacked = _order_and_stack(darr, dim)
    p, err, _, _ = da.linalg.lstsq(x.data, darr_stacked.data)
    new_name = f'{darr.name}_poly_coeffs'
    new_dims = ('degree',) + darr_stacked.dims[1:]
    new_coords = {coord: darr_stacked.coords[coord] for coord in darr_stacked.coords 
                  if coord != dim}
    lstsq_coeffs = xr.DataArray(p, name=new_name, coords=new_coords, dims=new_dims)
    return lstsq_coeffs



def polyfit(darr, deg=1, dim=None, coord=None):
    if not dim:
        dim = 'time'

    if not coord:
        coord = darr[dim]

    t = coord

    x = xr.concat([t ** d for d in range(deg + 1)], dim='degree')
    x = x.transpose(dim, 'degree')

    coeffs = _polyfit(darr, x, deg, dim)
    coeffs = coeffs.assign_coords(degree=range(deg + 1)) 
    return _unstack(coeffs)
@bradyrx
Copy link
Contributor

bradyrx commented Jul 20, 2019

One thing we've come across at climpred is that polyfit and other functions return zeros or NaNs when they encounter missing data in a time series. This isn't a problem to worry about if you only anticipate esmlab to be used in climate model output, but observational data frequently has missing data.

We deal with this by temporarily doing a linear interpolation to fill in the NaNs, then force those back to NaNs on the returned output (in the case of detrending; wouldn't need to replace the NaNs if just getting the slope): https://github.com/bradyrx/climpred/blob/8c1700e94d124424f5f2adf5250a22f2ff8bb3dc/climpred/stats.py#L118

We also have some snippets in there to handle Datasets in addition to DataArrays.

@andersy005 andersy005 added 2 - normal Normal priority enhancement New feature or request and removed feature labels Jul 31, 2019
@kmpaul kmpaul closed this as completed Oct 21, 2019
@andersy005 andersy005 reopened this Oct 21, 2019
@klindsay28
Copy link

@kmpaul , I'm curious why this issue was closed with no explanatory comment.
It doesn't look like the enhancement has been implemented.

@andersy005
Copy link
Contributor Author

@klindsay28, I just re-opened it

@andersy005
Copy link
Contributor Author

I am planning on working on this as part of pydata/xarray#3349

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
2 - normal Normal priority enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants