Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace gdal_proximity with a #15

Open
hcwinsemius opened this issue Oct 12, 2024 · 0 comments
Open

replace gdal_proximity with a #15

hcwinsemius opened this issue Oct 12, 2024 · 0 comments

Comments

@hcwinsemius
Copy link
Contributor

hcwinsemius commented Oct 12, 2024

I read into what gdal_proximity does. It calculates the nearest distance to masked areas. This requires iterating over each cell. In numpy this will be slow, but we implemented similar functionalities in the package pyflwdir that iterates many times, which is quite fast. A function signature like below is worth trying (not tested).

import numpy as np
import numba as nb

# create a njit (no-python just-in-time compiling) function below. This code will be converted to machine code
# and compiled on-the-fly making it much much faster. cache=True ensures the compilation is stored for the next
# time the function is used.

@nb.njit(cache=True)
def mv_proximity(mask):
    """Calculates nearest distance of missing values to non-missing values.

    Parameters
    ----------
    mask : np.ndarray [bool]
        2D array with True at locations with data, and False at locations without data
    Returns
    -------
    np.ndarray
        2D array containing distances to nearest non-missing value
    """
    distance = np.ones_like(data, dtype=np.float32) * np.inf  # start with infinite distances per grid cell

    rows, cols = distance.shape
    # it may be faster to first compute indexes of masked cells
    for row in range(rows):
        for col in range(cols):
            if mask[row, col]:
                # same here, it may be faster to first index logical values before running through them. Not yet carefully thought through...
                for k in range(rows):
                    for l in range(cols):
                        dist = np.sqrt((row - k) ** 2 + (col - l) ** 2)
                        if dist < distance[k, l]:
                            distance[k, l] = dist

    return distance

This should be tested for speed and compared against results of gdal_proximity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant