Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The difference between scipy.spatial.cKDTree and pygeos.STRtree #889

Open
lyingTree opened this issue Mar 23, 2022 · 4 comments · May be fixed by #894
Open

The difference between scipy.spatial.cKDTree and pygeos.STRtree #889

lyingTree opened this issue Mar 23, 2022 · 4 comments · May be fixed by #894

Comments

@lyingTree
Copy link

lyingTree commented Mar 23, 2022

In the opendrift/readers/basereader/unstructured.py ,I found the method build_ckdtree and nearest_ckdtree take a long time when running the openoil, so I made the following changes in the method which find the nearest point:

# ----------------------------unstructured.py----------------------------
# def _build_ckdtree_(self, x, y):
#     from scipy.spatial import cKDTree
#     P = np.vstack((x, y)).T
#     return cKDTree(P)
# Replace _build_ckdtree_ with _build_strtree_
def _build_strtree_(self, x, y):
    from pygeos import STRtree, points
    return STRtree(points(x,y))

#def __nearest_ckdtree__(self, idx, x, y):
#    q = np.vstack((x, y)).T
#    return idx.query(q, k = 1, workers = self.PARALLEL_WORKERS)[1]
# Replace __nearest_ckdtree__ with __nearest_strtree__
def __nearest_strtree__(self, idx, x, y):
    from pygeos import points
    return idx.nearest(pygeos.points(x,y))[1]

# line 180
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)

# line 186
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)
# ----------------------------reader_netCDF_CF_unstructured.py----------------------------
# line 173
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.nodes_idx = self._build_strtree_(self.x, self.y)

# line 176
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.faces_idx = self._build_strtree_(self.x, self.y)

Here I use the current data from FVCOM and the constant wind to drive openoil, so I only modified the reader_netCDF_CF_unstructured.py and unstructured.py, but the graph of tracks is different from before the modification. I wonder if this modification is correct? If it's correct, STRtree may be a better choice than cKDTree, beacauce it can increase the operating speed of opendrift by more than 10%.

image

@knutfrode
Copy link
Collaborator

If you are getting different results, then probably something is wrong. You can run pytest to check if tests are passing.

@gauteh
Copy link
Member

gauteh commented Mar 23, 2022

Hi, it looks like pygeos is going to be merged with Shapely and be released as part of Shapely 2.0: https://pygeos.readthedocs.io/en/stable/, it seems that shapely 1.8 is a preview of 2.0. If this code is already part of shapely it would be great to use shapely in stead - one less dependency. If the STRtree works as well as ckdtree I think we should consider swapping it out, and actually reduce our dependencies, since we need shapely anyway.

https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree

If you submit your code as a pull request it will be easy to review and test for us.

@lyingTree
Copy link
Author

lyingTree commented Mar 23, 2022

If you are getting different results, then probably something is wrong. You can run pytest to check if tests are passing.

Yes, I have something wrong in operation. The results are same. The line 176 in reader_netCDF_CF_unstructured.py is error after my modification. The right modification is showed below:

# ----------------------------unstructured.py----------------------------
# def _build_ckdtree_(self, x, y):
#     from scipy.spatial import cKDTree
#     P = np.vstack((x, y)).T
#     return cKDTree(P)
# Replace _build_ckdtree_ with _build_strtree_
def _build_strtree_(self, x, y):
    from pygeos import STRtree, points
    return STRtree(points(x,y))

#def __nearest_ckdtree__(self, idx, x, y):
#    q = np.vstack((x, y)).T
#    return idx.query(q, k = 1, workers = self.PARALLEL_WORKERS)[1]
# Replace __nearest_ckdtree__ with __nearest_strtree__
def __nearest_strtree__(self, idx, x, y):
    from pygeos import points
    return idx.nearest(points(x,y))[1]

# line 180
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)

# line 186
# return self.__nearest_ckdtree__(self.faces_idx, xc, yc)
return self.__nearest_strtree__(self.faces_idx, xc, yc)
# ----------------------------reader_netCDF_CF_unstructured.py----------------------------
# line 173
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.nodes_idx = self._build_strtree_(self.x, self.y)

# line 176
# self.faces_idx = self._build_ckdtree_(self.xc, self.yc)
self.faces_idx = self._build_strtree_(self.xc, self.yc)

After my test, the efficiency has increased by more than 30%.

So It may be a nice choice to complete this function.

Thanks!

@lyingTree
Copy link
Author

Hi, it looks like pygeos is going to be merged with Shapely and be released as part of Shapely 2.0: https://pygeos.readthedocs.io/en/stable/, it seems that shapely 1.8 is a preview of 2.0. If this code is already part of shapely it would be great to use shapely in stead - one less dependency. If the STRtree works as well as ckdtree I think we should consider swapping it out, and actually reduce our dependencies, since we need shapely anyway.

https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree

If you submit your code as a pull request it will be easy to review and test for us.

Thanks, it will be a nice improvement. I will submit my code after

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants