Replies: 4 comments
-
For large k-means runs, the number of iterations is usually set by how much computation you can afford.
-
But that doesn't make sense: even if I have enough resources to run a large number of iterations, there is no reason to waste them once the stopping criterion is met. Is it possible to provide something like the tolerance parameter in scikit-learn?
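To make the request concrete, here is a minimal NumPy sketch (not faiss's API) of Lloyd's k-means with a scikit-learn-style tolerance: iteration stops early once the squared shift of the centroids falls below `tol`. The function name and parameters are illustrative, not from any library.

```python
import numpy as np

def kmeans_with_tol(x, k, max_iter=100, tol=1e-4, seed=0):
    """Illustrative Lloyd's k-means that stops early when the squared
    centroid shift drops to `tol` or below (scikit-learn-style criterion)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for it in range(max_iter):
        # assignment step: nearest centroid for each point
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # update step: recompute means (keep old centroid for empty clusters)
        new = np.stack([
            x[assign == j].mean(0) if (assign == j).any() else centroids[j]
            for j in range(k)])
        shift = ((new - centroids) ** 2).sum()
        centroids = new
        if shift <= tol:  # updates are negligible -> treat as converged
            break
    return centroids, assign, it + 1
```

On well-separated data this typically stops long before `max_iter`, which is exactly the behavior being asked for.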
-
There is a Python implementation of k-means that can be tuned more precisely to the user's needs; see https://github.com/facebookresearch/faiss/blob/main/contrib/clustering.py#L321 There is no (or negligible) performance impact compared to the C++ version.
-
@mdouze I came here to ask the same question. I am running KMeans on >100M 2048-D vectors and set The Python code you shared would not work with
-
In some other implementations I have used, k-means has a convergence criterion based on the size of the updates: if the update is smaller than a fixed hyperparameter, convergence is assumed and k-means stops.
As far as I know, the only stopping criterion in FAISS KMeans is niter (the maximum number of iterations). What would be the ideal way to choose niter? Does it depend on the number of data points or on n_clusters?
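One practical way to pick niter empirically is to record the k-means objective (sum of squared distances) at each iteration and see where the improvement flattens out. Below is a hypothetical NumPy sketch of that idea (not the faiss API; `rel_tol` and the function name are illustrative): it returns the objective curve and stops once the relative improvement between consecutive iterations falls below a threshold.

```python
import numpy as np

def kmeans_objective_curve(x, k, max_iter=100, rel_tol=1e-4, seed=0):
    """Illustrative k-means loop that records the objective per iteration
    and stops when the relative improvement drops below `rel_tol`."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    objectives = []
    for _ in range(max_iter):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # objective for the current centroids and assignment
        objectives.append(d2[np.arange(len(x)), assign].sum())
        centroids = np.stack([
            x[assign == j].mean(0) if (assign == j).any() else centroids[j]
            for j in range(k)])
        # stop once the objective barely improves between iterations
        if len(objectives) >= 2:
            prev, cur = objectives[-2], objectives[-1]
            if prev - cur <= rel_tol * prev:
                break
    return objectives
```

The length of the returned curve is then an empirical estimate of a sufficient niter for that dataset; it depends mainly on how quickly the objective flattens, which varies with the data distribution and the number of clusters rather than following a fixed formula.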