Replies: 4 comments
-
For large k-means runs, the number of iterations is usually set by how much computation you can afford.
-
But that doesn't make sense: even if I have enough resources to run a large number of iterations, there is no reason to waste them once the stopping criterion is met. Is it possible to provide something like the tolerance parameter in scikit-learn?
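To make the request concrete, here is a minimal NumPy sketch (not faiss's API) of Lloyd's k-means with a scikit-learn-style tolerance: iteration stops early once the squared shift of the centroids falls below `tol`. The function name and parameters are illustrative, not from any library.

```python
import numpy as np

def kmeans_with_tol(x, k, max_iter=100, tol=1e-4, seed=0):
    """Illustrative Lloyd's k-means that stops early when the squared
    centroid shift drops to `tol` or below (scikit-learn-style criterion)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for it in range(max_iter):
        # assignment step: nearest centroid for each point
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # update step: recompute means (keep old centroid for empty clusters)
        new = np.stack([
            x[assign == j].mean(0) if (assign == j).any() else centroids[j]
            for j in range(k)])
        shift = ((new - centroids) ** 2).sum()
        centroids = new
        if shift <= tol:  # updates are negligible -> treat as converged
            break
    return centroids, assign, it + 1
```

On well-separated data this typically stops long before `max_iter`, which is exactly the behavior being asked for.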
-
There is a Python implementation of k-means that can be tuned more precisely to the user's needs; see https://github.com/facebookresearch/faiss/blob/main/contrib/clustering.py#L321 There is no (or negligible) performance impact compared to the C++ version.
-
@mdouze I came here to ask the same question. I am running KMeans on >100M 2048-D vectors and set The Python code you shared would not work with
-
In some other implementations I have used, k-means has a convergence criterion based on the size of the updates: if the update is smaller than a fixed hyperparameter, convergence is assumed and k-means stops.
As far as I know, the only stopping criterion in FAISS KMeans is niter (the maximum number of iterations). What would be the ideal way to choose niter? Does it depend on the number of data points or on n_clusters?
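One practical way to pick niter empirically is to record the k-means objective (sum of squared distances) at each iteration and see where the improvement flattens out. Below is a hypothetical NumPy sketch of that idea (not the faiss API; `rel_tol` and the function name are illustrative): it returns the objective curve and stops once the relative improvement between consecutive iterations falls below a threshold.

```python
import numpy as np

def kmeans_objective_curve(x, k, max_iter=100, rel_tol=1e-4, seed=0):
    """Illustrative k-means loop that records the objective per iteration
    and stops when the relative improvement drops below `rel_tol`."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    objectives = []
    for _ in range(max_iter):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # objective for the current centroids and assignment
        objectives.append(d2[np.arange(len(x)), assign].sum())
        centroids = np.stack([
            x[assign == j].mean(0) if (assign == j).any() else centroids[j]
            for j in range(k)])
        # stop once the objective barely improves between iterations
        if len(objectives) >= 2:
            prev, cur = objectives[-2], objectives[-1]
            if prev - cur <= rel_tol * prev:
                break
    return objectives
```

The length of the returned curve is then an empirical estimate of a sufficient niter for that dataset; it depends mainly on how quickly the objective flattens, which varies with the data distribution and the number of clusters rather than following a fixed formula.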