Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js #168

vtempest · 2024-09-19T21:58:41Z

@masajiro こんにちは you are the wise sensei of vector search whose NGT tops hnsw based popular engines on benchmarks. I am curious if you think this approach can work to limit ram size need. Also a good name for a wasm port of ngt would be Graph-based Embedding Tree Search of Nearest Neighbor (GETS-NN)

https://airesearch.js.org/functions/addEmbeddingVectorsToIndex.html

VSEARCH: Vector Similarity Embedding Approximation in RAM-Limited Cluster Heirarchy

Compile hnswlib-node or NGT algorithm C++ to WASM JS for efficient similarity search.
Vector index is split by K-means into regional clusters, each being a specific size to fit in RAM. This is better than popular vector engines that require costly 100gb-RAM servers because they load all the vectors at once.
Vectors for centroids of each cluster are stored in a list in SQL, each cluster's binary quantized data is exported as base64 string to SQL, S3, etc.
Search: Embed Query, Compare to each cluster centroid to pick top clusters, download base64 strings for those clusters, load each into WASM, find top neighbors per cluster, merge results sorted by distance

#include <iostream>
#include <vector>
#include <utility>
#include "NGT/Clustering.h"

std::pair<std::vector<std::vector<float>>, std::vector<std::vector<size_t>>> 
clusterVectors(NGT::Index& index, size_t numClusters) {
    NGT::Clustering clustering;
    clustering.setClusteringType(NGT::Clustering::ClusteringTypeKmeansWithNGT);
    clustering.setInitializationMode(NGT::Clustering::InitializationModeKmeansPlusPlus);
    clustering.setNumberOfClusters(numClusters);
    clustering.setMaximumIteration(100);

    std::vector<NGT::Clustering::Cluster> clusters;
    double diff = clustering.kmeansWithNGT(index, numClusters, clusters);

    std::cout << "Clustering completed with diff: " << diff << std::endl;

    std::vector<std::vector<float>> centroids;
    std::vector<std::vector<size_t>> clusterIndices;
    
    centroids.reserve(clusters.size());
    clusterIndices.reserve(clusters.size());

    for (const auto& cluster : clusters) {
        // Add centroid
        centroids.push_back(cluster.centroid);

        // Add member indices
        std::vector<size_t> indices;
        indices.reserve(cluster.members.size());
        for (const auto& member : cluster.members) {
            indices.push_back(member.vectorID);
        }
        clusterIndices.push_back(std::move(indices));
    }

    return std::make_pair(centroids, clusterIndices);
}

int main() {
    // Assume we have an NGT index already created and populated
    NGT::Property property;
    property.dimension = 128;  // Set this to your vector dimension
    property.objectType = NGT::Index::Property::ObjectType::Float;
    property.distanceType = NGT::Index::Property::DistanceType::DistanceType2;

    NGT::Index index(property);

    // Load your vectors into the index here
    // For example:
    // std::vector<float> vec(128);
    // for (int i = 0; i < 1000; i++) {
    //     // Fill vec with data
    //     index.insert(vec);
    // }
    // index.createIndex(16);

    size_t numClusters = 10;
    auto [centroids, clusterIndices] = clusterVectors(index, numClusters);

    std::cout << "Number of clusters: " << centroids.size() << std::endl;
    for (size_t i = 0; i < centroids.size(); i++) {
        std::cout << "Cluster " << i << " size: " << clusterIndices[i].size() << std::endl;
        std::cout << "Centroid " << i << " first few dimensions: ";
        for (size_t j = 0; j < std::min(size_t(5), centroids[i].size()); j++) {
            std::cout << centroids[i][j] << " ";
        }
        std::cout << "..." << std::endl;
    }

    return 0;
}

The text was updated successfully, but these errors were encountered:

masajiro · 2024-09-20T11:21:08Z

@vtempest
こんにちは
Thank you for your interest in NGT.
I am not familiar with WASM, but this seems like a really interesting approach. I think your approach could enable ANNS on limited RAM. However, to further improve recall, you may need to load more clusters, which could significantly increase search time. As you may have already considered quantization, quantizing the vectors would reduce the cluster size, which seems like an effective way to run it on limited RAM.

vtempest changed the title ~~Reducing RAM cost 95% with by searching one cluster at a time and indexing centroids in wasm js~~ Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js Sep 19, 2024

vtempest mentioned this issue Sep 22, 2024

Feature: How is WASM Compiled? Are you using wasm_simd128 and msimd128 ? ashvardanian/SimSIMD#189

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js #168

Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js #168

vtempest commented Sep 19, 2024 •

edited

Loading

masajiro commented Sep 20, 2024

Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js #168

Reducing RAM cost 95% searching one cluster at a time and indexing centroids in wasm js #168

Comments

vtempest commented Sep 19, 2024 • edited Loading

masajiro commented Sep 20, 2024

vtempest commented Sep 19, 2024 •

edited

Loading