Practical Examples Using the ngt Command
This page describes practical examples of using a large-scale dataset with the default NGT graph (ANNG).
First, an NGT dataset must be generated to demonstrate how to search a large-scale dataset. After downloading the fastText dataset, convert it to the NGT registration format as follows.
curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip
zcat wiki-news-300d-1M-subword.vec.zip | tail -n +2 | cut -d " " -f 2- > objects.ssv
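To make the conversion step concrete, here is a minimal Python sketch of what the `tail -n +2 | cut -d " " -f 2-` part of the pipeline does: it drops the header line of the .vec text and removes the leading word from every remaining line. The function name `vec_to_ssv` is hypothetical and is not part of NGT or fastText.

```python
def vec_to_ssv(lines):
    """Yield space-separated vectors from the text lines of a fastText .vec file.

    Hypothetical helper: skips the header line (like `tail -n +2`) and
    drops the first field, the word itself (like `cut -d " " -f 2-`).
    """
    it = iter(lines)
    next(it, None)  # the first line is a header, not a vector
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue
        _word, _sep, vector = line.partition(" ")
        yield vector


if __name__ == "__main__":
    sample = ["2 3\n", "the 0.1 0.2 0.3\n", "of -0.5 0.0 1.5\n"]
    for row in vec_to_ssv(sample):
        print(row)
```

Each yielded line holds only the vector components, which is the form written to objects.ssv above.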
objects.ssv is a registration file that contains 1 million objects. Next, three objects (lines 99,998 to 100,000 of the file) are extracted as queries.
head -100000 objects.ssv | tail -3 > queries.ssv
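The following sketch shows which lines the `head -100000 | tail -3` pipeline selects. The helper `head_tail` is a hypothetical name used only for illustration; it keeps the last few items among the first `head` items of any iterable.

```python
from collections import deque
from itertools import islice


def head_tail(lines, head, tail):
    """Return the last `tail` items among the first `head` items of `lines`."""
    # islice plays the role of `head`; a bounded deque plays the role of `tail`.
    return list(deque(islice(lines, head), maxlen=tail))


if __name__ == "__main__":
    # Use 1-based line numbers as stand-ins for the lines of objects.ssv.
    print(head_tail(range(1, 1_000_001), 100_000, 3))
    # → [99998, 99999, 100000]
```

These line numbers match the IDs that appear at rank 1 with distance 0 in the search results, since each query is itself a registered object.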
Next, an ANNG index is constructed with 300 dimensions (-d 300) and cosine similarity as the distance function (-D c).
ngt create -d 300 -D c fasttext.anng objects.ssv
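As a rough illustration of the cosine option, the sketch below computes a cosine-based distance between two vectors. It assumes that the distance reported by a search is 1 minus the cosine similarity, which would be consistent with an object having distance 0 to itself; this is an assumption about the output, not a statement of how NGT implements it. The function name `cosine_distance` is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors.

    Assumption: this mirrors the distance reported for the cosine option.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
    print(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
```

Under this assumption, vectors pointing in the same direction have a distance near 0 regardless of their length, and orthogonal vectors have a distance of 1.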
The ANNG index can be searched with the queries as follows.
ngt search -n 10 fasttext.anng queries.ssv
Below are the search results.
Query No.1
Rank ID Distance
1 99998 0
2 52298 0.305776
3 75134 0.316977
4 207850 0.345267
5 258522 0.347003
6 307367 0.356967
7 538054 0.379649
8 76751 0.386644
9 535024 0.390781
10 202010 0.392031
Query Time= 0.00144647 (sec), 1.44647 (msec)
Size of Memory Usage=1531284
Query No.2
Rank ID Distance
1 99999 0
2 291507 0.232563
3 207863 0.285354
4 122249 0.3664
5 349590 0.37732
6 259506 0.380484
7 96071 0.390346
8 312097 0.400417
9 382245 0.404268
10 84622 0.404282
Query Time= 0.00166992 (sec), 1.66992 (msec)
Size of Memory Usage=1531340
Query No.3
Rank ID Distance
1 100000 0
2 565218 0.384514
3 623867 0.404919
4 194709 0.43841
5 206136 0.452629
6 927014 0.45504
7 66427 0.457764
8 772264 0.463388
9 456866 0.463402
10 742553 0.463514
Query Time= 0.00255108 (sec), 2.55108 (msec)
Size of Memory Usage=1531344
Average Query Time= 0.00188916 (sec), 1.88916 (msec), (0.00566747/3)
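The average on the last line can be checked from the per-query times. The sketch below parses the `Query Time=` lines from output in the format shown above and averages them. The names `query_times` and `average` are hypothetical, and the parser assumes the output layout is exactly as printed here.

```python
import re

# Matches per-query lines such as "Query Time= 0.00144647 (sec), 1.44647 (msec)".
# Anchoring at the start of the line skips the "Average Query Time=" summary.
QUERY_TIME = re.compile(r"^Query Time=\s*([0-9.eE+-]+)\s*\(sec\)")


def query_times(output):
    """Return the per-query times in seconds found in the search output text."""
    times = []
    for line in output.splitlines():
        match = QUERY_TIME.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    sample = """Query Time= 0.00144647 (sec), 1.44647 (msec)
Query Time= 0.00166992 (sec), 1.66992 (msec)
Query Time= 0.00255108 (sec), 2.55108 (msec)
Average Query Time= 0.00188916 (sec), 1.88916 (msec), (0.00566747/3)"""
    times = query_times(sample)
    print(f"{sum(times):.8f} / {len(times)} = {average(times):.8f}")
```

The three times sum to 0.00566747 seconds, and dividing by 3 gives the reported average of about 0.00188916 seconds.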
When higher accuracy is needed, specify a search_range_coefficient (-e) larger than the default of 0.1, as shown below.
ngt search -n 10 -e 0.15 fasttext.anng queries.ssv
When a shorter query time is needed at the expense of accuracy, specify a smaller search_range_coefficient value.
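To compare this trade-off across several settings, one option is to generate the search command for a range of search_range_coefficient values and run each in turn. The sketch below only builds the command strings, using the -n and -e options shown above. The helper name `search_commands` and the specific values 0.05 and 0.15 are illustrative assumptions, not recommendations.

```python
import shlex


def search_commands(index, queries, epsilons, n=10):
    """Build one `ngt search` command line per search_range_coefficient value."""
    return [
        shlex.join(["ngt", "search", "-n", str(n), "-e", str(e), index, queries])
        for e in epsilons
    ]


if __name__ == "__main__":
    # 0.1 is the default; the smaller and larger values are illustrative only.
    for command in search_commands("fasttext.anng", "queries.ssv", [0.05, 0.1, 0.15]):
        print(command)
```

Running the printed commands and comparing the reported query times and result lists gives a rough picture of how the coefficient affects speed and accuracy on this dataset.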