- Launch the db with docker:
docker-compose -f docker/docker-compose.yml up
- Connect to the db:
psql -h localhost -p 5432 -U docker -d demodb
. Password:docker
Presentation at Node.js Paris meet up: https://slides.com/pierre-louisleportz/deck/
We have a book
table with 10,000 entries and the following columns:
id | title | author | publishingYear | editor | category | lendingNumber2017 |
---|---|---|---|---|---|---|
uuid | string | string | integer | string | string | integer |
Time measurements of requests are made using PostgreSQL \timing
on my personal computer.
We want to search the table for books that contain the inputString
either in the title
, author
, editor
or category
columns.
SELECT title, author, editor, category FROM book WHERE title ILIKE '%performance%' OR author ILIKE '%performance%' OR editor ILIKE '%performance%' OR category ILIKE '%performance%';
-> 2 results
Average timing of 1 request over 10 tries: 33.89 ms
Advantages relatively to the ILIKE
method:
- Language specific semantic help: searching for the word "fox" will match the words "fox" and "foxes" but not "foxtrot" (because "foxes" gets normalized as "fox" by
to_tsvector
but "foxtrot" does not). Similarly, searching for "foxes" will match both "fox" and "foxes". - Results ranking by relevance
Step 1
Associate a text search vector to each entry:
UPDATE book SET tokens = to_tsvector('french', coalesce(title, '')) || to_tsvector('french', coalesce(author, '')) || to_tsvector('french', coalesce(editor, '')) || to_tsvector('french', coalesce(category, ''));
Step 2
Search:
SELECT title, author, editor, category FROM book WHERE tokens @@ to_tsquery('french', 'sciences');
=> Average timing over 10 tries : 4.10 ms (query duration decreased by 88%)
We can see the advantage of full text search versus the ILIKE method when we search for the input "performant" instead of "performance". The ILIKE method returns 0 results whereas the full text method returns the same 2 results as with the "performance" input.
Step 3
Add A, B, C or D weights to tokens (highest relevance is A and lowest is D). In this example we consider the title
and the author
to be the most relevant criteria and therefore they have the A
rank, while category
and editor
are ranked B
and C
respectively:
UPDATE book SET tokens = setweight(to_tsvector('french', coalesce(title, '')), 'A') || setweight(to_tsvector('french', coalesce(author, '')), 'A') || setweight(to_tsvector('french', coalesce(editor, '')), 'C') || setweight(to_tsvector('french', coalesce(category, '')), 'B');
Then you can get ranked results using the ts_rank
function, that computes a matching coefficient:
SELECT title, author, editor, category, ts_rank(tokens, to_tsquery('french', 'histoire')) FROM book WHERE tokens @@ to_tsquery('french', 'histoire') ORDER BY ts_rank(tokens, to_tsquery('french', 'histoire')) DESC;
Note: "histoire" is an interesting input because the first two results have different ts_rank
results.