
Feat: Improved module search #3047

Open
wants to merge 15 commits into
base: nextgen


Ell1ott (Contributor) commented May 17, 2024

Previously, if you made a typo while searching for modules, the module you were looking for would not show up. This is now fixed/improved using the typo-safe-search npm package.
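To make the typo problem concrete: a search that only keeps items containing the query as an exact substring loses the intended result as soon as one character is wrong. The sketch below is a hypothetical illustration of that kind of matching, loosely in the spirit of the old behaviour described later in this thread. The function name and module names are placeholders, not the client's actual code.

```typescript
// Hypothetical sketch of a plain substring filter, the kind of matching
// that drops results on a typo. Not the client's actual implementation.
function substringSearch(query: string, items: string[]): string[] {
  const q = query.toLowerCase();
  // Keep only items that contain the lower-cased query verbatim.
  return items.filter((item) => item.toLowerCase().includes(q));
}

// Placeholder module names, chosen only for illustration.
const modules = ["KillAura", "Velocity", "Scaffold"];

console.log(substringSearch("kill", modules));    // finds "KillAura"
console.log(substringSearch("kilaura", modules)); // one missing "l": no results
```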

Examples: [two screenshots]

It is still not perfect, but I much prefer this behaviour.

Ell1ott changed the title from "Improved module search" to "Feat: Improved module search" May 23, 2024
1zun4 (Member) commented May 24, 2024

Is there any reason it is still on draft?

Ell1ott marked this pull request as ready for review May 25, 2024 11:54
Ell1ott (Contributor, Author) commented May 25, 2024

Quoting 1zun4: "Is there any reason it is still on draft?"

Not anymore. I just wanted to make sure there weren't any edge cases where it wouldn't work, but it is now ready to be merged.

1zun4 requested a review from SenkJu May 25, 2024 15:06
1zun4 added this to the 0.6.0 milestone May 25, 2024
SenkJu (Contributor) commented May 25, 2024

I see the library you are using is made by you. What distance function did you implement? It doesn't seem to be one I know.

Ell1ott (Contributor, Author) commented May 25, 2024

Well, I chose to write my own library and distance function because I couldn't find any good libraries that did both of the following:

  • Allowed typos (i.e. the query contains a character that doesn't exist in the item). For example, I found command-score, which does something similar but doesn't allow incorrect characters.
  • Assumed the user might not have finished typing the query. When searching for a module, users are very likely to type only the first few letters. The few distance functions that allowed typos expected the query string to already be complete. An example of this is the Levenshtein distance.

My distance function is loosely inspired by command-score but is otherwise my own, designed to solve the problems listed above.

SenkJu (Contributor) commented May 26, 2024

Hm, basic distance functions are generally considered a solved problem (see Levenshtein distance). Your implementation appears rather inefficient to me. Consider using something like the Wagner-Fischer algorithm for much higher efficiency.
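For reference, the Wagner-Fischer algorithm computes the Levenshtein distance by filling a dynamic-programming table in which each cell holds the edit distance between two prefixes. The minimal TypeScript sketch below keeps only two rows of that table; it is the textbook algorithm, not the source of js-levenshtein or typo-safe-search.

```typescript
// Levenshtein distance via the Wagner-Fischer dynamic-programming algorithm.
// After processing i characters of `a`, prev[j] is the edit distance between
// a[0..i) and b[0..j). Only two rows of the (m+1) x (n+1) table are kept.
function levenshtein(a: string, b: string): number {
  // Row 0: turning "" into b[0..j) takes j insertions.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr: number[] = new Array<number>(b.length + 1).fill(0);

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // turning a[0..i) into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost, // substitute (free if the characters match)
      );
    }
    [prev, curr] = [curr, prev]; // the finished row becomes `prev`
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
```

This runs in O(m·n) time with O(n) extra memory for strings of length m and n. Note that it measures the distance to the whole item, so an unfinished query such as "poi" still pays one insertion for every missing character of "poison".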

Ell1ott (Contributor, Author) commented May 26, 2024

I ran some tests with the js-levenshtein library and got results that I found worse than the old system (which simply ranked items by whether they contained the query). Take the following example: the user wants to search for 'poison' in a list of words and starts by typing 'poi'. Here, js-levenshtein would rank the following words as the best matches for the query 'poi':

  • pool
  • spit
  • war
  • poison
  • dose

Whereas my distance function would recommend these:

  • poison
  • productive
  • proportion
  • pool
  • proof
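The gap between the two lists comes from how plain Levenshtein treats an unfinished query: 'poi' needs three insertions to become 'poison' but only two edits to become 'pool'. One common way to tolerate incomplete queries, shown here as an assumption and not as what typo-safe-search does, is a prefix edit distance: run the same Wagner-Fischer table and take the minimum of its last row, which is the distance from the query to the item's best-matching prefix. The helper names below are hypothetical.

```typescript
// Last row of the Wagner-Fischer table: row[j] = distance(query, item[0..j)).
function lastRow(query: string, item: string): number[] {
  let prev = Array.from({ length: item.length + 1 }, (_, j) => j);
  let curr = new Array<number>(item.length + 1).fill(0);
  for (let i = 1; i <= query.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= item.length; j++) {
      const cost = query[i - 1] === item[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    [prev, curr] = [curr, prev];
  }
  return prev;
}

type Distance = (query: string, item: string) => number;

// Plain Levenshtein: compare the query against the whole item.
const fullDistance: Distance = (q, item) => lastRow(q, item)[item.length];

// Prefix edit distance: compare the query against the item's best prefix,
// so characters the user has not typed yet cost nothing.
const prefixDistance: Distance = (q, item) => Math.min(...lastRow(q, item));

// Sort a copy of `items` by ascending distance; Array.prototype.sort is
// stable (ES2019+), so ties keep their input order.
function rank(q: string, items: string[], dist: Distance): string[] {
  return [...items].sort((a, b) => dist(q, a) - dist(q, b));
}

const words = ["pool", "spit", "war", "poison", "dose", "productive", "proportion", "proof"];
console.log(rank("poi", words, fullDistance));   // "pool" comes first
console.log(rank("poi", words, prefixDistance)); // "poison" comes first
```

This sketch does not reproduce the second list exactly (for example, it ranks 'pool' second); it only shows why scoring against prefixes favours 'poison' for the query 'poi'.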

Regarding performance, js-levenshtein was nearly twice as fast as my implementation in my tests: it took about 4364 ms to sort a list of 140 random words 60000 times, while my implementation took 7811 ms. In practice, this difference won't be noticeable to the end user because both are very fast; mine can sort the list of 140 items about 7600 times per second. When testing it in the client, I did not notice any speed difference compared to the old system.
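The figures above are throughput measurements: 60000 sorts in 7811 ms is about 7,680 sorts per second, and 60000 sorts in 4364 ms is about 13,750. A minimal harness in that spirit might look like the following. The setup (140 words, repeated sorts) follows the comment, but the harness itself, the synthetic word list and the trivial scoring function are placeholders, not the code used in this thread.

```typescript
import { performance } from "node:perf_hooks";

type Score = (query: string, item: string) => number;

// Hypothetical harness: sorts `words` by `score(query, word)` `runs` times
// and reports total elapsed time and sorts per second.
function benchmark(score: Score, query: string, words: string[], runs: number) {
  const start = performance.now();
  for (let r = 0; r < runs; r++) {
    // Copy first so every run sorts the same unsorted input.
    [...words].sort((a, b) => score(query, a) - score(query, b));
  }
  const elapsedMs = performance.now() - start;
  return { elapsedMs, sortsPerSecond: runs / (elapsedMs / 1000) };
}

// Placeholder input: 140 synthetic words and a trivial length-based score.
// Swap in a real word list and distance function to compare implementations.
const words = Array.from({ length: 140 }, (_, i) => `word${i}`);
const lengthScore: Score = (q, w) => Math.abs(w.length - q.length);
const result = benchmark(lengthScore, "poi", words, 1000);
console.log(`${result.elapsedMs.toFixed(0)} ms, ${result.sortsPerSecond.toFixed(0)} sorts/s`);
```

Running both scoring functions through the same harness, on the same machine and input, keeps the comparison fair; absolute numbers will vary between machines and runtimes.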

If you know any alternatives to Levenshtein that would give better results, I would be happy to take a closer look at them 🙂

1zun4 modified the milestones: 0.6.0, 0.7.0 May 26, 2024
Ell1ott (Contributor, Author) commented May 29, 2024

After further optimizing my algorithm, I have managed to bring this down to about 6000 ms for sorting the list of 140 words 60000 times.

Ell1ott marked this pull request as draft May 29, 2024 23:11
Ell1ott marked this pull request as ready for review May 30, 2024 11:44
Ell1ott (Contributor, Author) commented Jun 12, 2024

@SenkJu Do you still have concerns, or could we merge?

Labels: None yet
Projects: None yet
Linked issues: None yet
3 participants