Needing Advice: Best algo(s) for distance based on "proportion of shared substrings" #35

Open
davidmcnabnz opened this issue Nov 19, 2022 · 0 comments


Hi there, I'm just getting started with string similarity processing.
In my application, I need to compare short-ish strings (25-300 characters long), and I need the pairwise distance metric to reward things like:

  1. The proportion of each string that consists of shared substrings, and
  2. The sizes of the shared substrings, especially relative to the lengths of the strings being compared
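
To make the two criteria concrete, here is a rough sketch of the kind of score I have in mind. It uses only `difflib.SequenceMatcher` from Python's standard library, not this package, and the function name `shared_substring_scores`, the `min_len` cutoff, and the squared-size weighting are all my own assumptions for illustration rather than an established algorithm:

```python
from difflib import SequenceMatcher


def shared_substring_scores(a: str, b: str, min_len: int = 3):
    """Return (coverage, size_weighted), both in [0, 1]; higher = more similar.

    Hypothetical helper, for illustration only.
    """
    if not a or not b:
        return 0.0, 0.0

    # autojunk=False: the "popular character" heuristic only kicks in for
    # sequences of 200+ items, which strings of up to 300 characters can reach.
    matcher = SequenceMatcher(None, a, b, autojunk=False)

    # Matching blocks are non-overlapping shared substrings. The list ends
    # with a zero-length sentinel, which the min_len filter also drops.
    sizes = [blk.size for blk in matcher.get_matching_blocks()
             if blk.size >= min_len]

    # Criterion 1: share of both strings covered by shared substrings.
    coverage = 2 * sum(sizes) / (len(a) + len(b))

    # Criterion 2: squaring each block size rewards a few long blocks over
    # many short ones; dividing by the shorter length squared keeps it <= 1.
    size_weighted = sum(s * s for s in sizes) / (min(len(a), len(b)) ** 2)

    return coverage, size_weighted
```

A distance would then be something like `1 - coverage` or `1 - size_weighted`, or a weighted mix of the two. Note that `SequenceMatcher` picks blocks greedily (longest first, in order), so this is only an approximation of "all shared substrings".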

Any suggestions, among the wealth of algorithms and modes supported in this package?

Cheers
David
