Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partial ratio #10
base: main
Are you sure you want to change the base?
Add partial ratio #10
Changes from 11 commits
4c3d5f3
83c1aab
f5c87d8
e8900f4
5f8aae9
9d65c4b
e23005a
d98e40a
77fb71b
b1f61bb
89f6094
158e579
ef1e2be
e352abf
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not return None if a score below score_cutoff was passed.
For now this could be:
This will likely need to be changed when making the return type of
partial_ratio_alignment
alignment dependent on the presence of ascore_cutoff
.Please add a test for this as well.
fn issue206
might be a good place. For the other languages I managed to find ways to iterate over functions to reduce the boilerplate for some of these tests. I didn't look into ways to achieve the same in rust so far.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the two strings are swapped for the comparison we have to swap the start and end positions to fix them up. I guess that would come down to something along the lines of:
Alternatively you can construct a new ScoreAlignment and return that.
A test would make sense for this as well. I should add one to the C++ tests too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It assumes len(s1) <= len(s2) not len(s1) < len(s2). I fixed the comment in the Python version as well.
The implementation should be based on the C++ implementation you can fine here: https://github.com/rapidfuzz/rapidfuzz-cpp/blob/c6a3ac87c42ddf52f502dc3ed7001c8c2cefb900/rapidfuzz/fuzz_impl.hpp#L68
The Python implementation only skips windows if they can't exist since a character is unique. The C++ implementation makes use of the fact that a score can only change by a certain distance per shift of the window to skip windows. An example would be:
Here the alignments with the full length are:
Now if we first calculate the distance for the alignments 1 and 3 both of the have a indel distance of 6. That way we know that alignment 2 can at best have an indel distance of 4.
Looking at the C++ implementation I feel like the actual
min_score
calculation in there is incorrect.. I believe instead of:this should be
The current implementation doesn't lead to incorrect results but allows skipping less cells than we really could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improved the calculation of
min_score
in rapidfuzz-cpp to skip more alignmentsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit unsure about this so far. In C++ we directly use random access, but that can only be done because we assume fast random access. In Rust e.g. string is utf8 and so has no fast random access.
I believe we could write this so random access isn't required. However we would still iterate the string multiple times and so maybe for those types it's still faster to just make the copy.
I would leave it like this for now, since
partial_ratio
is never super fast anyway. If we were to change it we should probably benchmark it first.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
things like the batchcomparator and the charset should be passed as argument. This is required so the same implementation can be used from
PartialRatioBatchComparator
.PartialRatioBatchComparator
should get implemented.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are still a couple of other test in https://github.com/rapidfuzz/RapidFuzz/blob/main/tests/test_fuzz.py and https://github.com/rapidfuzz/rapidfuzz-cpp/blob/main/test/tests-fuzz.cpp that we could port over, but I didn't check this in depth so far.