- improve calculation of min score inside partial_ratio so it can skip more alignments
- Fixed incorrect score calculation for SIMD implementations of Levenshtein and OSA on 32-bit systems
- split `editops_apply`/`opcodes_apply` into `*_apply_str` and `*_apply_vec`. This avoids the instantiation of `std::basic_string` for unsupported types.
- the editops implementation didn't properly account for some cells in the Levenshtein matrix. This could lead to both incorrect results and crashes.
- fix tagged version
- fix potentially incorrect results of JaroWinkler when using high prefix weights
- fix assert leading to compilation failures
- fix doxygen warnings
- add banded implementation of LCS / Indel. This improves the runtime from `O((|s1|/64) * |s2|)` to `O((score_cutoff/64) * |s2|)`
- changed many types in the interface from int64_t to size_t, since they can't be negative.
- fix incorrect transposition calculation in simd implementation of Jaro similarity
- use posix_memalign on Android
- use _mm_malloc/_mm_free on macOS if aligned_alloc is unsupported
- fix compilation failure on macOS
- fix wraparound issue in simd implementation of Jaro and Jaro Winkler
- improve performance of simd implementation for LCS and Indel by up to 50%
- improve performance of simd implementation for Jaro and Jaro Winkler
- improve performance of Jaro and Jaro Winkler for long sequences
- fix edge case in new simd implementation of Jaro and Jaro Winkler
- add support for bidirectional iterators
- add experimental simd implementation for Jaro and Jaro Winkler
- added argument `pad` to Hamming distance. This controls whether sequences of different length should be padded or lead to a `std::invalid_argument` exception.
- improve behaviour when including the project as cmake sub project
- add missing include leading to build failures on gcc 13
- fix handling of `score_cutoff > 1.0` in `Jaro` and `JaroWinkler`
- fix division by zero in simd implementation of normalized string metrics, when comparing empty strings
- allow the usage of hamming for different string lengths. Length differences are handled as insertions / deletions
- fix some floating point comparisons in the test suite
- Linters are now disabled in test builds by default and can be enabled using `RAPIDFUZZ_ENABLE_LINTERS`
- fix warning about `project_options` when building the test suite with `cmake>=3.24`
- `fuzz::partial_ratio` was not always symmetric when `len(s1) == len(s2)`
- fix undefined behavior in experimental SIMD implementation
- fix broken sse2 support
- fix bug in `Levenshtein.editops` leading to crashes when used with `score_hint`
- add `score_hint` argument to cached implementations
- add `score_hint` argument to Levenshtein functions
- added `Prefix`/`Postfix` similarity
- fixed incorrect `score_cutoff` handling in `lcs_seq_distance`
- added experimental simd support for `ratio`/`Levenshtein`/`LCSseq`/`Indel`
- add Jaro and JaroWinkler
- add editops to hamming distance
- strip common affix in osa distance
- add optimal string alignment (OSA) alignment
- `fuzz::partial_ratio` did not find the optimal alignment in some edge cases
- improve performance of `fuzz::partial_ratio`
- fix type mismatch error
- improve performance of Levenshtein distance/editops calculation for long sequences when providing a `score_cutoff`/`score_hint`
- improve performance of Levenshtein distance
- improve performance when `score_cutoff = 1`
- improve performance for long sequences when `3 < score_cutoff < 32`
- improve performance when
- improve performance of Levenshtein editops
- fix incorrect results of partial_ratio for long needles
- added Damerau-Levenshtein implementation
  - Not API stable yet, since it will be extended with weights in a future version
- improve performance for banded Levenshtein implementation
- fix banded Levenshtein implementation
- implement Hirschberg's algorithm to reduce memory usage of levenshtein_editops
- fix opcode conversion for empty source sequence
- fix implementation of hamming_normalized_similarity
- fix implementation of CachedLCSseq::distance
- fix integer wraparound in partial_ratio/partial_ratio_alignment
- fix unlimited recursion in CachedLCSseq::similarity
- reduce compiler warnings
- fix undefined behavior in sorted_split caused by incrementing an iterator past the end
- fix use after free in editops calculation
- reduce compiler warnings
- added LCSseq (longest common subsequence) implementation
- reduced compiler warnings
- consider float imprecision in score_cutoff
- fix incorrect score_cutoff handling in token_set_ratio and token_ratio
- fix template deduction guides on MSVC
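
The Hamming entries above describe two behaviours for sequences of different length: with `pad` enabled, the length difference is counted as insertions / deletions; otherwise a `std::invalid_argument` exception is thrown. The following is a minimal standalone sketch of those semantics, assuming plain `std::string` inputs. `hamming_distance_sketch` is a hypothetical helper written for illustration only; it is not the library's API or implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the rapidfuzz API): Hamming distance where a
// length difference is either padded or rejected, depending on `pad`.
std::size_t hamming_distance_sketch(const std::string& s1, const std::string& s2, bool pad = true)
{
    // Without padding, sequences of different length are an error.
    if (!pad && s1.size() != s2.size())
        throw std::invalid_argument("sequences are not the same length");

    const std::size_t min_len = std::min(s1.size(), s2.size());

    // Each character beyond the shorter sequence counts as one insertion/deletion.
    std::size_t dist = std::max(s1.size(), s2.size()) - min_len;

    // Positions present in both sequences count one substitution per mismatch.
    for (std::size_t i = 0; i < min_len; ++i)
        if (s1[i] != s2[i])
            ++dist;

    return dist;
}
```

For example, `"abc"` vs `"abcde"` gives 2 with padding (two trailing insertions), while the same call with `pad = false` throws.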
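
Two of the entries above, adding the optimal string alignment (OSA) metric and stripping the common affix before computing the OSA distance, can be illustrated together. The sketch below is the textbook `O(|s1| * |s2|)` dynamic program, not the bit-parallel or SIMD implementations the changelog refers to. `strip_common_affix` and `osa_distance_sketch` are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remove the common prefix and suffix of both strings.
// Matching characters at either end never require an edit, so dropping them
// shrinks the matrix without changing the distance.
void strip_common_affix(std::string& a, std::string& b)
{
    std::size_t prefix = 0;
    while (prefix < a.size() && prefix < b.size() && a[prefix] == b[prefix])
        ++prefix;
    a.erase(0, prefix);
    b.erase(0, prefix);

    std::size_t suffix = 0;
    while (suffix < a.size() && suffix < b.size() &&
           a[a.size() - 1 - suffix] == b[b.size() - 1 - suffix])
        ++suffix;
    a.resize(a.size() - suffix);
    b.resize(b.size() - suffix);
}

// Hypothetical helper: OSA distance (insertions, deletions, substitutions and
// transpositions of adjacent characters, with no substring edited twice).
std::size_t osa_distance_sketch(std::string a, std::string b)
{
    strip_common_affix(a, b);
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // d[i][j] = distance between the first i chars of a and the first j chars of b
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
            // Transposition of two adjacent characters.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
        }
    }
    return d[n][m];
}
```

`osa_distance_sketch("CA", "ABC")` returns 3: reaching `"ABC"` from the transposed `"AC"` would require editing the transposed pair again, which OSA forbids, whereas unrestricted Damerau-Levenshtein returns 2 for the same pair.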