Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerical stability improvements for Cosine Distance #101

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

ptondereau
Copy link
Contributor

Pull Request

Changes

  • Added f32::EPSILON threshold check instead of strict zero comparison
  • Added clamp() to ensure cosine stays within [-1, 1] range

Technical Details

The main improvement addresses potential floating-point precision issues:

  • Previously, the code used pnqn != 0.0 which can be unreliable with floating-point arithmetic
  • Now uses pnqn > f32::EPSILON to properly handle near-zero magnitudes
  • Added protection against rounding errors that could push cosine outside [-1, 1]

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

@ptondereau ptondereau force-pushed the secure-float-distance branch from 9de08f6 to fcd4159 Compare November 6, 2024 13:19
@ptondereau ptondereau force-pushed the secure-float-distance branch from fcd4159 to 6a09118 Compare November 6, 2024 13:19
@irevoire irevoire self-requested a review November 6, 2024 13:23
@irevoire irevoire added enhancement New feature or request indexing Everything related to indexing relevancy labels Nov 6, 2024
@irevoire
Copy link
Member

irevoire commented Nov 6, 2024

Thanks for the PR, @ptondereau!

I ran the relevancy benchmarks:

Main

image

This PR

image

To me:

  • The relevancy is pretty much the same; there are some differences here and there, but it’s probably just noise
  • The indexing performances improved a little bit over all datasets
  • Search performances and size on disk are pretty much the same as before

Overall, it’s good, so I’m merging 🦔

@irevoire irevoire merged commit ee3ee47 into meilisearch:main Nov 6, 2024
8 checks passed
@ptondereau ptondereau deleted the secure-float-distance branch November 6, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request indexing Everything related to indexing relevancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants