feat: add score threshold to AnalyzerEngine #845

Open
wants to merge 7 commits into
base: develop

Pouyanpi
Collaborator

@Pouyanpi Pouyanpi commented Nov 7, 2024

Description

Currently, the AnalyzerEngine is created with its default_score_threshold of 0, so low-confidence matches are not filtered out. This PR adds a score_threshold field to SensitiveDataDetectionOptions so that users can set the threshold in config.yml. The default value is set to 0.2, but choosing a good default requires further experimentation.

Example configuration:

rails:
  config:
    sensitive_data_detection:
      input:
        score_threshold: 0.4
        entities:
          - PHONE_NUMBER
          - EMAIL_ADDRESS
          - IN_PAN
          - IN_AADHAAR

      output:
        entities:
          - PHONE_NUMBER
          - EMAIL_ADDRESS
          - IN_PAN
          - IN_AADHAAR
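
To illustrate what score_threshold changes in practice, here is a minimal sketch. It assumes the analyzer returns detections with a confidence score between 0 and 1 and keeps those at or above the threshold. The Detection class, the filter_by_threshold helper, and the sample scores are all hypothetical; none of them come from this PR or from Presidio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for a single analyzer result."""

    entity_type: str
    text: str
    score: float


def filter_by_threshold(detections, score_threshold):
    """Keep only detections whose score is at least score_threshold."""
    return [d for d in detections if d.score >= score_threshold]


# Made-up detections and scores, for illustration only.
detections = [
    Detection("EMAIL_ADDRESS", "jane@example.com", 1.0),
    Detection("PHONE_NUMBER", "555-0100", 0.75),
    Detection("IN_PAN", "individual", 0.05),  # assumed weak pattern match
]

# With the threshold from the example configuration, the weak match is dropped.
kept = filter_by_threshold(detections, score_threshold=0.4)
kept_types = [d.entity_type for d in kept]
print(kept_types)
```

With a threshold of 0, all three hypothetical detections would be kept, which is the kind of over-masking a non-zero threshold is meant to reduce.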

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@Pouyanpi Pouyanpi added the enhancement New feature or request label Nov 7, 2024
@Pouyanpi Pouyanpi self-assigned this Nov 7, 2024
@Pouyanpi Pouyanpi added the good first issue Good for newcomers label Nov 7, 2024
@Pouyanpi Pouyanpi force-pushed the fix/presidio-min-threshold branch from 5b77022 to addad9d Compare November 7, 2024 12:07
@Pouyanpi Pouyanpi marked this pull request as ready for review November 10, 2024 10:35
@Pouyanpi Pouyanpi removed the good first issue Good for newcomers label Dec 3, 2024
feat: add score_threshold parameter to _get_analyzer

- Added score_threshold parameter to _get_analyzer function with type hint.
- Included validation to ensure score_threshold is a float between 0 and 1.
- Updated detect_sensitive_data to pass score_threshold to _get_analyzer.
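
The validation described in this commit could look roughly like the sketch below. This is not the code from the PR: validate_score_threshold and get_analyzer_kwargs are hypothetical helpers, and accepting integers such as 0 or 1 alongside floats is an assumption. The only external detail relied on is that Presidio's AnalyzerEngine takes a default_score_threshold keyword argument.

```python
def validate_score_threshold(score_threshold):
    """Return score_threshold as a float, or raise if it is not in [0, 1]."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score_threshold, bool) or not isinstance(
        score_threshold, (int, float)
    ):
        raise TypeError("score_threshold must be a number")
    # NaN fails this comparison too, so it is rejected as out of range.
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")
    return float(score_threshold)


def get_analyzer_kwargs(score_threshold=0.2):
    """Hypothetical helper: build keyword arguments for the analyzer.

    The result is meant to be passed on as
    AnalyzerEngine(default_score_threshold=...).
    """
    return {"default_score_threshold": validate_score_threshold(score_threshold)}


print(get_analyzer_kwargs(0.4))
```

Validating before the analyzer is constructed means a bad value in config.yml fails early with a clear message instead of silently changing which entities get masked.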
@Pouyanpi Pouyanpi force-pushed the fix/presidio-min-threshold branch from addad9d to 8bb8a60 Compare December 3, 2024 08:15
@Pouyanpi Pouyanpi added this to the v0.12.0 milestone Dec 6, 2024
@Pouyanpi Pouyanpi added the status: waiting confirmation and status: in review labels Jan 8, 2025
@Pouyanpi Pouyanpi force-pushed the fix/presidio-min-threshold branch from 9ec17dd to 7e4a3a6 Compare January 9, 2025 11:39
Labels
enhancement (New feature or request), status: in review, status: waiting confirmation (Issue is waiting confirmation whether the proposed solution/workaround works.)
Development

Successfully merging this pull request may close these issues.

bug: PII Filter Incorrectly Masks the Word 'individual' as Sensitive Data
1 participant