Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Add FIPS compliant hashing algorithm SHA256 #909

Open
1 task done
mdambski opened this issue Dec 17, 2024 · 0 comments
Open
1 task done

feature: Add FIPS compliant hashing algorithm SHA256 #909

mdambski opened this issue Dec 17, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@mdambski
Copy link

mdambski commented Dec 17, 2024

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

The library utilizes MD5 for hashing in embeddings cache and knowledge base.

In embeddings cache:

  • The library utilizes two hash key generators. One uses MD5 and other uses Python built-in hash function, which has utilized the SipHash algorithm since Python 3.4, as introduced in PEP456.

For knowledgebase:

Both algorithms (MD5, SipHash) are not approved for use under US federal requirements (FIPS regulation). To ensure the library is usable by companies that need to remain compliant, adding an additional compliant algorithm like SHA-256 as a configuration option is a small change but would make a difference.

Describe the solution you'd like

In nemoguardrails/embeddings/cache.py, add one more (non-default) implementation of the Key Generator which would allow the usage of SHA-256 as an alternative.

In KnowledgeBaseConfig located in nemoguardrails/rails/llm/config.py, allow configuring the hash algorithm of choice (which would allow choosing SHA-256 instead of only MD5) and use it in the KnowledgeBase class for computing the hash.

Describe alternatives you've considered

Alternative supported algorithms are also a viable option.

In case of KnowledgeBase class the most straightforward solution would be just replacing the call to hashlib.md5 with hashlib.sha256 but this would not be backward compatibile and might cause recomputation of knowledge base. This is why I proposed to add configuration option, even though it is only used for naming of the cache file.

Additional context

No response

@mdambski mdambski added enhancement New feature or request status: needs triage New issues that have not yet been reviewed or categorized. labels Dec 17, 2024
@Pouyanpi Pouyanpi removed the status: needs triage New issues that have not yet been reviewed or categorized. label Jan 6, 2025
@Pouyanpi Pouyanpi self-assigned this Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants