Is your feature request related to a problem? Please describe.
The library uses MD5 for hashing in the embeddings cache and the knowledge base.
In the embeddings cache:
The library provides two hash key generators. One uses MD5 and the other uses Python's built-in hash function, which has used the SipHash algorithm since Python 3.4, as introduced in PEP 456.
In the knowledge base:
The library uses only MD5, with no option to choose an alternative.
Neither algorithm (MD5, SipHash) is approved for use under US federal requirements (FIPS regulation). Adding a compliant algorithm such as SHA-256 as a configuration option would be a small change, but it would make the library usable by companies that need to remain compliant.
Describe the solution you'd like
In nemoguardrails/embeddings/cache.py, add one more (non-default) implementation of the Key Generator which would allow the usage of SHA-256 as an alternative.
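A minimal sketch of what such a generator could look like. The `KeyGenerator` base class and the `generate_key` method below are stand-ins written for this example, and `SHA256KeyGenerator` is a hypothetical name; the actual interface in `cache.py` may differ.

```python
import hashlib
from abc import ABC, abstractmethod


class KeyGenerator(ABC):
    """Stand-in for the cache's key generator interface (assumed shape)."""

    @abstractmethod
    def generate_key(self, text: str) -> str:
        """Return a cache key for the given text."""


class MD5KeyGenerator(KeyGenerator):
    """Mirrors the existing MD5-based behaviour described in this issue."""

    def generate_key(self, text: str) -> str:
        return hashlib.md5(text.encode("utf-8")).hexdigest()


class SHA256KeyGenerator(KeyGenerator):
    """Hypothetical non-default alternative that uses SHA-256."""

    def generate_key(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Because the new class would be opt-in, existing caches keyed with MD5 or the built-in hash would keep working unchanged.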
In KnowledgeBaseConfig located in nemoguardrails/rails/llm/config.py, allow configuring the hash algorithm of choice (which would allow choosing SHA-256 instead of only MD5) and use it in the KnowledgeBase class for computing the hash.
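A rough sketch of the configuration idea, under stated assumptions: `KnowledgeBaseConfigSketch` is a dataclass stand-in for the real `KnowledgeBaseConfig`, and the `hash_algorithm` field, the `SUPPORTED_HASH_ALGORITHMS` tuple, and the `cache_file_hash` helper are all hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical allowlist; MD5 stays the default for backward compatibility.
SUPPORTED_HASH_ALGORITHMS = ("md5", "sha256")


@dataclass
class KnowledgeBaseConfigSketch:
    """Stand-in for KnowledgeBaseConfig with an assumed new field."""

    hash_algorithm: str = "md5"

    def __post_init__(self) -> None:
        if self.hash_algorithm not in SUPPORTED_HASH_ALGORITHMS:
            raise ValueError(
                f"Unsupported hash algorithm: {self.hash_algorithm!r}"
            )


def cache_file_hash(content: str, config: KnowledgeBaseConfigSketch) -> str:
    """Hash the knowledge base content with the configured algorithm."""
    hasher = hashlib.new(config.hash_algorithm)
    hasher.update(content.encode("utf-8"))
    return hasher.hexdigest()
```

With the default left at `md5`, existing cache file names would stay the same; only users who set `hash_algorithm="sha256"` would trigger a one-time recomputation.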
Describe alternatives you've considered
Other compliant algorithms are also a viable option.
For the KnowledgeBase class, the most straightforward solution would be to replace the call to hashlib.md5 with hashlib.sha256, but this would not be backward compatible and might cause the knowledge base to be recomputed. This is why I propose adding a configuration option, even though the hash is only used for naming the cache file.
Additional context
No response