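The hash is capped at 128 bits: for short text only the leading bits are used, and for longer text more bits keep being added, up to a maximum of 128 bits.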
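The reply's description (shorter input, fewer bits; longer input, more bits up to 128) fits a polynomial string hash masked to 128 bits. The `string_hash` below is a hypothetical sketch modeled on a common Python SimHash recipe and is not confirmed to be this project's code; the multiplier 1000003 and the `<< 7` seed are assumptions taken from that recipe.

```python
MASK_128 = (1 << 128) - 1  # assumed 128-bit cap


def string_hash(token: str) -> int:
    """Hypothetical polynomial string hash; its bit width grows with token length."""
    if not token:
        return 0
    x = ord(token[0]) << 7
    m = 1000003
    for ch in token:
        # Each step multiplies by roughly 2**20, so the width grows ~20 bits per character
        x = ((x * m) ^ ord(ch)) & MASK_128
    x ^= len(token)
    return x


for tok in ["中", "中文", "中文文本相似度"]:
    print(tok, string_hash(tok).bit_length())
```

Under this assumption, a single CJK character such as "中" (U+4E2D) gives `(20013 << 7) * 1000003`, which lies between 2**41 and 2**42, so its hash is 42 bits wide. That would be consistent with the 42-bit ceiling reported later in the thread when tokens are single characters.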
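My test shows that the text is tokenized and a hash is computed for each token. For each bit position, the token hashes are accumulated into a weight: a 0 bit subtracts and a 1 bit adds (the weights all seem to be 1). Finally, each bit of the result is set to 0 or 1 depending on whether its sum is negative or positive.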
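The procedure described above (per-token hash, a ±1 vote per bit, the sign decides the bit) is standard SimHash with unit weights. A minimal sketch follows; `simhash`, `hamming`, `crc32_hash`, and the use of CRC-32 as the token hash are illustrative assumptions, not the project's code. Treating a zero sum as 0 is also an assumption.

```python
import zlib
from typing import Callable, Iterable


def simhash(tokens: Iterable[str], hash_fn: Callable[[str], int], nbits: int = 64) -> int:
    """SimHash fingerprint where every token has weight 1."""
    weights = [0] * nbits
    for tok in tokens:
        h = hash_fn(tok)
        for i in range(nbits):
            # Bit set in the token hash -> +1, bit clear -> -1
            weights[i] += 1 if (h >> i) & 1 else -1
    # Positive sum -> 1; negative or zero sum -> 0
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def crc32_hash(token: str) -> int:
    """Stand-in 32-bit token hash (zlib.crc32 needs bytes, so encode first)."""
    return zlib.crc32(token.encode("utf-8"))


# Usage: character-level tokens, 32-bit fingerprints
fp1 = simhash(list("今天天气很好"), crc32_hash, nbits=32)
fp2 = simhash(list("今天天气不错"), crc32_hash, nbits=32)
print(f"{fp1:032b}", f"{fp2:032b}", hamming(fp1, fp2))
```

The fingerprint width `nbits` should match the width of the token hash; any position the token hash never sets can only ever receive -1 votes.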
But no token hash ever exceeds 42 bits, so the final fingerprint can never exceed 42 bits either.
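The 42-bit ceiling follows directly from the sign rule: for any bit position above the widest token hash, every token votes -1, so the sum is negative and that fingerprint bit is always 0. A small check, using hypothetical token hashes that are all narrower than 2**42 to mimic the width observed in the issue:

```python
MAX_TOKEN_HASH_BITS = 42  # widest token hash observed in the issue
NBITS = 128               # fingerprint width


def bit_weights(hashes: list[int], nbits: int = NBITS) -> list[int]:
    """Per-bit SimHash weights: +1 where a token hash has the bit set, -1 otherwise."""
    return [sum(1 if (h >> i) & 1 else -1 for h in hashes) for i in range(nbits)]


# Hypothetical token hashes, all below 2**42
hashes = [0x2A5F3C1B7E9, 0x1FFFFFFFFFF, 0x3C0FFEE1234, 0x0123456789A]
weights = bit_weights(hashes)

# From bit 42 upward every token votes -1, so those fingerprint bits are always 0
high = weights[MAX_TOKEN_HASH_BITS:]
print(all(w == -len(hashes) for w in high))  # → True

fingerprint = sum(1 << i for i, w in enumerate(weights) if w > 0)
print(fingerprint.bit_length() <= MAX_TOKEN_HASH_BITS)  # → True
```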
@shibing624
OK, so you find that 42 bits gives poor results and would like to change it to 128 bits or more?
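I found this by setting a breakpoint inside the similarity computation: the simHash computed for each character is at most 42 bits wide, so only the first 42 bits of the vector are meaningful. Perhaps the hash algorithm needs to be replaced?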
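One possible answer to the question, offered as an assumption rather than the project's actual fix: derive each token hash from a fixed-width digest such as MD5 (16 bytes = 128 bits), so every bit position can be populated regardless of token length. `md5_hash128` is a hypothetical helper; MD5 is used here only for its bit distribution, not for security.

```python
import hashlib

HASH_BITS = 128  # MD5 digest size: 16 bytes = 128 bits


def md5_hash128(token: str) -> int:
    """Fixed-width 128-bit token hash; the width does not depend on token length."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


for tok in ["中", "中文", "文本相似度"]:
    h = md5_hash128(tok)
    # Zero-pad to 128 bits so every position is visible
    print(tok, h.bit_length(), f"{h:0{HASH_BITS}b}")
```

Any per-token hash with a fixed 128-bit output would serve the same purpose as the hash inside a 128-bit SimHash; MD5 is simply one widely available option in the Python standard library.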