Confidence counting of high support rules takes very long #73

Open
kliegr opened this issue Oct 18, 2021 · 3 comments
kliegr commented Oct 18, 2021

Confidence counting for a high-support rule (support 11,694,826) does not finish within five hours.
The problem is possibly inefficient memory usage: after five hours, the allocated memory (according to server-side `top`) is 98.6% of the available memory (94 GB), while CPU use is only around 1% (with unlimited parallelism).

It is also noteworthy that the memory use reported by RDFRules does not exactly match the server-side metering (the client shows "Used memory: 74.81 GB / 90.00 GB").

This is not a bug, but possibly a sampling strategy could be used to compute approximate confidence.
taskAndRules.zip

Author

kliegr commented Oct 19, 2021

High support alone is not the problem. Another rule in the same task has almost identical support (11605675), yet its confidence is computed in several seconds:
( ?b <interacts_with> ?a ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9917529917281246, HeadSize: 11702183, Support: 11605675
The problematic rules are:
( ?b <provided_by> ?c ) ^ ( ?a <provided_by> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9993713138822047, HeadSize: 11702183, Support: 11694826
( ?a <category> ?c ) ^ ( ?b <category> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9918052042084797, HeadSize: 11702183, Support: 11606286
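
One plausible reading of the difference: the fast rule has a single body atom, so its body has at most one match per <interacts_with> triple, while each slow rule joins two atoms on a shared ?c, so an object with n subjects contributes n * n candidate (?a, ?b) bindings. The Python sketch below illustrates this on toy data. It is not RDFRules code; the function name `body_stats` and the data are hypothetical, and it assumes ?a and ?b may bind to the same entity.

```python
from collections import defaultdict
from itertools import product


def body_stats(triples, predicate):
    """Count matches of the body (?b p ?c) ^ (?a p ?c) for one predicate.

    Returns (bindings, distinct_pairs). Hypothetical helper for illustration.
    """
    subjects_by_object = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            subjects_by_object[o].add(s)

    # An object c with n subjects yields n * n bindings of (a, b, c).
    bindings = sum(len(subs) ** 2 for subs in subjects_by_object.values())

    # Distinct (a, b) pairs; enumerating them is only feasible on toy data.
    pairs = set()
    for subs in subjects_by_object.values():
        pairs.update(product(subs, repeat=2))
    return bindings, len(pairs)


# Toy data (assumption): 1000 entities share one provider, two share another.
triples = [(f"drug{i}", "provided_by", "sourceX") for i in range(1000)]
triples += [("drug0", "provided_by", "sourceY"), ("drug1", "provided_by", "sourceY")]

print(body_stats(triples, "provided_by"))  # (1000004, 1000000)
```

With only 1002 triples the body already has about a million matches, so a predicate such as <provided_by> or <category> with a few very common objects can dominate both time and memory.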

Author

kliegr commented Oct 19, 2021

This bug is possibly a duplicate of #74

Owner

propi commented Sep 14, 2022

This is a combinatorial explosion. One solution is an anytime approach with sampling and approximate results. For now, I have added better debugging of stuck rules and the ability to interrupt mining or confidence computing tasks. Fortunately, during mining, the hardest rules are mined at the end of the refining rules queue.
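
A minimal sketch of what such a sampling approach could look like for rules of the form ( ?b p ?c ) ^ ( ?a p ?c ) => ( ?a h ?b ), written in Python on toy data. This is not how RDFRules computes confidence; `approx_confidence` and its parameters are hypothetical. It samples body bindings (a, b, c) uniformly and reports the fraction for which the head triple exists, so it estimates confidence over bindings rather than over distinct (a, b) pairs (the two coincide when each pair shares at most one c).

```python
import random
from collections import defaultdict


def approx_confidence(triples, body_pred, head_pred, n_samples=10_000, seed=0):
    """Estimate the share of body bindings (a, b, c) whose head (a, b) exists."""
    rng = random.Random(seed)
    subjects_by_object = defaultdict(set)
    head = set()
    for s, p, o in triples:
        if p == body_pred:
            subjects_by_object[o].add(s)
        if p == head_pred:
            head.add((s, o))

    # Sorted lists keep the sampling reproducible for a fixed seed.
    groups = [sorted(subs) for subs in subjects_by_object.values()]
    if not groups:
        return 0.0

    # Picking c with probability proportional to n_c^2, then a and b uniformly
    # among its subjects, makes every binding (a, b, c) equally likely.
    weights = [len(g) ** 2 for g in groups]
    hits = 0
    for group in rng.choices(groups, weights=weights, k=n_samples):
        a = rng.choice(group)
        b = rng.choice(group)
        hits += (a, b) in head
    return hits / n_samples


# Toy data (assumption): ten entities share a category; e_i interacts with e_j if i < j.
toy = [(f"e{i}", "category", "c0") for i in range(10)]
toy += [(f"e{i}", "interacts_with", f"e{j}") for i in range(10) for j in range(i + 1, 10)]
print(approx_confidence(toy, "category", "interacts_with"))
```

The cost depends on the number of samples rather than on the size of the join, and the loop can be stopped early to return the estimate so far, which is what makes the approach anytime.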
