Confidence counting of high support rules takes very long #73

kliegr · 2021-10-18T12:25:41Z

Confidence counting of a high support rule (support 11.694.826) does not finish within five hours.
The problem is possibly inefficient memory usage: after five hours the allocated memory (according to a server-side `top`) is 98.6% of available memory (94 GB), while CPU usage is only around 1% (with unlimited parallelism).

It is also noteworthy that the memory use reported by RDFRules does not exactly match server-side metering (the client shows "Used memory: 74.81 GB / 90.00 GB").

This is not a bug, but a sampling strategy could possibly be used to compute approximate confidence.
taskAndRules.zip

kliegr · 2021-10-19T07:55:26Z

There is some other problem than just high support. Another rule in the same task ( ?b <interacts_with> ?a ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9917529917281246, HeadSize: 11702183, Support: 11605675 has almost identical support (11605675), but for this rule the confidence is computed in several seconds.
The problematic rules are ( ?b <provided_by> ?c ) ^ ( ?a <provided_by> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9993713138822047, HeadSize: 11702183, Support: 11694826 and ( ?a <category> ?c ) ^ ( ?b <category> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9918052042084797, HeadSize: 11702183, Support: 11606286.
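
One plausible reason the two problematic rules behave differently from the symmetric rule is the size of their body join: a body of the form ( ?a p ?c ) ^ ( ?b p ?c ) has one grounding (a, b, c) for every pair of subjects sharing an object, so an object with n subjects contributes n² groundings. The sketch below only illustrates this count on hypothetical triples. It is not RDFRules code, the function name is made up for illustration, and it assumes the triples are already deduplicated.

```python
from collections import Counter


def join_body_groundings(triples, predicate):
    """Count groundings (a, b, c) of the body (?a p ?c) ^ (?b p ?c).

    Assumes `triples` is an iterable of distinct (subject, predicate, object)
    tuples. The result is an upper bound on the number of distinct (a, b)
    pairs, because one pair may share several objects.
    """
    # Number of subjects attached to each object via `predicate`.
    subjects_per_object = Counter(o for _, p, o in triples if p == predicate)
    # Every ordered pair of subjects sharing an object is one grounding.
    return sum(n * n for n in subjects_per_object.values())


# Hypothetical data: three items share one provider, one item has its own.
example_triples = [
    ("x1", "provided_by", "db"),
    ("x2", "provided_by", "db"),
    ("x3", "provided_by", "db"),
    ("x4", "provided_by", "other"),
]
groundings = join_body_groundings(example_triples, "provided_by")
```

If a predicate such as `provided_by` or `category` has only a handful of distinct objects, the count grows roughly quadratically in the number of subjects, whereas the body of the symmetric rule has exactly one grounding per triple.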

kliegr · 2021-10-19T11:33:07Z

This bug is possibly a duplicate of #74

propi · 2022-09-14T12:39:38Z

This is a combinatorial explosion. One solution is an anytime approach with sampling and approximate results. I have now added better debugging of stuck rules and the ability to interrupt mining or confidence computing tasks. Fortunately, during mining, the hardest rules are mined at the end of the refining rules queue.
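
A minimal sketch of what such an anytime, sampling-based estimate could look like for a rule of the form ( ?a p ?c ) ^ ( ?b p ?c ) => ( ?a h ?b ). This is not how RDFRules implements it; the function name, parameters, and in-memory indexes are assumptions made for illustration. It samples subjects ?a, counts their body pairs and supported pairs exactly, and returns the ratio of the two sums, stopping early if an optional time budget runs out.

```python
import random
import time
from collections import defaultdict


def approx_confidence(triples, body_pred, head_pred, sample_size,
                      seed=0, time_budget_s=None):
    """Estimate confidence of (?a body ?c) ^ (?b body ?c) => (?a head ?b).

    Hypothetical helper: samples subjects ?a and uses a ratio-of-sums
    estimate. Pairs with a == b are counted as body pairs.
    """
    objects_of = defaultdict(set)   # a -> {c}
    subjects_of = defaultdict(set)  # c -> {b}
    head_pairs = set()              # {(a, b)}
    for s, p, o in triples:
        if p == body_pred:
            objects_of[s].add(o)
            subjects_of[o].add(s)
        if p == head_pred:
            head_pairs.add((s, o))

    rng = random.Random(seed)
    subjects = sorted(objects_of)
    sample = rng.sample(subjects, min(sample_size, len(subjects)))
    deadline = None if time_budget_s is None else time.monotonic() + time_budget_s

    body_count = 0
    support_count = 0
    for a in sample:
        # Anytime behaviour: stop and report what has been counted so far.
        if deadline is not None and time.monotonic() > deadline:
            break
        # Distinct ?b values reachable from ?a through any shared object.
        partners = set().union(*(subjects_of[c] for c in objects_of[a]))
        body_count += len(partners)
        support_count += sum((a, b) in head_pairs for b in partners)

    return support_count / body_count if body_count else None


# Hypothetical data; the sample covers all subjects, so the result is exact.
example_triples = [
    ("x1", "provided_by", "db"),
    ("x2", "provided_by", "db"),
    ("x3", "provided_by", "other"),
    ("x1", "interacts_with", "x2"),
    ("x2", "interacts_with", "x1"),
]
estimate = approx_confidence(example_triples, "provided_by", "interacts_with",
                             sample_size=10)
```

Sampling whole subjects keeps each partial count exact, so the estimate can be reported at any interruption point. A ratio of sums is slightly biased for small samples, which seems an acceptable trade-off when the exact computation does not finish at all.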
