Time to match sets increases exponentially related to the set sizes #106

pitalig · 2020-03-19T13:02:34Z

I believe that because set comparison tries every permutation of the elements, matching larger sets takes an impractical amount of time (a 10-item set took about 17 minutes 😅).

Here is a test that I wrote to check that:

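;; Assumes Midje and the matcher-combinators Midje integration are on the classpath.
(require '[midje.sweet :refer :all]
         '[matcher-combinators.midje :refer [match]])

(reduce
  (fn [acc i]
    (let [result (conj acc i)               ; actual set: #{0 .. i}
          expected (conj result (inc i))]   ; one extra element, so the match always fails
      (println (time (fact ""
                       result
                       => (match expected))))
      (println (str "set size: " (count result)))
      result))
  #{} (range 10))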

This was the output (I only removed noise such as the fact results):

"Elapsed time: 4.773552 msecs"
set size: 1

"Elapsed time: 5.979492 msecs"
set size: 2

"Elapsed time: 3.775644 msecs"
set size: 3

"Elapsed time: 5.138717 msecs"
set size: 4

"Elapsed time: 16.466419 msecs"
set size: 5

"Elapsed time: 95.003742 msecs"
set size: 6

"Elapsed time: 674.578195 msecs"
set size: 7

"Elapsed time: 6921.199258 msecs"
set size: 8

"Elapsed time: 79986.038842 msecs"
set size: 9

"Elapsed time: 1036299.751686 msecs"
set size: 10
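
For what it's worth, the timings above look closer to factorial than to exponential growth, which is what trying every permutation would predict. A quick sketch, using only the numbers reported above (the helper names `factorial` and `growth-ratios` are made up for illustration), compares each timing with the previous one:

```clojure
;; Timings copied from the output above, indexed by set size 1..10.
(def timings-ms
  [4.773552 5.979492 3.775644 5.138717 16.466419
   95.003742 674.578195 6921.199258 79986.038842 1036299.751686])

(defn factorial
  "Number of orderings of n distinct items."
  [n]
  (reduce *' (range 1 (inc n))))

(defn growth-ratios
  "Ratio of each measurement to the one before it."
  [xs]
  (mapv (fn [[a b]] (/ b a)) (partition 2 1 xs)))

;; Print set size, elapsed time, and slowdown relative to the previous size.
(doseq [[size t ratio] (map vector (range 2 11) (rest timings-ms) (growth-ratios timings-ms))]
  (println (format "size %2d: %14.3f ms (x%.2f vs previous)" size t ratio)))

(println "orderings of 10 items:" (factorial 10))
```

With exponential cost the ratio between consecutive sizes would settle at a constant. Here it keeps climbing from size 4 onward, which is what you would expect if each extra element multiplies the number of candidate orderings.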


dchelimsky · 2020-03-19T13:16:30Z

Thanks for this @pitalig. I agree that we should just be using set logic for sets. Are you interested in making a PR, or would you prefer that somebody else fix it? Either is fine.
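
As a rough illustration of what "set logic" could mean here: when every element of the expected set is a plain value (no nested matchers or predicates), the comparison reduces to two set differences, which is roughly linear in the set size. This is only a sketch with a made-up helper name (`literal-set-diff`), not the library's implementation:

```clojure
(require '[clojure.set :as set])

(defn literal-set-diff
  "Compares two sets of plain values without trying permutations.
  Returns the elements missing from `actual` and the ones not expected."
  [expected actual]
  {:missing    (set/difference expected actual)
   :unexpected (set/difference actual expected)})

(defn literal-set-match?
  "True when both sets contain exactly the same values."
  [expected actual]
  (let [{:keys [missing unexpected]} (literal-set-diff expected actual)]
    (and (empty? missing) (empty? unexpected))))

;; Same shape as the benchmark in this issue: expected has one extra element.
(println (literal-set-diff (set (range 11)) (set (range 10))))
```

The shortcut stops working as soon as an expected element is itself a matcher or predicate, because then membership can no longer be decided by equality.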

pitalig · 2020-03-19T13:18:49Z

I will try to do it this week and then share what I come up with here.

dchelimsky · 2020-03-19T13:33:35Z

@pitalig FYI, here's the same test using matcher-combinators.standalone (no test framework necessary). Just pointing this out because matcher-combinators.standalone/match is a relatively new addition.

(require '[matcher-combinators.standalone :refer [match]])
(reduce
 (fn [acc i]
   (let [result (conj acc i)
         expected (conj result (inc i))]
     (println (time (match expected result)))
     (println (str "set size: " (count result)))
     result))
 #{}
 (range 10))

philomates · 2020-03-19T15:27:28Z

@sovelten and I explored something along these lines in the past. Not to say it shouldn't be looked into again, but wanted to share our observations:

fernando-nubank · 2020-03-20T21:13:42Z

I don't know whether treating the predicates as equivalence classes will help, because they won't always form a partition. For example:

#{odd? even? multiple-of-3?}
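
To make the overlap concrete, here is a small sketch. `multiple-of-3?` is not defined anywhere in this thread, so the definition below is an assumption, and `greedy-match?` is a hypothetical helper that hands each predicate the first remaining element it accepts. Because 6 satisfies both `even?` and `multiple-of-3?`, the outcome depends on the order in which elements are tried:

```clojure
(defn multiple-of-3?
  "Assumed definition of the predicate named in the example above."
  [n]
  (zero? (mod n 3)))

(def preds [even? multiple-of-3? odd?])

(defn greedy-match?
  "Gives each predicate the first remaining element it accepts, never backtracking.
  Returns true only if every predicate and every element ends up paired."
  [preds xs]
  (loop [preds preds
         remaining (vec xs)]
    (if (empty? preds)
      (empty? remaining)
      (let [p   (first preds)
            idx (first (keep-indexed (fn [i x] (when (p x) i)) remaining))]
        (if (nil? idx)
          false
          (recur (rest preds)
                 (into (subvec remaining 0 idx)
                       (subvec remaining (inc idx)))))))))

;; Same elements, different order, different answer.
(println (greedy-match? preds [6 3 2])) ; even? grabs 6, leaving nothing odd for odd?
(println (greedy-match? preds [2 6 3])) ; even? grabs 2, so 6 and 3 are still available
```

A valid pairing exists in both cases (`even?` with 2, `multiple-of-3?` with 6, `odd?` with 3), so a matcher that wants to avoid false negatives needs some form of backtracking or a proper bipartite matching.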

thumbnail · 2020-06-15T10:06:08Z

FWIW, for our team I wrapped the in-any-order matcher so that it times out after a configurable amount of time.

While this is technically not correct (a match found after 17 minutes is still a valid test), we decided that if we hit this limit we split up the assertion or rewrite the test-case.

The code can be found here: nedap/utils.test#28
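
For readers who cannot follow the link, the general idea of a timeout wrapper can be sketched in plain Clojure. This is not the nedap implementation, which I have not reproduced here. `with-timeout` and the `:example/timed-out` sentinel are hypothetical names:

```clojure
(defn with-timeout
  "Runs the zero-argument function `f` on another thread and waits at most
  `timeout-ms` milliseconds. Returns the result of `f`, or :example/timed-out."
  [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms :example/timed-out)]
    (when (= :example/timed-out result)
      ;; Only requests interruption; a CPU-bound computation that never
      ;; checks the interrupt flag may keep running in the background.
      (future-cancel fut))
    result))

(println (with-timeout 1000 #(+ 1 2)))
(println (with-timeout 50 #(Thread/sleep 5000)))
```

In a test, the slow matcher call would go inside `f`, and the sentinel value would be turned into a failure that tells the author to split up the assertion.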

fuadsaud · 2024-02-12T15:43:06Z

I ran into this problem recently and ended up implementing the following, which worked alright for my use cases:

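;; Aliases assumed from how the snippet uses them.
(require '[matcher-combinators.core :as matcher-combinators]
         '[matcher-combinators.model :as model]
         '[matcher-combinators.result :as result])

(defrecord Sorted [expected]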
  matcher-combinators/Matcher

  (-matcher-for [_this] (matcher-combinators/-matcher-for expected))
  (-matcher-for [_this x] (matcher-combinators/-matcher-for expected x))

  (-match [_this actual]
    (let [sorted-expected (try (sort expected)
                               (catch Exception e e))

          sorted-actual (try (sort actual)
                             (catch Exception e e))]
      (cond
        (instance? Exception sorted-expected)
        {::result/type :mismatch
         ::result/value (model/->Mismatch (list 'sorted expected) actual)
         ::result/weight 1}

        (instance? Exception sorted-actual)
        {::result/type :mismatch
         ::result/value (model/->Mismatch (list 'sorted expected) actual)
         ::result/weight 1}

        :else
        (matcher-combinators.core/match sorted-expected sorted-actual))))

  (-base-name [_this] (matcher-combinators/-base-name expected)))

(defn sorted
  "Sorts both the expected and actual values before comparing them as sequences."
  [expected]
  (->Sorted expected))

Perhaps there are more elegant solutions (e.g. a strict set equality matcher); I'd be curious to hear others' thoughts.

dchelimsky added bug enhancement and removed bug labels Mar 19, 2020

philomates mentioned this issue Mar 23, 2020

Align defaults for maps and sequences #108

Closed

philomates mentioned this issue Mar 26, 2024

performance issues (?) simple test takes 13 seconds to execute #221

Open
