Slow performance while having large array of facts. #324

Open
jyothis-qb opened this issue Dec 21, 2022 · 9 comments

@jyothis-qb

jyothis-qb commented Dec 21, 2022

I have integrated json-rules-engine with a project I am working on and the performance seems much slower than I would expect. I'm using the package to do a simple lookup at another set of facts.

const lookupFacts = [
    {  col1: '', col2: '', col3: ''},
    {  col1: '', col2: '', col3: ''},
    {  col1: '', col2: '', col3: ''}
]

const filterRule = {
  conditions: {
    all: [
      {
        path: "$.col1",
        fact: "fact",
        value: {
          path: "$.col1",
          fact: "lookup"
        },
        operator: "equal"
      },
      {
        path: "$.col2",
        fact: "data",
        value: {
          path: "$.col2",
          fact: "lookup"
        },
        operator: "equal"
      },
      {
        path: "$.col3",
        fact: "data",
        value: {
          path: "$.col3",
          fact: "lookup"
        },
        operator: "equal"
      }
    ]
  },
  // event is a sibling of conditions, not a child of it
  event: {
    type: 'filter-event'
  }
}

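const { Engine } = require('json-rules-engine')

// `new require(...).Engine()` would apply `new` to require itself
const engine = new Engine()
engine.addRule(filterRule)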

// Run the engine once per lookup row; keep the rows that triggered the event
const filteredMatches = await Promise.all(
  lookupFacts.map((lookup) =>
    engine.run({ lookup, data })
      .then(({ events }) => (events.length > 0 ? lookup : false))
      .catch(() => false)
  )
).then((values) => values.filter((value) => value))

In my case, the lookupFacts array contains about 80,000 entries and the run takes around 30,000 ms to complete, whereas the same comparison in plain JavaScript takes only about 10-15 ms.
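For reference, the plain JavaScript comparison mentioned above might look roughly like the sketch below. The poster's actual code is not shown in the issue, so the sample rows and the `filterMatches` helper are hypothetical. It uses strict equality, which is assumed to match the semantics of the engine's `equal` operator.

```javascript
// Hypothetical sample data; the real rows and columns are not shown in the issue.
const data = { col1: 'a', col2: 'b', col3: 'c' };
const lookupFacts = [
  { col1: 'a', col2: 'b', col3: 'c' },
  { col1: 'a', col2: 'x', col3: 'c' },
  { col1: 'a', col2: 'b', col3: 'c' },
];

// Keep only the lookup rows whose three columns all equal the data row's.
function filterMatches(row, lookups) {
  return lookups.filter(
    (lookup) =>
      lookup.col1 === row.col1 &&
      lookup.col2 === row.col2 &&
      lookup.col3 === row.col3
  );
}

const filteredMatches = filterMatches(data, lookupFacts);
console.log(filteredMatches.length); // 2
```

This is a single synchronous pass with no promise per row and no path parsing, which is consistent with the large gap in timings reported here.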

I will not have any dynamic data in the flow. Is there a way to improve performance?

Thanks

@jyothis-qb jyothis-qb changed the title from "Slow performance while having large arrays as fact." to "Slow performance while having large array of facts." Dec 21, 2022
@mjaniko

mjaniko commented Aug 15, 2023

@jyothis-qb Did you find any solution for speeding up performance?

@chris-pardy
Collaborator

@mjaniko @jyothis-qb I would need to see the actual rules to pin down the exact cause. However, I can say that if you're using JSON path expressions to turn an array of 80,000 objects into an array of 80,000 values, then it's going to be slow.

Generally, you may be better off flattening/normalizing the data so there are no path expressions to evaluate.
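As a rough sketch of what flattening could look like, the snippet below lifts one level of nested properties into top-level keys before the facts are handed to the engine. The `flattenFacts` helper and the `data_col1` naming scheme are assumptions made for illustration. They are not part of json-rules-engine.

```javascript
// Hypothetical helper: turn { data: { col1 }, lookup: { col1 } } into
// { data_col1, lookup_col1 } so each value can be referenced as its own fact.
// Only one level of nesting is flattened.
function flattenFacts(nested, separator = '_') {
  const flat = {};
  for (const [factName, value] of Object.entries(nested)) {
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);
    if (!isPlainObject) {
      flat[factName] = value; // already a scalar or an array: keep as-is
      continue;
    }
    for (const [key, inner] of Object.entries(value)) {
      flat[`${factName}${separator}${key}`] = inner;
    }
  }
  return flat;
}

const facts = flattenFacts({
  data: { col1: 'a', col2: 'b' },
  lookup: { col1: 'a', col2: 'z' },
});
console.log(facts);
// { data_col1: 'a', data_col2: 'b', lookup_col1: 'a', lookup_col2: 'z' }
```

With facts shaped this way, a condition could presumably name `data_col1` as its fact and compare it against `{ fact: "lookup_col1" }` without any `path` attribute.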

@JeffrinCh

I'm also facing the same issue. In my case each object is simple, with no nested paths, but the array of objects is huge.

Was there any resolution on this?

@chris-pardy
Collaborator

@JeffrinCh Again, I would need to see a specific example to understand exactly, but if you're using the path attribute you're going to experience some drop in performance, because the results of the facts after applying the path transformation are not cached. @CacheControl there may be an opportunity to do some caching of the JSON path results that I can look into.
@JeffrinCh For now, one option would be to create dynamic facts instead of using paths. These would benefit from caching and remove the need to parse JSON path expressions.

@chris-pardy
Collaborator

@jyothis-qb @JeffrinCh I did some digging and here are my suggestions:
[Screenshot 2023-10-12: runtime comparison]
This shows a comparison of runtime across 10,000 executions of calling Almanac.factValue, each on a fresh Almanac instance so that no caching is in effect.
The big takeaway is that involving the path parameter causes a slowdown, and that slowdown adds up.

In order to speed up your access you could create dynamic facts:

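const { Fact } = require('json-rules-engine');

// Dynamic fact that reads col1 directly, without a JSON path expression
engine.addFact(
  new Fact('factCol1', async (_, almanac) => {
    const f = await almanac.factValue('fact');
    return f.col1;
  })
);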

If you're doing lots of path access, you could simplify this by creating a single fact and using parameters:

// One parameterized fact; `col` selects which column to read
engine.addFact(
  new Fact('factCol', async ({ col }, almanac) => {
    const f = await almanac.factValue('fact');
    return f[`col${col}`];
  })
);

// access your fact with
{
  "fact": "factCol",
  "params": { "col": 1 },
  ...
}

This performs slightly better than using the path value, but not quite as well as having a specific dynamic fact for each column.

@CacheControl
Owner

json-path requires a decent amount of overhead; it's a relatively complex spec. I'm not surprised that using the path feature on a large number of items is causing significant impact.

It's my belief that the underlying json-path library we use (jsonpath-plus) is well optimized for performance. However, if there is a performance improvement to be made, it will reside in that library.

I agree with the workaround above of using dynamic facts in place of json-path.

@chris-pardy
Collaborator

@CacheControl I also did some digging / profiling of jsonpath-plus and it does seem to be very well optimized. It already caches the results of compiling a path into a function, so repeated uses of the same path will not cause re-compilation. The performance is so optimized that even without the caching the behavior is only slightly abnormal.

Suggesting this could probably be closed with Dynamic Facts being the solution.
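To illustrate the general idea of caching a compiled path, here is a minimal sketch. It is not jsonpath-plus code and does not reflect that library's internals. The `compilePath` function, the `compiledPaths` cache and the simple `$.a.b` path syntax it supports are all assumptions made for the example.

```javascript
// Illustration only: NOT jsonpath-plus code. Supports just "$.a.b" style paths.
const compiledPaths = new Map();
let compileCount = 0;

function compilePath(path) {
  const cached = compiledPaths.get(path);
  if (cached) return cached; // same path string: reuse the accessor

  compileCount += 1; // counts real compilations, for demonstration
  const keys = path.replace(/^\$\.?/, '').split('.').filter(Boolean);
  const accessor = (obj) =>
    keys.reduce((current, key) => (current == null ? undefined : current[key]), obj);

  compiledPaths.set(path, accessor);
  return accessor;
}

const rows = [{ col1: 'a' }, { col1: 'b' }, { col1: 'c' }];
const values = rows.map((row) => compilePath('$.col1')(row));
console.log(values, compileCount); // [ 'a', 'b', 'c' ] 1
```

The path is parsed once and the resulting accessor is reused for every row, so the per-row cost is only the property walk itself.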

@JeffrinCh

Will try these and check

@iay25

iay25 commented Mar 21, 2024

Hi @CacheControl @chris-pardy @JeffrinCh, we are using json-rules-engine in our project to get an outcome by evaluating around 10k records stored in MongoDB.
I am sharing one rule for your reference. We have 10k records like this in our project, and evaluating them against facts takes around 10-15 seconds.

{
  "conditions": {
    "all": [
      { "fact": "customer_delivery_address", "operator": "equal", "factLabel": "Customer Delivery Address", "value": "GB", "valueSet": [ { "value": "GB", "label": "GB" } ] },
      { "fact": "customer_tier", "operator": "equal", "factLabel": "Customer Tier", "value": "gold", "valueSet": [ { "value": "gold", "label": "Gold" } ] },
      { "fact": "new_customer", "operator": "isBoolean", "factLabel": "New Customer", "value": true, "valueSet": [ { "value": true, "label": true } ] },
      { "fact": "order_amount", "operator": "greaterThan", "factLabel": "Order Amount", "value": 2500, "valueSet": [ { "value": 2500, "label": "2500" } ] },
      { "fact": "order_count", "operator": "lessThan", "factLabel": "Order Count", "value": 100, "valueSet": [ { "value": 100, "label": "100" } ] },
      { "fact": "order_date", "operator": "isDateGreaterThan", "factLabel": "Order Date", "value": "2024-02-01T05:15:44Z", "valueSet": [ { "value": "2024-02-01T05:15:44Z", "label": "2024-02-01T05:15:44Z" } ] },
      { "fact": "order_date", "operator": "isDateLessThan", "factLabel": "Order Date", "value": "2024-03-01T05:10:50Z", "valueSet": [ { "value": "2024-03-01T05:10:50Z", "label": "2024-03-01T05:10:50Z" } ] },
      { "fact": "order_state", "operator": "equal", "factLabel": "Order State", "value": "confirmed", "valueSet": [ { "value": "confirmed", "label": "Confirmed" } ] },
      { "fact": "payment_state", "operator": "equal", "factLabel": "Payment State", "value": "paid", "valueSet": [ { "value": "paid", "label": "Paid" } ] },
      { "fact": "customers", "operator": "equal", "factLabel": "Customers", "value": "[email protected]", "valueSet": [ { "value": "[email protected]", "label": "[email protected]" } ] }
    ]
  },
  "event": {
    "type": "categories",
    "params": { "label": "Category", "value": "Fitness Kit", "key": "8173dfd1-d8a1-417d-ab78-07dfa6799f59", "operator": "is", "source": "resource" }
  }
}

Can you please help me understand why it takes so much time to evaluate this type of rule? If possible, please also share solutions for improving performance.
