Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): allow JsonType to use a stable JSON.stringify algorithm #3584

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gggdomi
Copy link

@gggdomi gggdomi commented Oct 11, 2022

fixes #3583

We introduced an option on JsonType to allow user to choose between speed or better serialization.

I'm not sure if the option is required, or if stable stringify should simply be the only option because it's more robust ?
I'm not sure either of what you prefer as a default: slightly more speed or less diffs. In doubt, we kept the the old behaviour by default.

Feel free to decide what option is best.

@lgtm-com
Copy link

lgtm-com bot commented Oct 11, 2022

This pull request introduces 1 alert when merging 3341e63 into 9d5d457 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@B4nan
Copy link
Member

B4nan commented Oct 11, 2022

Can you prepare some benchmarks so we know how much slower this solution is? If its fast enough, I dont mind doing this by default, but it would have to be very close to the native implementation.

I am not sure if the type constructor is the right place for this (I do understand this is the "least modified code path"), a global config option would be definitely a good idea, and a property level option to have fine grained control. Because in generall you dont need to work with the type instance, a type: 'json' (or type: t.json) will work the same here, and I dont want to force users to instantiate the types for this.

Maybe something like this?

@Property({ type: 'json', stableStringify: true })
object: { foo: string; bar: number };

That way we could easily control it globally too, I guess if the user wants it stable, they will want to have it stable for all the JSON properties.

I also dont mind having the constructor option in JsonType, I'd like to have it available in other ways too.

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2022

Codecov Report

Base: 99.84% // Head: 99.62% // Decreases project coverage by -0.21% ⚠️

Coverage data is based on head (3341e63) compared to base (b23f015).
Patch coverage: 15.15% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3584      +/-   ##
==========================================
- Coverage   99.84%   99.62%   -0.22%     
==========================================
  Files         211      212       +1     
  Lines       13174    13207      +33     
  Branches     3053     3062       +9     
==========================================
+ Hits        13153    13158       +5     
- Misses         21       44      +23     
- Partials        0        5       +5     
Impacted Files Coverage Δ
packages/core/src/drivers/IDatabaseDriver.ts 100.00% <ø> (ø)
packages/core/src/utils/stableStringify.ts 3.57% <3.57%> (ø)
packages/core/src/types/JsonType.ts 94.44% <80.00%> (-5.56%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rvion
Copy link

rvion commented Oct 12, 2022

Can you prepare some benchmarks so we know how much slower this solution is? If its fast enough, I dont mind doing this by default, but it would have to be very close to the native implementation.

Will do

I am not sure if the type constructor is the right place for this
a global config option would be definitely a good idea, and a property level option to have fine grained control.
That way we could easily control it globally too, I guess if the user wants it stable, they will want to have it stable for all the JSON properties.

yeah, makes sense 👍


to add some thoughs about json fields before doing more:

  • going though strings to check for diffs may not be the best way.

    • a custom deepEqual made for ORM would be better / faster / use less memory when most diffs ends-up without changes
      • lodash and other ones are both too complex (support dates) and wrong (they compare the number of keys, which incorectly deals with undefined {a:1, b:undefined} and {a:1} should be equal under a comparaison diff.
  • sometimes, I think I'd also prefer to use ref equality (===) instead of real diffs, because I know my objects will only be replaces, not modified. this is unsafe, and should totally be behind a "I know what I do and will not complain when it turns out I was wrong in my assumptions", but it is massively faster.

    • should there be an option for json object to also Object.freeze them, especially when === comparaison is active
      (this would catch real-life bugs for specific needs, just had to quickly debug that in my project yesterday, I had one field that needed a stable stringify compare, and an other that only needed a === comparaison)
  • should there a way to specify diff strategy on Types, i.e. ReferenceEquality(or 'StrictEquality') | ==(or 'Equality') | 'ViaDBValue', are there mutually exclusive "axis" here that would deserve a different strategy configuration

  • would it makes sense to allow use to use hints at the compare change time, something like computeChangeSets({ strategy: { BookEntity: {SomeJsonField: 'StricEquality' }} ) or computeChangeSets({ useStricEqualityFor: ['Entity.Field']] }). This is probably not the lowest hanging fruit for perf / memory usage optimisation

@rvion
Copy link

rvion commented Oct 12, 2022

@B4nan the perf impact is massive for my use-case, if used everywhere !

🔴 this should not be merged as is 😨

👉 One big problem is that registering managed entities seems to also serialize JSON values pre-emptively 😨

image

also, in my random current use case, >90%of json fields of my loaded models would be perfectly fine if compared as references, which I don't know how to do with current type abstraction.

(side note: all those little perf problems really seem easilly fixable. I have a ~150seconds script, where most of the time is just MikroORM doing things => early profiling show quick win everywhere, despite the code already beeing very clean; I know you probably have many other things to do, but I'd really like to help optimize mikro-orm further ! Your code is really clean ! :)

@gggdomi
Copy link
Author

gggdomi commented Oct 12, 2022

@B4nan your precisions have been noted. I'll need to take a couple days to dive deeper in the Mikro ORM codebase.
An approximative bench : https://jsbench.me/x1l94p0qvx/1 (TLDR: 60% slower)

In any case, @rvion profiling proves that more work is required. I'll get back to you.

@B4nan
Copy link
Member

B4nan commented Oct 12, 2022

also, in my random current use case, >90%of json fields of my loaded models would be perfectly fine if compared as references, which I don't know how to do with current type abstraction.

Maybe for your use case, the go to way shouldn't be replacing references, but object mutation instead and there you can't just compare references.

👉 One big problem is that registering managed entities seems to also serialize JSON values pre-emptively 😨

Yes, that's how it works, we need to compute the state that is later used for changeset computation. And due to possible differences in the input (based on differences when using the joined loading strategy mostly) we need to serialize the entity instead of using the data that came from database.

I think we should add the compare method to the Type interface, that would allow you tu compare by reference, and it would be explicit.

Indeed the alternative to this could be smarter diffing, so instead of handing custom serialization, we would just make sure the values are same regardless of the order of keys in objects. And if we have this new compare method, we could use it to compare JSON values via compareObjects method we already have (or probably via the generic equals method instead as JSON can store arrays and raw values too, not just objects).

@rvion
Copy link

rvion commented Oct 14, 2022

Maybe for your use case, the go to way shouldn't be replacing references, but object mutation instead and there you can't just compare references.

yeah, especially since it's impossible to hack around field order, since postgres will not preserve ordering on update like node does. But since node preserves the field oder from postgres, I went with Object.assign(entity.jsonField, newJsonField) so JSON.stringify() is similar, since entity.jsonField field order will remain unchanged.


your proposal to allow custom way to compare, without having to serialize first the fields would indeed solve all my problems.

for this specific entity, I'll go with a custom compareObject function, to skip the double json serialization (at fetch and at compare time)

And for for some of my other entities (with hundreds of thouthands of instances loaded in the EM in this very specific example), some of them have json fields known to be frozen and only assignable by reference, I'll go with an even simpler reference comparaison, thus saving seconds of runtime

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Diffs when json key order changes
4 participants