-
-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
InternCache - replace synchronized with ReentrantLock
#1251
Conversation
I am fine with this if there's something to suggest performance is better (even if just for newer JVMs). |
@cowtowncoder so you need micro benchmarks to prove this? |
@pjfanning I'd be fine with tests done for similar use case somewhere else; article etc. So no, I don't absolutely require one specifically for this change. Although... if easy enough, it'd of course be great and this case might be easy enough. |
@cowtowncoder I can guarantee that you can write use cases where you can get worse performance (older JDKs - and probably also in newer JDKs if you don't use virtual threads) and better performance (virtual threads in latest JDKs). I think it is fairly pointless. We need to make a decision whether we want to make VirtualThread users happy while imposing a small penalty on all other users - or whether we want to code so that everyone is happy by making this configurable. |
@pjfanning Ok I was hoping for "tends to be no-slower and possibly faster on newer JVMs" kind of general results. Or even just "not meaningfully slower", given it's useful for virtual threads. This particular thing is difficult to wire so I am not really looking forward to doing that. With that... yeah, https://github.com/FasterXML/jackson-benchmarks/ with whatever threads (100?), interning a repeating set of Strings. We don't have to decide on this right today either, can keep this open to discuss etc. |
Wow. This thing ("ReentrantLock vs synchronized") has been around for a while, so many hits from 10 years ago :) |
Hmmh. Not much useful info in articles like: https://medium.com/javarevisited/synchronized-vs-reentrantlock-how-to-choose-cfb5306080e7 which basically suggests Then again, thinking out aloud: since it only matters in high-contention case, and rarely for "normal" (good) usage, there is little downside to doing this. So I am leaning towards merging this change: we do get reports of high lock contention cases from time to time. |
I've done some testing in my own jmh project - https://github.com/pjfanning/jackson-nest-bench/blob/main/src/jmh/java/org/example/jackson/bench/InternCacheBench.java With 5 threads, I'm finding that the InternCache with ReentrantLock calls like in this PR is slower than synchronized. I did make a change to use tryLock instead so that if another thread has the lock, then the current thread skips the clear section. This runs faster. It does mean that under heavy contention that the cache can grow past the max size but that before too long some thread will get the lock and clear down the cache. It would take a pretty extraordinary set of circumstances for the cache size to grow a large amount over the max size. If this was a real worry, we could do something like this (I don't think this extra complexity is warranted).
|
yeah no let's keep things simple. Thank you for checking out performance aspects with jmh benchmark. I think I'm ok with slower operation for low concurrency cases. One ask: could you add an entry in |
@cowtowncoder this is the simpler change that I would like to make to this PR. It can let the cache grow a little bit bigger than the max but it is faster than the existing change in this PR.
|
@pjfanning yes that's fine; max size enforcement can be approximate, and if this is faster for common case, good. Just let me know when this is ready (I guess once converted to non-Draft), will merge. |
@cowtowncoder ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh. And I even forgot that this only applies to rather rare case of flushing full cache.
So even less impactful for most use cases.
Approved.
ReentrantLock
Will also increase default size from 100 to 200, see #1257. Seems rather low, for global cache. Possibly relevant for abnormal cases of no/low symbol-table reuse. |
@cowtowncoder @pjfanning When will we make a release that includes this patch? This is a crucial perf improvement that we are looking forward to! |
@zshao9 Original intent was to add this in 2.18.0; but since it appears safe, I think we could consider backporting into 2.17.1. But I am curious as to whether this really is important: have you measured its impact with load testing? My experience has been that finding this path in code as hot spot tends to indicate different problem altogether -- specifically lack of reuse of |
@pjfanning WDYT? If we think this is good patch (I think so) and safe, maybe it should be backported in 2.17.1, which I hope to release next week. |
v2.17.1 already has a lot of changes. Without an actual description of why this is actually affecting anyone in a serious way, I think we should wait till v2.18.0. |
@pjfanning Yes, valid point. Will wait for actual supporting evidence before considering backporting. |
@cowtowncoder @pjfanning Our main use case is in an Apache Spark executor where 32 threads (on 32 vcores) can be busy parsing large JSON documents with Spark's UDF called Reference: https://issues.apache.org/jira/browse/SPARK-47959 |
@zshao9 Ok but more importantly, is there a way to validate that the change would help? Maybe could test with 2.18.0-SNAPSHOT of jackson-core that has it? (build locally or use Sonatype OSS repo's auto-built). But equally importantly: this could be a sign of lack of reuse of |
@vadim has helped us to disable Jackson Field Name Interning, and the problem disappeared entirely.
I am not familiar with the logic to decide what keys to cache and intern. What if the keys in the Json Object are non-repeated UUID strings? |
@zshao9 Setting is global to Since you hare having performance problems, I would recommend first turning of canonicalization and seeing how that affects performance; and then turning of interning and doing the same. Basically there are 2 main possibilities as to why you are having issues with concurrency:
In first case, your real problem would be solved by reuse or ensuring closing of parsers and NOT changing interning/canonicalization. In second case, you would benefit from changing these settings. I have yet to see a case where locking in |
Hey @zshao9 and @cowtowncoder -- As @JoshRosen mentioned here, it does not seem like the option to disable canonicalization is present in |
@vadim Both options have been around for very long time, without many changes. Difference between canonicalization disabling (no help?) and interning (helps) is interesting -- not sure when/how that would happen. I would then go with disabling interning since that helps. Improvement listed here will go in 2.18.0 but in general is unlikely to be super important compared to other options (IMO). |
If we are going all-in on ReentrantLocks - I would still like to do each change its own PR so that each change gets looked at on its own merits.