Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InternCache - replace synchronized with ReentrantLock #1251

Merged
merged 3 commits into from
Mar 28, 2024

Conversation

pjfanning
Copy link
Member

If we are going all-in on ReentrantLocks - I would still like to do each change its own PR so that each change gets looked at on its own merits.

@cowtowncoder
Copy link
Member

I am fine with this if there's something to suggest performance is better (even if just for newer JVMs).

@pjfanning
Copy link
Member Author

I am fine with this if there's something to suggest performance is better (even if just for newer JVMs).

@cowtowncoder so you need micro benchmarks to prove this?

@cowtowncoder
Copy link
Member

cowtowncoder commented Mar 27, 2024

@pjfanning I'd be fine with tests done for similar use case somewhere else; article etc.
Just in case there are counter-examples. Or more like sanity check so we aren't just making things worse with no benefit.

So no, I don't absolutely require one specifically for this change.

Although... if easy enough, it'd of course be great and this case might be easy enough.

@pjfanning
Copy link
Member Author

@cowtowncoder I can guarantee that you can write use cases where you can get worse performance (older JDKs - and probably also in newer JDKs if you don't use virtual threads) and better performance (virtual threads in latest JDKs). I think it is fairly pointless. We need to make a decision whether we want to make VirtualThread users happy while imposing a small penalty on all other users - or whether we want to code so that everyone is happy by making this configurable.

@cowtowncoder
Copy link
Member

@pjfanning Ok I was hoping for "tends to be no-slower and possibly faster on newer JVMs" kind of general results. Or even just "not meaningfully slower", given it's useful for virtual threads.

This particular thing is difficult to wire so I am not really looking forward to doing that.

With that... yeah, jmh based test for 2 cases would be great, perhaps for

https://github.com/FasterXML/jackson-benchmarks/

with whatever threads (100?), interning a repeating set of Strings.

We don't have to decide on this right today either, can keep this open to discuss etc.
I only hesitate because this is potential hot spot for performance.

@cowtowncoder
Copy link
Member

Wow. This thing ("ReentrantLock vs synchronized") has been around for a while, so many hits from 10 years ago :)

@pjfanning pjfanning changed the title IntenCache - remove synchronized InternCache - remove synchronized Mar 27, 2024
@cowtowncoder
Copy link
Member

cowtowncoder commented Mar 28, 2024

Hmmh. Not much useful info in articles like:

https://medium.com/javarevisited/synchronized-vs-reentrantlock-how-to-choose-cfb5306080e7

which basically suggests synchronized fine except for heavily-contested cases.
Which may or may not be the case here -- in normal use case InternCache is only called when symbol table lookup fails, which is a sign of poor JsonFactory reuse.
(it is also called for JsonDeserializers by databind -- but similarly deserializers are cached so should be light load).

Then again, thinking out aloud: since it only matters in high-contention case, and rarely for "normal" (good) usage, there is little downside to doing this. So I am leaning towards merging this change: we do get reports of high lock contention cases from time to time.

@pjfanning
Copy link
Member Author

I've done some testing in my own jmh project - https://github.com/pjfanning/jackson-nest-bench/blob/main/src/jmh/java/org/example/jackson/bench/InternCacheBench.java

With 5 threads, I'm finding that the InternCache with ReentrantLock calls like in this PR is slower than synchronized.

I did make a change to use tryLock instead so that if another thread has the lock, then the current thread skips the clear section. This runs faster. It does mean that under heavy contention that the cache can grow past the max size but that before too long some thread will get the lock and clear down the cache.

It would take a pretty extraordinary set of circumstances for the cache size to grow a large amount over the max size. If this was a real worry, we could do something like this (I don't think this extra complexity is warranted).

// GUIDELINE_MAX_ENTRIES = 180
// ABSOLUTE_MAX_ENTRIES = 200
// in new code, we should never get more than ABSOLUTE_MAX_ENTRIES but once we get to GUIDELINE_MAX_ENTRIES
// we do try to clear() but won't wait to get a block unless we've reached the ABSOLUTE_MAX_ENTRIES
        if (size() >= GUIDELINE_MAX_ENTRIES) {
            if (_lock.tryLock()) {
                try {
                    if (size() >= GUIDELINE_MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    _lock.unlock();
                }
            } else if (size() >= ABSOLUTE_MAX_ENTRIES) {
                _lock.lock();
                try {
                    if (size() >= GUIDELINE_MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    _lock.unlock();
                }
            }
        }

@cowtowncoder
Copy link
Member

yeah no let's keep things simple. Thank you for checking out performance aspects with jmh benchmark.

I think I'm ok with slower operation for low concurrency cases.

One ask: could you add an entry in release-notes/VERSION-2.x for this change (ok to refer to PR)?

@pjfanning
Copy link
Member Author

pjfanning commented Mar 28, 2024

@cowtowncoder this is the simpler change that I would like to make to this PR. It can let the cache grow a little bit bigger than the max but it is faster than the existing change in this PR.

        if (size() >= MAX_ENTRIES) {
            if (lock.tryLock()) {
                try {
                    if (size() >= MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    lock.unlock();
                }
            }
        }

@pjfanning pjfanning marked this pull request as draft March 28, 2024 17:53
@cowtowncoder
Copy link
Member

@pjfanning yes that's fine; max size enforcement can be approximate, and if this is faster for common case, good.

Just let me know when this is ready (I guess once converted to non-Draft), will merge.

@pjfanning pjfanning marked this pull request as ready for review March 28, 2024 18:19
@pjfanning
Copy link
Member Author

@cowtowncoder ready for review

Copy link
Member

@cowtowncoder cowtowncoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh. And I even forgot that this only applies to rather rare case of flushing full cache.
So even less impactful for most use cases.

Approved.

@cowtowncoder cowtowncoder merged commit 1fd8cb1 into FasterXML:2.18 Mar 28, 2024
5 checks passed
@cowtowncoder cowtowncoder changed the title InternCache - remove synchronized InternCache - replace synchronized with ReentrantLock Mar 28, 2024
@pjfanning pjfanning deleted the interncache-lock branch March 28, 2024 18:29
@cowtowncoder
Copy link
Member

Will also increase default size from 100 to 200, see #1257. Seems rather low, for global cache. Possibly relevant for abnormal cases of no/low symbol-table reuse.

@zshao9
Copy link

zshao9 commented Apr 23, 2024

@cowtowncoder @pjfanning When will we make a release that includes this patch? This is a crucial perf improvement that we are looking forward to!

@cowtowncoder
Copy link
Member

@zshao9 Original intent was to add this in 2.18.0; but since it appears safe, I think we could consider backporting into 2.17.1.

But I am curious as to whether this really is important: have you measured its impact with load testing? My experience has been that finding this path in code as hot spot tends to indicate different problem altogether -- specifically lack of reuse of JsonFactory and/or ObjectMappers.

@cowtowncoder
Copy link
Member

@pjfanning WDYT? If we think this is good patch (I think so) and safe, maybe it should be backported in 2.17.1, which I hope to release next week.

@pjfanning
Copy link
Member Author

v2.17.1 already has a lot of changes. Without an actual description of why this is actually affecting anyone in a serious way, I think we should wait till v2.18.0.

@cowtowncoder
Copy link
Member

@pjfanning Yes, valid point. Will wait for actual supporting evidence before considering backporting.

@zshao9
Copy link

zshao9 commented Apr 26, 2024

@cowtowncoder @pjfanning Our main use case is in an Apache Spark executor where 32 threads (on 32 vcores) can be busy parsing large JSON documents with Spark's UDF called GET_JSON_OBJECT, which internally uses jackson-core. We have seen that 31 out of 32 threads are waiting for the lock while 1 thread is holding it! I would say this is a huge performance issue for us.

Reference: https://issues.apache.org/jira/browse/SPARK-47959

@cowtowncoder
Copy link
Member

@zshao9 Ok but more importantly, is there a way to validate that the change would help? Maybe could test with 2.18.0-SNAPSHOT of jackson-core that has it? (build locally or use Sonatype OSS repo's auto-built).
Without some form of validation this is speculative fix and as such not worth backporting into patch update (but good enough for minor version as we do not think there are negative effects).

But equally importantly: this could be a sign of lack of reuse of ObjectMapper (most common) -- or directly not reusing JsonFactory. In both cases the problem is that the underlying symbol table is not getting reused (reuse would prevent need to re-intern() Strings once decoded from byte[] or char[]); but since InternCache is global that can get contention.
But reducing that contention is only small part of significant performance gain of properly reusing underlying JsonFactory (or more specifically, symbol table it uses); either directly or indirectly via ObjectMapper.

@zshao9
Copy link

zshao9 commented Apr 28, 2024

@vadim has helped us to disable Jackson Field Name Interning, and the problem disappeared entirely.

But equally importantly: this could be a sign of lack of reuse of ObjectMapper (most common) -- or directly not reusing JsonFactory.

I am not familiar with the logic to decide what keys to cache and intern. What if the keys in the Json Object are non-repeated UUID strings?

@cowtowncoder
Copy link
Member

@zshao9 Setting is global to JsonFactory so you have only decision for whether to canonicalize and/or intern everything or not.

Since you hare having performance problems, I would recommend first turning of canonicalization and seeing how that affects performance; and then turning of interning and doing the same.
Sounds like you have done that.
But I think in addition to disabling interning, you might want to also see about disabling canonicalization; that might be even better option.

Basically there are 2 main possibilities as to why you are having issues with concurrency:

  1. Your symbol tables are not being reused -- either JsonFactory is not reused, or JsonParsers are not being close()d (they are automatically closed if reading the end-of-input; otherwise need to call parser.close().
  2. Number of symbols in documents is huge -- symbol table maximum size is about 6,000 -- and they do not repeat (at all, or not a lot: UUIDs are unique, for example)

In first case, your real problem would be solved by reuse or ensuring closing of parsers and NOT changing interning/canonicalization. In second case, you would benefit from changing these settings.

I have yet to see a case where locking in InternCache is not based on one of (1) or (2): it really is a symptom usually.

@vadim
Copy link

vadim commented Apr 30, 2024

Hey @zshao9 and @cowtowncoder -- As @JoshRosen mentioned here, it does not seem like the option to disable canonicalization is present in jackson-core==2.15.2. I believe his experiments showed no effect when trying to toggle this on 2.15.2, while interning removed the bottleneck faced by @zshao9 . Any thoughts here on versioning?

@cowtowncoder
Copy link
Member

@vadim Both options have been around for very long time, without many changes.
So I don't think Jackson version should matter a lot here.

Difference between canonicalization disabling (no help?) and interning (helps) is interesting -- not sure when/how that would happen.
Either way symbol table use is likely not working optimally for the use case.

I would then go with disabling interning since that helps. Improvement listed here will go in 2.18.0 but in general is unlikely to be super important compared to other options (IMO).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants