NPE at BytesToNameCanonicalizer #145

kimchy · 2014-07-08T21:15:53Z

Heya, updated to 2.4.1, and started to get the following exception:

java.lang.NullPointerException
    at com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer$Bucket.access$000(BytesToNameCanonicalizer.java:1187)
    at com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.findBestBucket(BytesToNameCanonicalizer.java:1049)
    at com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer._addSymbol(BytesToNameCanonicalizer.java:858)
    at com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.addName(BytesToNameCanonicalizer.java:667)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2127)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:1998)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1580)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1527)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:693)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:50)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:270)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:308)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:275)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readOrderedMap(AbstractXContentParser.java:258)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.mapOrdered(AbstractXContentParser.java:213)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.mapOrderedAndClose(AbstractXContentParser.java:228)
    at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:119)
    at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:101)
    at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:177)
    at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:400)
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:266)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.processMapping(IndicesClusterStateService.java:419)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyMappings(IndicesClusterStateService.java:360)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:179)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:425)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This seems to happen with very large documents, with many different field names. I tried loading the document and recreating the failure, but it doesn't seem to fail... . I suspect maybe this is caused by many different threads parsing at the same time? I do not know.

For reference, here is a sample document that it fails on: https://dl.dropboxusercontent.com/u/2136051/jackson_npe.txt.

The text was updated successfully, but these errors were encountered:

cowtowncoder · 2014-07-08T21:32:20Z

Can you easily see if this was already in 2.4.0? I would guess it'd be regression from 2.3.3 to 2.4.0 (since there is fair bit of under-the-hood refactoring there); 2.4.0 -> 2.4.1 should have fewer diffs.

kimchy · 2014-07-08T21:38:49Z

@cowtowncoder yea, it happens also in 2.4.0 ....

cowtowncoder · 2014-07-08T21:40:28Z

@kimchy thanks. Unfortunate as it is, at least it makes sense. Code itself does suggest a race condition of some kind... I hope it is not, since single-threaded probs much easier to unit test.

kimchy · 2014-07-08T21:41:36Z

@cowtowncoder aye, smells like a race condition..., can't seem to repro it on my end still....

cowtowncoder · 2014-07-08T21:42:35Z

Hmmh. Actually... code looks bit funky. It'd seem like null check should be needed there, and how 2.3 had it.

cowtowncoder · 2014-07-08T21:55:55Z

Nope. Code in that particular spot not changed.

cowtowncoder · 2014-07-08T22:10:36Z

Looking at diffs, I think the problem is probably in new handleSpillOverflow which gets triggered if collision list grows beyond certain value. Possibly in combination with change to allow reuse even in case where total size may shrink (due to dropped spill lists).

If so, this could still be reproducible with single-thread, but require heavy production of distinct names to trigger the problem.
Alternatively it could require threaded-access problems. But either way, of all changes this is the only thing that looks like it could be the culprit.

kimchy · 2014-07-08T22:12:49Z

@cowtowncoder I see..., interesting..., in any case, if you think of a fix without being to easily repro it in a test, I can easily recreate this in a benchmark type code we have consistently, so I can try it out.

cowtowncoder · 2014-07-08T22:13:45Z

@kimchy Ok. Ideally it'd be reproducible on demand, but since I will try not to touch this code in foreseeable future (once fixed), I am ok with "manual" verification here. :)

cowtowncoder · 2014-07-08T22:19:09Z

Ok, I think I may know what is going on here:

_collList may contain nulls, but before 2.4, only at or after _collEnd
with 2.4.0, it is now possible that one of earlier entries gets reset as null, triggering the problem.

and if so, fix is to add null check in code. I'll try that later on tonight (need to be going now, but will get back to it).

cowtowncoder · 2014-07-09T03:29:03Z

@kimchy I was able to reproduce it locally, with bit of tweaking, although wasn't able to make automated test. However fix worked as expected, and problem does not seem to affect CharsToNameCanonicalizer. Please let me know if you still see this issue.

kimchy · 2014-07-09T10:38:22Z

@cowtowncoder this is perfect, works, thank you!. Apologies in advance for the question, but do you have a rough estimate for a 2.4.2 release?

cowtowncoder · 2014-07-09T14:29:03Z

@kimchy This is actually the first fix since 2.4.1, but given its critical nature it should get out quickly.

One short-term possibility would be to do a micro-release 2.4.1.1, until full 2.4.2 push. This to both give bit more time to address other issues found from 2.4.1, and to plug the immediate problem. It'd require additional entry for jackson-core (other components just using 2.4.1), but could be the best option.

Would this make sense? If so, I can push 2.4.1.1 right away.

kimchy · 2014-07-09T14:36:10Z

@cowtowncoder I think 2.4.1.1 release will confuse users, it would be the first release that has another .v number..., so my vote is just to wait for 2.4.2, and if a few more weeks are needed, then its worth it compared to the confusing that will happen... .

cowtowncoder · 2014-07-09T14:39:07Z

Users need not use 2.4.1.1, and in fact it'd be only for experienced users, or those with specific problems.

But I won't bother with 2.4.1.1 unless someone wants it; I think 2.4.2 will need to be released before end of July anyway. And the option of micro-patch is always available for critical patches, just wanted to mention that in case it made sense for particular issue.

kimchy · 2014-07-09T14:42:30Z

@cowtowncoder make sense, we want to release 1.3 ES in a couple of weeks, if we can get 2.4.2 into this release, it would be awesome.

kimchy · 2014-07-09T14:49:19Z

@cowtowncoder thinking about it, if you are up for 2.4.1.1, it would be great, as it will give jackson more time to bake in our (randomized) testing infra, and potentially flush other things

cowtowncoder · 2014-07-09T14:50:03Z

@kimchy Ok. I'll do 2.4.1.1, more for testing / fallback-option purposes, since doing that is trivially easy.
And then we'll sync 2.4.2 so ES official release can use official patch with the fix.

cowtowncoder · 2014-07-09T15:18:54Z

2.4.1.1 now pushed, should trickle to Maven central within an hour or two.

I also updated 2.4.2 release page:

https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.4.2

and noticed that there was actually already one earlier critical fix; for which there is jackson-databind 2.4.1.1. But it's probably not necessary for ES as it is releated to Object Id handling in databinding.

kimchy · 2014-07-09T15:33:09Z

@cowtowncoder thank you!

cowtowncoder · 2014-07-09T15:38:28Z

@kimchy thank you for reporting this. Also: once issue is dealt with, I am curious if performance difference between 2.3 and 2.4 is measurable. There are some minor improvements that show improvements in some of the tests, but it is hard to know for sure.
One exception is jackson-dataformat-cbor module, which should have significant improvements for reading of small/medium-sized documents. Otherwise improvements are more incremental.

kimchy · 2014-07-09T15:45:55Z

FYI, I know this is a crazy use case, but I had to disable JsonFactory.Feature.FAIL_ON_SYMBOL_HASH_OVERFLOW since it was tripping. 2.4.1.1 fixes the problem, thanks!

kimchy · 2014-07-09T16:10:13Z

@cowtowncoder missed your perf question, it does seem that for large documents, there is performance improvements, though nothing scientific yet, will properly test

cowtowncoder · 2014-07-09T18:17:37Z

@kimchy I'd also be interested in overflow case -- I was bit hesitant wrt this; I know there are exploits out there so there should be some limit (plus performance will go down the drain way before). But at the same time it is theoretically possible to reach limits... and sounds like in practice as well.

So it is also possible that default max overflow limit should be increased, if modest increase would help solve use cases. If not, then feature works.

kimchy mentioned this issue Jul 8, 2014

Upgrade to Jackson 2.4.1.1 elastic/elasticsearch#6789

Closed

cowtowncoder closed this as completed in 5027dc5 Jul 9, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NPE at BytesToNameCanonicalizer #145

NPE at BytesToNameCanonicalizer #145

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

NPE at BytesToNameCanonicalizer #145

NPE at BytesToNameCanonicalizer #145

Comments

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

kimchy commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 8, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014

kimchy commented Jul 9, 2014

kimchy commented Jul 9, 2014

cowtowncoder commented Jul 9, 2014