[BUG] Error in Backfill migration from ES_6.7 to OS_2.17 #1180
Comments
Thanks for reporting @rudney-souza
I suspect that you were getting back a You could quickly try reducing the backfill rate by using the There are some advanced options you can modify for RFS that could let you dial this in if you have especially large documents or expensive indexing operations. Use the Throughput impacting options (lines 146 to 162 in 3d82d1f)
I've reduced it and the error with 'Too' has gone, but the Java heap error remains and no further indexes are being migrated.
@rudney-souza Could you collect some stats about the documents in the index that you are migrating? Dialing down that number of documents
Of course @peternied, when I dial it down I'll let you know here!
So when running 200 docs per bulk request I got this now (but it seems to have happened just once while running). And I'm still running the backfill @peternied
@rudney-souza I'll need to do more research for potential remedies. Does it look like the backfill is still progressing forward, or has it stalled?
In my live tail it looks like it's still progressing, but only with 200 docs per bulk request!
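The OutOfMemoryError in the logs is thrown while the bulk request body is being built as a string (`BulkDocSection.convertToBulkRequestBody`), so the size of each bulk request matters as well as the number of documents in it. The sketch below shows one way to cap batches by both document count and payload bytes. It is only an illustration: `BulkBatcher`, `partition`, and the limits used are hypothetical and do not come from the RFS code base.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RFS: groups documents into bulk-sized batches.
final class BulkBatcher {

    /**
     * Splits docs into batches of at most maxDocs documents and, where possible,
     * at most maxBytes of UTF-8 payload. A single document larger than maxBytes
     * still ends up in a batch of its own.
     */
    static List<List<String>> partition(List<String> docs, int maxDocs, long maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            boolean wouldOverflow = current.size() >= maxDocs || currentBytes + size > maxBytes;
            if (!current.isEmpty() && wouldOverflow) {
                // Close the current batch before adding the next document.
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"a\":1}", "{\"b\":2}", "{\"c\":3}");
        // Three small documents with a cap of two documents per batch.
        System.out.println(partition(docs, 2, 1_000).size()); // prints 2
    }
}
```

With a byte cap in place, an index with unusually large documents produces smaller batches automatically, instead of relying on a single document count such as 200 that has to suit every index.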
@rudney-souza Some investigation seems to indicate that the
It's all running @chelma, but I got some errors like this: ERROR o.o.m.RfsMigrateDocuments [leaseWatchingProcessKillerThread-1-1] Terminating RfsMigrateDocuments because the lease has expired for index-x
@rudney-souza I don't see any specific action for us to take on this issue. Please feel free to open a bug report about a specific issue where you'd expect different behavior. We are tracking autoscaling in MIGRATIONS-1622, which addresses the root cause of the 429 retries.
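For context on the 429 retries: a client that receives HTTP 429 (Too Many Requests) typically waits before resending, and lengthens the wait on each consecutive failure. The sketch below shows capped exponential backoff with optional full jitter. It is a generic illustration, not the retry policy RFS actually uses; `RetryBackoff`, its method names, and the delay values are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of RFS: computes wait times between retries.
final class RetryBackoff {

    /** Returns min(capMillis, baseMillis * 2^attempt) for attempt = 0, 1, 2, ... */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Limit the exponent so the product stays within a long for ordinary base delays.
        int exponent = Math.min(attempt, 30);
        return Math.min(capMillis, baseMillis * (1L << exponent));
    }

    /** Picks a random wait in [0, delayMillis] so many workers do not retry in lockstep. */
    static long fullJitter(long delayMillis) {
        return ThreadLocalRandom.current().nextLong(delayMillis + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt <= 5; attempt++) {
            long delay = delayMillis(attempt, 100, 10_000);
            System.out.println("attempt " + attempt + ": up to " + delay + " ms, jittered " + fullJitter(delay) + " ms");
        }
    }
}
```

Jitter matters most when many workers hit the same cluster, as in the 50 to 180 workers described in this issue: without it, workers throttled at the same moment tend to retry at the same moment too.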
What is the bug?
When running the backfill migration from ES_6.7 to OS_2.17, I got an error like com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Too': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false'). The migration stayed stuck at the same number of shards even after running all night (at first there were 50 workers, then I added 130 more to see if anything changed, but none of the remaining shards has finished yet).
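An unrecognized token 'Too' at line 1, column 4 is what a JSON parser reports when it is handed a plain-text body such as "Too Many Requests", the reason phrase for HTTP 429; that reading fits the 429 retries mentioned in the comments, though the response body itself is redacted in the logs. A minimal sketch of checking the status code before parsing follows. It assumes the client has the status code available at that point, and `BulkResponseGuard`, `Outcome`, and `classify` are hypothetical names, not the RFS project's API.

```java
// Hypothetical helper, not part of RFS: decides what to do with a bulk response
// before any JSON parsing is attempted.
final class BulkResponseGuard {

    enum Outcome { PARSE_AS_JSON, RETRY_LATER, FAIL }

    static Outcome classify(int statusCode, String body) {
        if (statusCode == 429) {
            // Throttled: the body may be plain text, so do not parse it as JSON.
            return Outcome.RETRY_LATER;
        }
        String trimmed = body == null ? "" : body.stripLeading();
        if (statusCode >= 200 && statusCode < 300 && trimmed.startsWith("{")) {
            return Outcome.PARSE_AS_JSON;
        }
        return Outcome.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(classify(429, "Too Many Requests"));           // prints RETRY_LATER
        System.out.println(classify(200, "{\"errors\":false,\"items\":[]}")); // prints PARSE_AS_JSON
    }
}
```

Separating the throttled case from the parse step keeps a 429 from surfacing as a misleading JsonParseException.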
How can one reproduce the bug?
Run the backfill migration from ES_6.7 to OS_2.17.
What is the expected behavior?
Run the migration without errors
What is your host/environment?
ES_6.7 to OS_2.17
Do you have any additional context?
2024-12-05 14:00:51,114 WARN o.o.m.b.c.OpenSearchClient [reactor-http-epoll-2] Unable to process bulk request for success
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'Too': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
 at [Source: REDACTED (StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION disabled); line: 1, column: 4]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2481) ~[jackson-core-2.16.2.jar:2.16.2]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:762) ~[jackson-core-2.16.2.jar:2.16.2]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:3042) ~[jackson-core-2.16.2.jar:2.16.2]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:2085) ~[jackson-core-2.16.2.jar:2.16.2]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:812) ~[jackson-core-2.16.2.jar:2.16.2]
    at org.opensearch.migrations.parsing.BulkResponseParser.findSuccessDocs(BulkResponseParser.java:39) ~[RFS-0.1.0-SNAPSHOT.jar:?]
    at org.opensearch.migrations.bulkload.common.OpenSearchClient$BulkResponse.getSuccessfulDocs(OpenSearchClient.java:531) ~[RFS-0.1.0-SNAPSHOT.jar:?]
    at org.opensearch.migrations.bulkload.common.OpenSearchClient.lambda$sendBulkRequest$28(OpenSearchClient.java:474) ~[RFS-0.1.0-SNAPSHOT.jar:?]
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondComplete(MonoFlatMap.java:245) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:305) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondComplete(MonoFlatMap.java:245) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:305) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:113) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.Operators$MonoInnerProducerBase.complete(Operators.java:2812) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoSingleOptional$SingleOptionalSubscriber.onNext(MonoSingleOptional.java:101) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxHandle$HandleSubscriber.onNext(FluxHandle.java:129) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:113) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:194) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.Operators$BaseFluxToMonoOperator.completePossiblyEmpty(Operators.java:2097) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:118) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:144) [reactor-core-3.6.5.jar:3.6.5]
    at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:415) [reactor-netty-core-1.1.18.jar:1.1.18]
    at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:446) [reactor-netty-core-1.1.18.jar:1.1.18]
    at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:500) [reactor-netty-core-1.1.18.jar:1.1.18]
    at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:793) [reactor-netty-http-1.1.18.jar:1.1.18]
    at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114) [reactor-netty-core-1.1.18.jar:1.1.18]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) [netty-codec-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) [netty-codec-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) [netty-codec-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) [netty-codec-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at org.opensearch.migrations.bulkload.netty.ReadMeteringHandler.channelRead(ReadMeteringHandler.java:26) [RFS-0.1.0-SNAPSHOT.jar:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:801) [netty-transport-classes-epoll-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509) [netty-transport-classes-epoll-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) [netty-transport-classes-epoll-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.108.Final.jar:4.1.108.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.108.Final.jar:4.1.108.Final]
    at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2024-12-05 14:00:48,025 ERROR r.c.s.Schedulers [parallel-3] Scheduler worker in group main failed with an uncaught exception
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:4030) ~[?:?]
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:715) ~[?:?]
    at java.base/java.lang.StringBuilder.toString(StringBuilder.java:452) ~[?:?]
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:498) ~[jackson-core-2.16.2.jar:2.16.2]
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:99) ~[jackson-core-2.16.2.jar:2.16.2]
    at org.opensearch.migrations.bulkload.common.BulkDocSection.convertToBulkRequestBody(BulkDocSection.java:72) ~[RFS-0.1.0-SNAPSHOT.jar:?]
    at org.opensearch.migrations.bulkload.common.OpenSearchClient.lambda$sendBulkRequest$29(OpenSearchClient.java:457) ~[RFS-0.1.0-SNAPSHOT.jar:?]
    at org.opensearch.migrations.bulkload.common.OpenSearchClient$$Lambda$1253/0x00000008007f5040.get(Unknown Source) ~[?:?]
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:45) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.resubscribe(FluxRetryWhen.java:220) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onNext(FluxRetryWhen.java:274) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxConcatMapNoPrefetch$FluxConcatMapNoPrefetchSubscriber.innerNext(FluxConcatMapNoPrefetch.java:259) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:865) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondComplete(MonoFlatMap.java:245) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:305) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.complete(MonoIgnoreThen.java:294) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onNext(MonoIgnoreThen.java:188) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:237) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:165) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.complete(MonoIgnoreThen.java:294) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onNext(MonoIgnoreThen.java:188) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) ~[reactor-core-3.6.5.jar:3.6.5]
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) ~[reactor-core-3.6.5.jar:3.6.5]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
2024-12-05 14:00:47,618 WARN o.o.m.b.c.OpenSearchClient [reactor-http-epoll-4] After bulk request on index 'vindex', 0 more documents have succeed, 3614 remain