
Memory leak in simpleclient_vertx4 #861

Open
jyotipatel opened this issue Sep 10, 2023 · 11 comments

@jyotipatel

jyotipatel commented Sep 10, 2023

We are observing a continuous increase in heap memory due to MetricsHandler.java. This causes frequent GC triggers as the number of metrics grows. Heap allocation screenshot attached.

Here is the relevant Vert.x routing code:
proxyRouter.route("/metrics").handler(new MetricsHandler());

There are around 23000 metrics in this API response.

The heap size allocated for the young generation is 5 GB.

@fstab
Member

fstab commented Sep 10, 2023

Interesting, thanks for reporting.

We recently had a performance issue with s.charAt(i), which is fixed in this commit on main. I wonder whether it's the same issue.

The reason is that s.charAt(i) was implemented very efficiently in Java 8, but became significantly slower in later Java versions.

Which Java version are you using?
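For context, the charAt regression can be illustrated with a sketch like the following (illustrative code, not simpleclient's actual implementation): on JDK 9+ with compact strings, each charAt(i) call performs a coder check, which copying the string to a char[] once avoids.

```java
// Illustrative sketch, not simpleclient code: two ways to walk a String.
// On JDK 9+ (compact strings), charAt(i) does a coder check per call;
// copying to a char[] once pays that cost a single time.
public class CharAtSketch {
    static int sumViaCharAt(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i); // per-call coder check on JDK 9+
        }
        return sum;
    }

    static int sumViaArray(String s) {
        int sum = 0;
        for (char c : s.toCharArray()) { // one copy, then plain array access
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        String s = "metric_name{label=\"value\"} 1.0";
        System.out.println(sumViaCharAt(s) == sumViaArray(s)); // same result either way
    }
}
```

Both variants are equivalent functionally; the difference only shows up in hot loops over large strings, which is exactly what the exposition format writer does.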

@jyotipatel
Author

jyotipatel commented Sep 11, 2023

Thanks for your response @fstab. We are using Java 8. The byte[] and char[] allocations are taking up all the memory.
The Vert.x version is 4.4.5.

The issue is in the following method, which I think converts each chunk of the metrics output to a String before appending it to the response buffer:

public void write(char[] cbuf, int off, int len) throws IOException {
    this.buffer.appendString(new String(cbuf, off, len));
}
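For what it's worth, here is a sketch of a write path that avoids the per-call new String(...) allocation (SingleEncodeWriter is a hypothetical class, not part of simpleclient): it accumulates the characters and encodes them to bytes exactly once at the end.

```java
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: accumulate chars and encode a single time, instead of
// allocating a throwaway String (and its backing arrays) per write() call.
final class SingleEncodeWriter extends Writer {
    private final StringBuilder sb = new StringBuilder(1 << 20); // pre-size for large scrapes

    @Override
    public void write(char[] cbuf, int off, int len) {
        sb.append(cbuf, off, len); // no intermediate String allocation
    }

    @Override public void flush() {}
    @Override public void close() {}

    byte[] toBytes() {
        // One encode pass at the end; StandardCharsets skips the charset lookup.
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the whole payload still lives in memory, but each write() no longer produces a short-lived String plus a byte[] copy, which should reduce young-generation pressure.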

@dhoard
Collaborator

dhoard commented Sep 11, 2023

@jyotipatel How big is your metrics response in bytes? How many simultaneous scrapes are occurring?

@jyotipatel
Author

jyotipatel commented Sep 11, 2023

I just checked the response size: it's around 8 TB, and scrapes occur every 5 seconds.
What are the ideal response size and heap allocation? Do we have any guidance mapping one to the other?

The size can be reduced to 4 TB, as I just found an issue in our metrics generation logic. But do we know whether at least 4 TB can be supported?

@dhoard
Collaborator

dhoard commented Sep 11, 2023

@jyotipatel I feel this isn't a memory leak, but a side effect of the fact that:

  1. The number of metrics is large.
  2. The whole response is buffered in memory before being sent (i.e., it is not streamed).
  3. Your scrape interval is 5 seconds.
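A rough back-of-envelope for these three points, using the ~10 MB payload size given later in the thread. The multipliers are illustrative assumptions about the current write path (char[] chunks, a String copy per write(), a byte[] encode, and the Vert.x Buffer), not measurements:

```java
// Back-of-envelope sketch (assumption: ~10 MB payload, per-chunk String and
// byte[] copies as in BufferWriter.write). Numbers are illustrative only.
public class ScrapeAllocEstimate {
    public static void main(String[] args) {
        long payloadBytes = 10L * 1024 * 1024;       // ~10 MB of metrics text
        long charCopy   = payloadBytes * 2;          // char[] chunks (2 bytes/char)
        long stringCopy = payloadBytes * 2;          // new String(cbuf, ...) per write()
        long encoded    = payloadBytes;              // getBytes() inside appendString
        long buffer     = payloadBytes;              // the Vert.x Buffer itself
        long perScrape  = charCopy + stringCopy + encoded + buffer;
        System.out.println("~" + perScrape / (1024 * 1024)
                + " MB transiently allocated per scrape, one scrape every 5 s");
    }
}
```

Under these assumptions each scrape transiently allocates several times the payload size, and at a 5-second interval that is a sustained multi-MB/s allocation rate, which would explain the young-generation pressure without any actual leak.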

@fstab thoughts?

@jyotipatel
Author

I can try reducing the size further for point 1 and increasing the scrape interval for point 3. What about point 2? Is there any way to handle it? @dhoard

@dhoard
Collaborator

dhoard commented Sep 11, 2023

For point 2, you could write your own MetricsHandler implementation. (I'm not a Vert.x expert, so I can't provide details.)

Note: the response is buffered to prevent a partial response from being returned if an exception occurs during the scrape or response generation.
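To make the custom-handler idea concrete, here is a minimal, Vert.x-free sketch of the same pattern (OffloadedScraper and writeMetrics are hypothetical names; writeMetrics stands in for the TextFormat writer): build the buffered response on a worker thread, so the calling thread is never blocked by formatting and encoding the payload.

```java
import java.io.StringWriter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: produce the (still fully buffered) metrics text on a
// dedicated worker thread, so heavy formatting/encoding never runs on the
// thread that accepted the request.
final class OffloadedScraper {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    Future<String> scrape(Consumer<StringWriter> writeMetrics) {
        return worker.submit(() -> {
            StringWriter out = new StringWriter();
            writeMetrics.accept(out);  // heavy formatting happens here, off the caller's thread
            return out.toString();     // still buffered, so a partial response can't leak out
        });
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

If I'm reading the vertx-web API correctly, route("/metrics").blockingHandler(new MetricsHandler()) would achieve a similar offloading onto a Vert.x worker thread without writing a custom handler, though that addresses event-loop blocking rather than the buffering itself.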

@jyotipatel
Author

I revisited the size: it is not in TB, it's 10 MB of metrics. Sorry for the confusion.

@sr1996-glitch

I have a similar observation. Here are the details:

  1. vertx 4.3.8 / prometheus-client 0.16.0, running on Java 8.

  2. The heap grows, triggering frequent old-generation garbage collections that practically halt the application. JFR screenshots attached.

  3. In addition, I see a persistent warning that threads are blocked in this method:

    [vertx-blocked-thread-checker] [WARN] i.v.core.impl.BlockedThreadChecker [?:?] - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 6515 ms, time limit is 2000 ms
    io.vertx.core.VertxException: Thread blocked
        at java.lang.StringCoding.encode(StringCoding.java:350)
        at java.lang.String.getBytes(String.java:941)
        at io.vertx.core.buffer.impl.BufferImpl.append(BufferImpl.java:548)
        at io.vertx.core.buffer.impl.BufferImpl.appendString(BufferImpl.java:375)
        at io.prometheus.client.vertx.MetricsHandler$BufferWriter.write(MetricsHandler.java:38)
        at java.io.Writer.write(Writer.java:192)
        at java.io.Writer.write(Writer.java:157)
        at io.prometheus.client.exporter.common.TextFormat.writeOpenMetrics100(TextFormat.java:236)
        at io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:57)
        at io.prometheus.client.vertx.MetricsHandler.handle(MetricsHandler.java:78)
        at io.prometheus.client.vertx.MetricsHandler.handle(MetricsHandler.java:26)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1284)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:141)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:68)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:37)
        at io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:67)
        at io.vertx.core.http.impl.Http1xServerRequestHandler.handle(Http1xServerRequestHandler.java:30)
        at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:55)
        at io.vertx.core.impl.DuplicatedContext.emit(DuplicatedContext.java:158)
        at io.vertx.core.http.impl.Http1xServerConnection.handleMessage(Http1xServerConnection.java:145)
        at io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:157)
        at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
        at io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:99)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750)

The problem is not reliably reproducible and usually resolves after an application restart. The metrics payload is no more than a few MB. We recently migrated from prometheus client 0.2.0 to 0.16.0.

@fstab / @dhoard Any thoughts on what could be causing this behaviour? @jyotipatel Were you able to find a workaround?

@fstab
Member

fstab commented Sep 21, 2023

Hey, thank you all for the detailed reports. Just FYI: we are currently busy preparing the 1.0.0 release, which is due next week. However, simpleclient_vertx4 will not be part of 1.0.0; we only included a few core modules for now and postponed the 1.0.0 release of other modules, like simpleclient_vertx4, until later.

So I might put this on hold for two weeks or so until 1.0.0 is done, but I'll get back to it as soon as the initial 1.0.0 release is out and we start tackling simpleclient_vertx.

@sr1996-glitch

@fstab Thank you for the response. I fully understand that you are occupied preparing the 1.0.0 release.

However, I would appreciate it if you could take a quick look and suggest any potential workaround, or a way to reproduce the problem on demand, that we could try in the meantime.
