High CPU usage of the resource destructor #178

Open · varsill opened this issue Dec 9, 2024 · 2 comments

varsill commented Dec 9, 2024

Hello!
Recently I've come across a problem with high CPU usage when processing images with Vix v0.31.1 in Elixir 1.16.3 on OTP 26.
My scenario is as follows: I am generating a number of videos out of sequences of JPEG images in real time, which is why I need to load images from files and convert them to the YUV color space (so that they can later be encoded).
To be more precise, for each video I run the following sequence of operations multiple times per second (corresponding to the video's framerate); a rough sketch of the shape of this per-frame pipeline follows the list:

Vix.Image.new_from_binary
Vix.Image.write_to_binary
Vix.Operation.jpegload_buffer
Vix.Operation.linear!
Vix.Operation.bandjoin!
Vix.Operation.xyz
Vix.Operation.composite2
Image.YUV.write_to_binary
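For illustration, a minimal sketch of the general shape of such a per-frame pipeline (not my actual code; it uses Vix's full module paths Vix.Vips.Image and Vix.Vips.Operation, and all concrete argument values are placeholders):

```elixir
# Rough sketch only - argument values are placeholders, not the real pipeline.
defmodule FramePipeline do
  alias Vix.Vips.{Image, Operation}

  def process(jpeg_binary) do
    # decode a JPEG frame held in memory
    {:ok, frame} = Image.new_from_buffer(jpeg_binary)

    # example per-band arithmetic; [1.0] / [0.0] are placeholder coefficients
    adjusted = Operation.linear!(frame, [1.0], [0.0])

    # serialize back to raw bytes so the frame can be handed to the encoder
    {:ok, raw} = Image.write_to_binary(adjusted)
    raw
  end
end
```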

Though this task is indeed quite computationally heavy, it looks to me as if it were using more resources than I'd expect.
Apart from that, I am unable to scale the number of videos beyond 5 or 6, even though there are still CPU resources left on my computer.
I did some profiling and noticed that normal scheduler 1 is pushed to its limits, with its utilization reaching 100%, in contrast to the other normal schedulers, which are almost unused. Below is a screenshot from the Erlang observer:
[Screenshot: normal scheduler 1 at ~100% utilization while the remaining normal schedulers are nearly idle]
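(The same imbalance can also be confirmed without the observer using the :scheduler helper from OTP's runtime_tools; a small sketch, assuming runtime_tools is available:)

```elixir
# Sample per-scheduler utilization over one second (runtime_tools, OTP 21+).
# A single saturated normal scheduler stands out immediately.
:scheduler.utilization(1)
|> Enum.each(fn
  {:normal, id, util, percent} ->
    IO.puts("normal scheduler #{id}: #{percent} (#{Float.round(util, 3)})")

  other ->
    IO.inspect(other)
end)
```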

I thought it might have something to do with the fact that the dirty NIF resource destructor is always run on normal scheduler 1 (https://github.com/erlang/otp/blob/51548614b588c667a4230c17e2b187474dd70d63/erts/emulator/beam/erl_nif.c#L2841),
so I decided to see what operations are executed on that scheduler. I used the macOS sample tool to profile the beam.smp process, and it turns out that the majority of the time spent by the normal scheduler 1 thread is indeed spent in run_resource_dtor.
Below is part of the output of my profiling results:

   36632 Thread_2779870: erts_sched_1
    + 25350 ???  (in <unknown binary>)  [0x103f21a64]
    + ! 25346 ???  (in <unknown binary>)  [0x103e6dee0]
    + ! : 21982 erts_schedule  (in beam.smp) + 8660  [0x102adaf14]
    + ! : | 21953 handle_aux_work  (in beam.smp) + 3652  [0x102adff50]
    + ! : | + 21916 run_resource_dtor  (in beam.smp) + 256  [0x102cb3a04]
    + ! : | + ! 21245 g_object_unref  (in libvips.42.dylib) + 224  [0x35c327b88]
    + ! : | + ! : 17932 ???  (in libvips.42.dylib)  load address 0x35c088000 + 0x146d40  [0x35c1ced40]
    + ! : | + ! : | 17930 g_signal_emit  (in libvips.42.dylib) + 28  [0x35c334b60]
    + ! : | + ! : | + 17778 g_signal_emit_valist  (in libvips.42.dylib) + 64  [0x35c33418c]
    + ! : | + ! : | + ! 17613 ???  (in libvips.42.dylib)  load address 0x35c088000 + 0x2ac708  [0x35c334708]
    + ! : | + ! : | + ! : 17216 ???  (in libvips.42.dylib)  load address 0x35c088000 + 0x2ad35c  [0x35c33535c]
    + ! : | + ! : | + ! : | 17210 g_closure_invoke  (in libvips.42.dylib) + 228  [0x35c323fa8]
    + ! : | + ! : | + ! : | + 17145 ???  (in libvips.42.dylib)  load address 0x35c088000 + 0x143b40  [0x35c1cbb40]
    + ! : | + ! : | + ! : | + ! 17143 g_object_set  (in libvips.42.dylib) + 28  [0x35c329bac]
    + ! : | + ! : | + ! : | + ! : 16589 g_object_set_valist  (in libvips.42.dylib) + 700  [0x35c3295c0]
    + ! : | + ! : | + ! : | + ! : | 16413 g_object_unref  (in libvips.42.dylib) + 224  [0x35c327b88]
    + ! : | + ! : | + ! : | + ! : | + 11518 ???  (in libvips.42.dylib)  load address 0x35c088000 + 0x146d40  [0x35c1ced40]
    + ! : | + ! : | + ! : | + ! : | + ! 11517 g_signal_emit  (in libvips.42.dylib) + 28  [0x35c334b60]
    + ! : | + ! : | + ! : | + ! : | + ! : 11386 g_signal_emit_valist  (in libvips.42.dylib) + 64  [0x35c33418c]
   ...

Interestingly, the part starting with g_signal_emit repeats itself several thousand times, apparently decrementing some reference counter.
I've tried reducing the Vips concurrency, and after setting it to 1 the amount of CPU used by run_resource_dtor is smaller, but still significant (around 50% of the time measured with Vips concurrency set to 8).
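For illustration, one common way to dial libvips concurrency down is the standard VIPS_CONCURRENCY environment variable (this assumes libvips picks the variable up before Vix starts doing work; the exact mechanism used in the experiment above may differ):

```elixir
# Assumption: libvips reads VIPS_CONCURRENCY when its thread pools are set up,
# so this must run before any Vix/libvips work starts (e.g. in config/runtime.exs).
System.put_env("VIPS_CONCURRENCY", "1")
```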

I have several questions concerning my issue:

  1. Why does resource deallocation in Vips take that much CPU?
  2. Is it expected that the "g_signal_emit cycle" is run that many times?
  3. Are there some Vips flags that might be used to do the deallocation "in bulk"?
akash-akya (Owner) commented

@varsill thanks for the detailed issue report and analysis.

Deallocation taking a lot of CPU is certainly odd. I'll check this as a high priority as soon as I get some time.

Are there some Vips flags that might be used to do the deallocation "in bulk"?

As of now, no.

Is it possible to share some sample code so I can reproduce this easily? I'd also like to see the context in which these operations are performed (process involvement, etc.).

Also, a blind suggestion if you want an immediate workaround: try wrapping all the vips operations for each frame in a separate process and see if it helps. Something like Task.async(fn -> ... end) |> Task.await().
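A minimal, self-contained sketch of that idea (process_frame here is just a placeholder for the per-frame pipeline described above, not a Vix API):

```elixir
# Run one frame's vips operations in a short-lived process and wait for the
# result. `process_frame` is a placeholder 1-arity function supplied by the caller.
defmodule FrameWorker do
  def convert(jpeg_binary, process_frame) when is_function(process_frame, 1) do
    Task.async(fn -> process_frame.(jpeg_binary) end)
    |> Task.await(:infinity)
  end
end
```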


varsill commented Dec 10, 2024

@akash-akya Thank you very much for the quick response!

Deallocation taking a lot of CPU is certainly odd. I'll check this as a high priority as soon as I get some time.

Great, thanks in advance!

Also, a blind suggestion if you want an immediate workaround: try wrapping all the vips operations for each frame in a separate process and see if it helps. Something like Task.async(fn -> ... end) |> Task.await().

Indeed, this one seems to solve the issue with uneven work distribution among the normal schedulers. When I run :erlang.statistics(:microstate_accounting), I see that the normal schedulers are now sharing the auxiliary work. It allows me to scale the number of concurrent streams even further, so thanks for that workaround suggestion! Unfortunately, the overall CPU consumption seems to be slightly higher than without that change, and I still see that much of the work is spent in run_resource_dtor.
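(For reference, microstate accounting has to be switched on before the counters are meaningful; roughly:)

```elixir
# Enable microstate accounting, let the workload run for a while, then inspect
# how the auxiliary work (which includes resource destructors) is spread across
# the normal schedulers.
:erlang.system_flag(:microstate_accounting, true)
Process.sleep(5_000)

:erlang.statistics(:microstate_accounting)
|> Enum.filter(&(&1.type == :scheduler))
|> Enum.each(fn %{id: id, counters: counters} ->
  IO.puts("scheduler #{id} aux: #{counters[:aux]}")
end)
```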

Is it possible to share some sample code so I can reproduce this easily?

Sure, I will try to prepare a sample script that reproduces this behaviour.
