Release v1.48.0 #4248

Closed · vtjnash opened this issue Dec 12, 2023 · 31 comments

vtjnash (Member) commented Dec 12, 2023

To pick up these 24 commits: v1.47.0...v1.x

This will also probably be the last release with Windows 8 support?

desired:

optional:

bradking (Contributor) commented Feb 2, 2024

Desired item #4241 has been superseded by #4292.

vtjnash (Member, Author) commented Feb 5, 2024

I also just stumbled across comments in the dotnet repo that pointed to documentation which seems to suggest a function was being used improperly for ESRCH in libuv. It seemed like a pretty simple change, so I went ahead and made a PR that I think should fix it: #4301. If that PR looks okay to others, should we slip it into this release?

santigimeno (Member) commented

Libuv in Node CI: https://ci.nodejs.org/view/libuv/job/libuv-in-node/231/

There are 2 concerning failures:

Looking into those before making the release.

vtjnash (Member, Author) commented Feb 5, 2024

Let's revert b9421d7 for this release until there is time to look into that. It sounds like a kernel bug (since there is a C reproducer), and while it only impacts the tests, it seems inconvenient to have that be breaking for now. I am also not certain how nodejs detects this issue, since libuv should have been the only one to install signal handlers.

santigimeno (Member) commented

> Let's revert b9421d7

Will do.

> I am also not certain how nodejs detects this issue, since libuv should have been the only one to install signal handlers.

Node.js installs signal handlers both using libuv and using sigaction() directly.

vtjnash (Member, Author) commented Feb 5, 2024

For the exact same signals though? That sounds a bit unreliable (because of exactly this bug), though not entirely unreasonable. The next phase after b9421d7 might be to implement full signal chaining: as soon as the libuv handler finishes, it calls the next one in the chain, hiding the existence of the libuv sigaction call from the user (except when they or libuv later try to unregister a handler and end up unregistering libuv's instead).
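
For illustration, a minimal sketch of that chaining idea (this is just the generic pattern, not libuv's actual implementation; chained_handler, install_chained_handler and prev_action are made-up names): the new handler saves the previously installed disposition and forwards the signal once its own work is done.

#include <signal.h>
#include <stddef.h>

static struct sigaction prev_action;  /* disposition that was installed before ours */

static void chained_handler(int signum, siginfo_t* info, void* ctx) {
  /* ... libuv-style bookkeeping for the signal would go here ... */

  /* Then forward the signal to whatever handler was registered before us. */
  if (prev_action.sa_flags & SA_SIGINFO) {
    if (prev_action.sa_sigaction != NULL)
      prev_action.sa_sigaction(signum, info, ctx);
  } else if (prev_action.sa_handler != SIG_IGN &&
             prev_action.sa_handler != SIG_DFL) {
    prev_action.sa_handler(signum);
  }
}

static int install_chained_handler(int signum) {
  struct sigaction sa;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sa.sa_sigaction = chained_handler;
  /* sigaction() hands back the previously installed disposition,
   * which is what makes chaining (and later unchaining) possible. */
  return sigaction(signum, &sa, &prev_action);
}

The tricky part is the teardown path mentioned above: if either party later calls sigaction() again to "restore" what it believes is the old handler, the chain link is silently dropped.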

The output I could find for the other test seems fairly worthless. Is this all that nodejs gives for info when a test crashes?

not ok 4091 sequential/test-performance-eventloopdelay
  ---
  duration_ms: 7328.51100
  severity: crashed
  exitcode: -11
  stack: |-
  ...

santigimeno (Member) commented

> For the exact same signals though?

Yes, it may happen for SIGINT and SIGTERM. Node.js installs its own handlers on bootstrap using sigaction() directly, and later on a user can install their own handlers for those signals; in that case Node.js uses the uv_signal_t API.
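
To make the collision concrete, here is a rough sketch of the two registration paths (illustrative only, not Node.js's actual bootstrap code): a raw sigaction() handler installed first, then a uv_signal_t handler for the same signal, which replaces the earlier disposition at the kernel level.

#include <signal.h>
#include <stdio.h>
#include <uv.h>

static void bootstrap_sigint(int signum) {
  (void) signum;  /* stands in for the handler installed at bootstrap */
}

static void user_sigint(uv_signal_t* handle, int signum) {
  printf("got signal %d via the uv_signal_t API\n", signum);
  uv_signal_stop(handle);
}

int main(void) {
  /* Path 1: plain sigaction(), outside of libuv's knowledge. */
  struct sigaction sa;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sa.sa_handler = bootstrap_sigint;
  sigaction(SIGINT, &sa, NULL);

  /* Path 2: uv_signal_t for the same signal. libuv's own sigaction()
   * call now owns SIGINT, so only one of the two handlers can win. */
  uv_loop_t* loop = uv_default_loop();
  uv_signal_t sig;
  uv_signal_init(loop, &sig);
  uv_signal_start(&sig, user_sigint, SIGINT);

  return uv_run(loop, UV_RUN_DEFAULT);
}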

santigimeno (Member) commented

Revert PR: #4302

santigimeno (Member) commented

Current status: bisecting, as the coredump I could get didn't provide much info.

Core was generated by `out/Debug/node --expose-internals --expose-gc test/parallel/test-performance-ev'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ffc1a86f420 in ?? ()
[Current thread is 1 (Thread 0x7f142dcc9880 (LWP 1167554))]
(gdb) bt
#0  0x00007ffc1a86f420 in ?? ()
#1  0x0000562c6e716710 in ?? ()
#2  0x0000562c6e6dfbf0 in ?? ()
#3  0x0000562c6e72bd10 in ?? ()
#4  0x0000562c6e72bd00 in ?? ()
#5  0x00007ffc1a86f470 in ?? ()
#6  0x3e54286becaeb000 in ?? ()
#7  0x000000011a86f470 in ?? ()
#8  0x0000562c6e766830 in ?? ()
#9  0x00007ffc1a86f530 in ?? ()
#10 0x0000562c65bd8080 in node::SpinEventLoopInternal (env=0x562c65ebcc58 <node::PerIsolatePlatformData::FlushForegroundTasksInternal()+614>) at ../src/api/embed_helpers.cc:44
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) thread apply all bt

Thread 10 (Thread 0x7f142d4c5640 (LWP 1167733)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x562c6e70c300, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f142dd61a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562c6e70c2b0, cond=0x562c6e70c2d8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ./nptl/pthread_cond_wait.c:627
#5  0x0000562c676b9f57 in uv_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../deps/uv/src/unix/thread.c:814
#6  0x0000562c65d30db6 in node::LibuvMutexTraits::cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../src/node_mutex.h:175
#7  0x0000562c65d32671 in node::ConditionVariableBase<node::LibuvMutexTraits>::Wait (this=0x562c6e70c2d8, scoped_lock=...) at ../src/node_mutex.h:249
#8  0x0000562c65ec007d in node::TaskQueue<v8::Task>::BlockingPop (this=0x562c6e70c2b0) at ../src/node_platform.cc:605
#9  0x0000562c65ebadf0 in node::(anonymous namespace)::PlatformWorkerThread (data=0x562c6e70cab0) at ../src/node_platform.cc:42
#10 0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 9 (Thread 0x7f1427fff640 (LWP 1167735)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x562c6e70c300, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f142dd61a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562c6e70c2b0, cond=0x562c6e70c2d8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ./nptl/pthread_cond_wait.c:627
#5  0x0000562c676b9f57 in uv_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../deps/uv/src/unix/thread.c:814
#6  0x0000562c65d30db6 in node::LibuvMutexTraits::cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../src/node_mutex.h:175
#7  0x0000562c65d32671 in node::ConditionVariableBase<node::LibuvMutexTraits>::Wait (this=0x562c6e70c2d8, scoped_lock=...) at ../src/node_mutex.h:249
#8  0x0000562c65ec007d in node::TaskQueue<v8::Task>::BlockingPop (this=0x562c6e70c2b0) at ../src/node_platform.cc:605
#9  0x0000562c65ebadf0 in node::(anonymous namespace)::PlatformWorkerThread (data=0x562c6e70ce00) at ../src/node_platform.cc:42
#10 0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 8 (Thread 0x7f142ccc4640 (LWP 1167734)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x562c6e70c300) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x562c6e70c300, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f142dd61a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562c6e70c2b0, cond=0x562c6e70c2d8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ./nptl/pthread_cond_wait.c:627
#5  0x0000562c676b9f57 in uv_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../deps/uv/src/unix/thread.c:814
#6  0x0000562c65d30db6 in node::LibuvMutexTraits::cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../src/node_mutex.h:175
#7  0x0000562c65d32671 in node::ConditionVariableBase<node::LibuvMutexTraits>::Wait (this=0x562c6e70c2d8, scoped_lock=...) at ../src/node_mutex.h:249
#8  0x0000562c65ec007d in node::TaskQueue<v8::Task>::BlockingPop (this=0x562c6e70c2b0) at ../src/node_platform.cc:605
#9  0x0000562c65ebadf0 in node::(anonymous namespace)::PlatformWorkerThread (data=0x562c6e70cc60) at ../src/node_platform.cc:42
#10 0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 7 (Thread 0x7f142dcc6640 (LWP 1167691)):
#0  0x00007f142ddf3b68 in __GI_epoll_pwait (epfd=11, events=0x7f142dcc2c50, maxevents=1024, timeout=-1, set=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:40
#1  0x0000562c676c15e7 in uv__io_poll (loop=0x562c6e70c4c8, timeout=-1) at ../deps/uv/src/unix/linux.c:1432
#2  0x0000562c676a4552 in uv_run (loop=0x562c6e70c4c8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:448
#3  0x0000562c65ebedd6 in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Run (this=0x562c6e70c3c0) at ../src/node_platform.cc:95
#4  0x0000562c65ebe974 in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::operator()(void*) const (__closure=0x0, data=0x562c6e70c3c0) at ../src/node_platform.cc:64
#5  0x0000562c65ebe998 in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*) () at ../src/node_platform.cc:65
#6  0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7  0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 6 (Thread 0x7f142c422640 (LWP 1168198)):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x00007f142dd6abdf in do_futex_wait (sem=sem@entry=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x00007f142dd6ac78 in __new_sem_wait_slow64 (sem=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000562c676b9c41 in uv__sem_wait (sem=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ../deps/uv/src/unix/thread.c:639
#6  0x0000562c676b9db8 in uv_sem_wait (sem=0x562c6c78e1a0 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ../deps/uv/src/unix/thread.c:695
#7  0x0000562c65feb3dc in node::inspector::(anonymous namespace)::StartIoThreadMain (unused=0x0) at ../src/inspector_agent.cc:82
#8  0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9  0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 5 (Thread 0x7f14277fe640 (LWP 1167736)):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x562c6e70c304) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x562c6e70c304) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x562c6e70c304, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f142dd61a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562c6e70c2b0, cond=0x562c6e70c2d8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ./nptl/pthread_cond_wait.c:627
#5  0x0000562c676b9f57 in uv_cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../deps/uv/src/unix/thread.c:814
#6  0x0000562c65d30db6 in node::LibuvMutexTraits::cond_wait (cond=0x562c6e70c2d8, mutex=0x562c6e70c2b0) at ../src/node_mutex.h:175
#7  0x0000562c65d32671 in node::ConditionVariableBase<node::LibuvMutexTraits>::Wait (this=0x562c6e70c2d8, scoped_lock=...) at ../src/node_mutex.h:249
#8  0x0000562c65ec007d in node::TaskQueue<v8::Task>::BlockingPop (this=0x562c6e70c2b0) at ../src/node_platform.cc:605
#9  0x0000562c65ebadf0 in node::(anonymous namespace)::PlatformWorkerThread (data=0x562c6e70cf70) at ../src/node_platform.cc:42
#10 0x00007f142dd62ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007f142ddf4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 4 (LWP 1167783):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 3 (LWP 1167732):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 2 (LWP 1167690):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 1 (Thread 0x7f142dcc9880 (LWP 1167554)):
#0  0x00007ffc1a86f420 in ?? ()
#1  0x0000562c6e716710 in ?? ()
#2  0x0000562c6e6dfbf0 in ?? ()
#3  0x0000562c6e72bd10 in ?? ()
#4  0x0000562c6e72bd00 in ?? ()
#5  0x00007ffc1a86f470 in ?? ()
#6  0x3e54286becaeb000 in ?? ()
#7  0x000000011a86f470 in ?? ()
#8  0x0000562c6e766830 in ?? ()
#9  0x00007ffc1a86f530 in ?? ()
#10 0x0000562c65bd8080 in node::SpinEventLoopInternal (env=0x562c65ebcc58 <node::PerIsolatePlatformData::FlushForegroundTasksInternal()+614>) at ../src/api/embed_helpers.cc:44
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

bnoordhuis (Member) commented Feb 6, 2024

Frame 10 of thread 1 is a call to platform->DrainTasks() but it goes completely off the rails. I would've expected to see NodePlatform::DrainTasks() in the next frame.

edit: does valgrind report anything?

vtjnash (Member, Author) commented Feb 6, 2024

It looks more like an unwinder bug to me (probably the gdb-to-V8 integration is disabled, so the unwinding cannot occur correctly). I suspect none of those frames are right, given:

#6  0x3e54286becaeb000 in ?? ()

vtjnash (Member, Author) commented Feb 6, 2024

Is it possible to run nodejs with the v8 JIT configured with the equivalent of -fno-omit-frame-pointer? It seems that many distros are now starting to set that option by default, since omitting frame pointers provides mostly negligible performance improvements but breaks debugging and slows down perf tools quite substantially (https://chromium.googlesource.com/v8/v8.git/+/branch-heads/5.9/Makefile#232 maybe?)

bnoordhuis (Member) commented Feb 6, 2024

> It looks more like an unwinder bug to me (probably gdb to v8 integration is disabled so the unwinding cannot occur correctly)

That would've been my guess too, but I don't think that code path calls into JS code. Disassembly of the topmost frame will probably tell, though; V8-generated machine code has a very specific vibe to it.

vtjnash (Member, Author) commented Feb 6, 2024

Right. The stack frames past 1 or 2 are probably wrong, so the eventual printing of node::SpinEventLoopInternal is just stack garbage and unrelated to the actual trace. Is it possible to use rr on this machine, or is this only reproducible on some cloud host that blocks installing that particular tool? I tried making a local build to reproduce it, but the test ran okay.

bnoordhuis (Member) commented

> Is it possible to run nodejs with the v8 JIT configured with the equivalent of -fno-omit-frame-pointer?

No. V8 uses a completely different calling convention; it's not due to a lack of frame pointers. There is gdb integration (the --gdb switch to configure) but it has a tendency to bitrot, so YMMV.

santigimeno (Member) commented Feb 6, 2024

In order to reproduce it locally on my Linux x64 box, I had to really push it:

# Debug build
$ ./configure --debug && make -j12

# Run 300 instances of the test in parallel a few times until some crashes happened
$ for i in $(seq 1 300); do out/Debug/node --expose-internals --expose-gc test/sequential/test-performance-eventloopdelay.js & done

Also, I'm currently building node on one of the CI instances where the issue seems to happen consistently.

vtjnash (Member, Author) commented Feb 6, 2024

Okay, I see segfaults with that too. That makes it really trivial to throw rr in front of it and then replay the failing trace.

vtjnash (Member, Author) commented Feb 6, 2024

It looks like the stack gets pretty badly smashed somewhere around there, since we start the function call with a frame that looks like this:

(rr) bt
#0  0x0000556ea22bc87c in node::NodePlatform::DrainTasks (this=0x556eaa5aeb40, isolate=0x556eaa5e4370) at ../src/node_platform.cc:462
#1  0x0000556ea1fd8080 in node::SpinEventLoopInternal (env=0x556eaa648260) at ../src/api/embed_helpers.cc:44
#2  0x0000556ea2257996 in node::NodeMainInstance::Run (this=0x7fff3cebff70, exit_code=0x7fff3cebfec4, env=0x556eaa648260) at ../src/node_main_instance.cc:124
#3  0x0000556ea2257824 in node::NodeMainInstance::Run (this=0x7fff3cebff70) at ../src/node_main_instance.cc:100
#4  0x0000556ea211e8c2 in node::StartInternal (argc=4, argv=0x556eaa5698a0) at ../src/node.cc:1434
#5  0x0000556ea211e994 in node::Start (argc=4, argv=0x7fff3cec01a8) at ../src/node.cc:1441
#6  0x0000556ea403a026 in main (argc=4, argv=0x7fff3cec01a8) at ../src/node_main.cc:97

and then we return from that DrainTasks into no-man's-land.

rr says the stack corruption happened here:

(rr) bt
#0  uv__queue_remove (q=0x556eaa7c9bc8) at ../deps/uv/src/queue.h:87
#1  0x0000556ea3aab497 in uv_timer_stop (handle=0x556eaa7c9b60) at ../deps/uv/src/timer.c:110
#2  0x0000556ea3aab76b in uv__timer_close (handle=0x556eaa7c9b60) at ../deps/uv/src/timer.c:201
#3  0x0000556ea3ab0bf2 in uv_close (handle=0x556eaa7c9b60, close_cb=0x556ea20ded0e <node::HandleWrap::OnClose(uv_handle_s*)>) at ../deps/uv/src/unix/core.c:187
#4  0x0000556ea20de98d in node::HandleWrap::Close (this=0x556eaa7c9ad0, close_callback=...) at ../src/handle_wrap.cc:76
#5  0x0000556ea20deaf2 in node::HandleWrap::OnGCCollect (this=0x556eaa7c9ad0) at ../src/handle_wrap.cc:95
#6  0x0000556ea20145eb in operator() (__closure=0x0, data=...) at ../src/base_object.cc:69
#7  0x0000556ea201460f in _FUN () at ../src/base_object.cc:70
#8  0x0000556ea29a4ee8 in v8::internal::GlobalHandles::PendingPhantomCallback::Invoke (type=v8::internal::GlobalHandles::PendingPhantomCallback::kFirstPass, 
    isolate=<optimized out>, this=0x556eaa7b44b0) at ../deps/v8/src/handles/global-handles.cc:866
#9  v8::internal::GlobalHandles::InvokeFirstPassWeakCallbacks (this=0x556eaa5fa6b0) at ../deps/v8/src/handles/global-handles.cc:840
#10 0x0000556ea2ae2809 in v8::internal::Heap::PerformGarbageCollection (this=0x556eaa5f1d20, collector=v8::internal::GarbageCollector::SCAVENGER, gc_reason=<optimized out>, 
    collector_reason=<optimized out>) at ../deps/v8/src/execution/isolate.h:1326
#11 0x0000556ea2ae3a8d in operator() (__closure=__closure@entry=0x7fff3cebfaa0) at ../deps/v8/src/heap/heap.cc:1871
#12 0x0000556ea2ae6d14 in heap::base::Stack::SetMarkerAndCallbackImpl<v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)::<lambda()> >(heap::base::Stack *, void *, const void *) (stack=0x556eaa5f4888, argument=0x7fff3cebfaa0, stack_end=0x7fff3cebfa20)
    at ../deps/v8/src/heap/base/stack.h:95
#13 0x0000556ea3a22467 in PushAllRegistersAndIterateStack ()
#14 0x0000556ea2ae49d6 in heap::base::Stack::SetMarkerIfNeededAndCallback<v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)::<lambda()> > (callback=..., this=<optimized out>) at ../deps/v8/src/heap/base/stack.h:60
#15 v8::internal::Heap::CollectGarbage (this=this@entry=0x556eaa5f1d20, space=space@entry=v8::internal::NEW_SPACE, 
    gc_reason=gc_reason@entry=v8::internal::GarbageCollectionReason::kTask, gc_callback_flags=gc_callback_flags@entry=v8::kNoGCCallbackFlags)
    at ../deps/v8/src/heap/heap.cc:1835
#16 0x0000556ea2b6412f in v8::internal::MinorGCJob::Task::RunInternal (this=0x556eaa7c9f80) at ../deps/v8/src/heap/minor-gc-job.cc:102
#17 0x0000556ea22bc5a8 in node::PerIsolatePlatformData::RunForegroundTask (this=0x556eaa5f9980, task=std::unique_ptr<v8::Task> = {...}) at ../src/node_platform.cc:425
#18 0x0000556ea22bcc13 in node::PerIsolatePlatformData::FlushForegroundTasksInternal (this=0x556eaa5f9980) at ../src/node_platform.cc:499
#19 0x0000556ea22bc881 in node::NodePlatform::DrainTasks (this=0x556eaa684600, isolate=0x556eaa684608) at ../src/node_platform.cc:462
Backtrace stopped: frame did not save the PC

santigimeno (Member) commented

That points to commit 51a22f6, which, by the way, is the last bisect step I'm currently running.

vtjnash (Member, Author) commented Feb 6, 2024

It looks like uv__queue_remove might get called multiple times (from repeated uv_timer_stop calls) after that commit, despite not being idempotent (it removes the node from the queue, but doesn't reinitialize the node's own links)?
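
A simplified model of the intrusive queue (patterned after src/queue.h, but not libuv's actual code) shows why a second remove without a re-init is dangerous:

struct queue_node {
  struct queue_node* next;
  struct queue_node* prev;
};

static void queue_init(struct queue_node* q) {
  q->next = q;  /* a detached/empty node links to itself */
  q->prev = q;
}

static void queue_remove(struct queue_node* q) {
  /* Unlink q from its neighbors, but leave q->next and q->prev alone:
   * after this call they are stale pointers into the old queue. */
  q->prev->next = q->next;
  q->next->prev = q->prev;
}

Calling queue_remove a second time on a node that was never reinitialized writes through those stale pointers into whatever memory the old neighbors now occupy, which matches the corruption rr caught inside uv__queue_remove. Following the remove with queue_init turns any later remove into a harmless self-relink.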

vtjnash (Member, Author) commented Feb 6, 2024

Looks like #4250 (review) might have had the correct code, but it didn't get copied into the PR correctly?

santigimeno (Member) commented

So, are we talking about something like this?

diff --git a/src/timer.c b/src/timer.c
index b57d64277..7847a15c2 100644
--- a/src/timer.c
+++ b/src/timer.c
@@ -104,10 +104,12 @@ static void timer_stop(uv_timer_t* handle) {
 
 
 int uv_timer_stop(uv_timer_t* handle) {
-  if (uv__is_active(handle))
+  if (uv__is_active(handle)) {
     timer_stop(handle);
-  else
+  } else {
     uv__queue_remove(&handle->node.queue);
+    uv__queue_init(&handle->node.queue);
+  }
   return 0;
 }

I've tested locally and I couldn't reproduce the crash.

vtjnash (Member, Author) commented Feb 7, 2024

Yes, that looks like what I was thinking, though it doesn't need to be conditional. I can also confirm that running Ben's while loop a few times no longer crashes with that change.
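
If I'm reading that right, the unconditional variant would be roughly the following, assuming handle->node.queue is always either linked into a queue or self-initialized (a sketch against the diff above, not necessarily the code that landed in #4304):

int uv_timer_stop(uv_timer_t* handle) {
  if (uv__is_active(handle))
    timer_stop(handle);

  /* Always detach and reinitialize the queue node so that any later
   * uv_timer_stop() call sees a self-linked node and is a no-op. */
  uv__queue_remove(&handle->node.queue);
  uv__queue_init(&handle->node.queue);

  return 0;
}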

santigimeno added a commit to santigimeno/libuv that referenced this issue Feb 7, 2024
As there were instances where this didn't happen and could cause memory
corruption issues.

Refs: libuv#4248
santigimeno (Member) commented

I opened #4304. PTAL

santigimeno added a commit that referenced this issue Feb 7, 2024
As there were instances where this didn't happen and could cause memory
corruption issues.

Refs: #4248
santigimeno (Member) commented

This is done. Sorry, I messed up the last commits, but the GH Security Advisory feature was broken and didn't allow me to merge the advisory with branch protections enabled (even though it said it would bypass them), so I merged it manually, and only after the fact did I figure out that I had to disable the protections manually in order to merge the advisory.

santigimeno added a commit that referenced this issue Feb 7, 2024
vtjnash (Member, Author) commented Feb 7, 2024

I think it might be because we don't allow admins to bypass branch protections? I usually briefly disable them around a release so that the release script can push the release and tag it correctly.

santigimeno (Member) commented

Yes, I understand, but the SA UI had a checkbox to bypass the branch protections. It didn't work.

jpalus commented Feb 7, 2024

Just a minor nit: the release name for 1.48.0 still mentions 1.47.0 (see the attached screenshot).

santigimeno (Member) commented

> Just a minor nit: the release name for 1.48.0 still mentions 1.47.0

Fixed. Thanks!

richardlau (Contributor) commented

> Just a minor nit: the release name for 1.48.0 still mentions 1.47.0

FYI I just edited the date in the release name to "2024.02.07".
