Coroutines are leaking #5618

Open
mohsin-devdksa opened this issue Dec 13, 2024 · 5 comments

Comments

@mohsin-devdksa

Issue:

On our live production server, the max coroutine limit of 6,000 was reached within a single day. We then raised the limit to 60,000, and it was reached again after two days, even though we are in a pilot phase where only one tester is using the Swoole-based WebSocket (broadcasting) server.

It looks like coroutines are leaking.

1. What did you do? If possible, provide a simple script for reproducing the error.

  • We have a custom process (call it the Main Custom Process, MCP) attached to the WebSocket server.
  • From inside the MCP, we create additional custom processes (to fetch third-party data) with the enable_coroutine constructor parameter set to true.
  • To fetch third-party data continuously at a fixed interval, we use a Swoole Timer.
  • Inside the timer callback, we use go() to communicate asynchronously with external sources such as the database and third-party APIs.
  • In one custom process, we also use Swoole\Coroutine\Http\Client.
  • For code reloads, we kill the MCP's child processes and then the MCP itself. The Swoole Manager process then re-creates the MCP, which in turn re-creates its child processes (this is how we reload the custom processes).
  • We also handle SIGCHLD with Process::wait(), as shown below, assuming this also clears the timers, event loop, and coroutines created inside the child processes.
use Swoole\Process;
use Swoole\Timer;

Process::signal(SIGCHLD, static function ($sig) {
    while ($ret = Process::wait(true)) {
        /* clean up then event loop will exit */
        Timer::clearAll();
    }
});
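For comparison, the Swoole documentation's example of this pattern calls Process::wait(false) (non-blocking) inside the SIGCHLD handler; with wait(true), once the exited children have been reaped, the next call can block the calling process until another child exits. Note also that Timer::clearAll() only clears timers in the process that calls it: timers and coroutines created inside a child process live in that child's memory and go away when the child exits. The sketch below shows a non-blocking reaper; the function names reapExitedChildren and installChildReaper, and the injectable $wait parameter (there only so the loop can be tested without forking), are illustrative assumptions.

```php
<?php
use Swoole\Process;

/**
 * Reap every child process that has already exited, without blocking.
 * $wait is injectable only for testing; by default it is Process::wait(false),
 * which returns ['pid' => ..., 'code' => ..., 'signal' => ...] or false
 * when there is nothing left to reap.
 *
 * @return int[] PIDs reaped in this call
 */
function reapExitedChildren(?callable $wait = null): array
{
    $wait ??= static fn () => Process::wait(false);

    $reaped = [];
    while (($ret = $wait()) !== false) {
        $reaped[] = $ret['pid'];
    }

    return $reaped;
}

/** Hypothetical helper: install the reaper in the process that forked the children (e.g. the MCP). */
function installChildReaper(): void
{
    Process::signal(SIGCHLD, static function (int $signo): void {
        foreach (reapExitedChildren() as $pid) {
            echo "reaped child {$pid}\n";
        }
    });
}
```

Reaping in a non-blocking loop matters because several children can exit before a single SIGCHLD is handled, while a blocking wait would stall the parent's event loop (and its timers) whenever other children are still alive.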

In the onBeforeReload() event, we send SIGTERM to the MCP's child processes and then to the MCP itself, as below:

use Swoole\Process;

$pidFiles = glob(__DIR__ . '/process_pids/*.pid') ?: [];

$mainProcessData = null;

foreach ($pidFiles as $processPidFile) {
    // Read the PID directly instead of shelling out to `cat`; an unreadable file yields 0
    $pid = (int) trim((string) file_get_contents($processPidFile));

    // We kill the Main Process manually at the end
    if (strpos($processPidFile, 'MainProcess') !== false) {
        $mainProcessData = [
            'pidFile' => $processPidFile,
            'pid' => $pid,
        ];

        continue;
    }

    // Processes that do not have a timer or loop exit automatically after completing their tasks,
    // so some processes may already have terminated before reaching this point.
    // Check first whether the process is running by passing signal_no = 0, as per the documentation.
    // Doc: https://wiki.swoole.com/en/#/process/process?id=kill
    // Skip PID 0: per kill(2) semantics it would signal the whole process group.
    if ($pid > 0 && Process::kill($pid, 0)) {
        output('-- Killing Process -----> ' . $processPidFile);
        Process::kill($pid, SIGTERM);
    }

    // Delete the PID file
    unlink($processPidFile);
}

// Kill the (custom) MainProcess, if its PID file was found
if ($mainProcessData !== null) {
    if ($mainProcessData['pid'] > 0 && Process::kill($mainProcessData['pid'], 0)) {
        output('Killing Main Process');
        Process::kill($mainProcessData['pid'], SIGTERM);
    }

    unlink($mainProcessData['pidFile']);
}

Here is our Repo

2. What did you expect to see?

No server crash from exceeding the max coroutine limit, given that there is almost no traffic in production.

3. What did you see instead?

PHP Warning: Swoole\Process::start(): exceed max number of coroutine 60000 in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104
PHP Warning: Swoole\Process::start(): Swoole\Timer->onTimeout handler error in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104
MainProcess.php contains the code that creates the MCP's child processes.

4. What version of Swoole are you using (show your php --ri swoole)?

swoole

Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.1.5
Built => Nov 14 2024 13:43:57
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
sockets => enabled
openssl => OpenSSL 3.0.13 30 Jan 2024
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
pcre => enabled
c-ares => 1.27.0
zlib => 1.3
brotli => E16781312/D16781312
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
mysqlnd => enabled
async_redis => enabled
coroutine_pgsql => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => On => On
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

5. What is your machine environment used (show your uname -a & php -v & gcc -v) ?

uname -a

Linux 6.8.0-1016-oracle #17-Ubuntu SMP Wed Nov  6 23:01:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

php -v

PHP 8.3.13 (cli) (built: Oct 30 2024 11:28:41) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.13, Copyright (c), by Zend Technologies

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.3.0-6ubuntu2~24.04' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)

@NathanFreeman
Member

I will take a look later.

@mohsin-devdksa
Author

@matyhtf @NathanFreeman

Any update?

We would also like to know whether the coroutine container that Timer::tick() creates for each execution of its callback is cleared (removed from memory) automatically once that execution completes, without our having to clear the timer itself.

We expect that repeated executions of the Timer::tick() callback do not accumulate coroutine containers.

We look forward to your prompt response.
Thanks in advance.

@NathanFreeman
Member

Please show me your timer code. It looks like the coroutines created each time your timer fires are not exiting after communicating with the third-party API, so the number of coroutines keeps growing until it exceeds the limit.
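Following that reasoning, a timer that starts a new coroutine on every tick only stays bounded if each coroutine finishes before too many more are started. A minimal sketch, assuming the third-party call is an HTTP GET made with Swoole\Coroutine\Http\Client: give the request an explicit timeout so a hung API cannot keep a coroutine suspended indefinitely, and skip a tick while the previous fetch is still in flight. The helper names fetchWithTimeout, nonOverlapping and startPolling, their parameters, and the HTTP 200 check are hypothetical, not the reporter's code.

```php
<?php
use Swoole\Coroutine;
use Swoole\Coroutine\Http\Client;
use Swoole\Timer;

/** Hypothetical helper: GET $path with a request timeout in seconds; body on HTTP 200, else null. */
function fetchWithTimeout(string $host, int $port, string $path, float $timeout): ?string
{
    $client = new Client($host, $port, $port === 443);
    $client->set(['timeout' => $timeout]);
    try {
        // get() returns false on connect/timeout errors, and statusCode is then negative
        $ok = $client->get($path);
        return ($ok && $client->statusCode === 200) ? $client->body : null;
    } finally {
        $client->close();
    }
}

/**
 * Wrap $task so that at most one invocation runs at a time.
 * A call made while a previous one is still running (e.g. suspended on I/O)
 * returns immediately, so its coroutine exits instead of piling up.
 */
function nonOverlapping(callable $task): Closure
{
    $running = false; // shared by every call of the returned closure
    return static function () use (&$running, $task): void {
        if ($running) {
            return;
        }
        $running = true;
        try {
            $task();
        } finally {
            $running = false; // reset even if $task throws
        }
    };
}

/** Hypothetical helper: run $task every $intervalMs in its own short-lived coroutine, never overlapping. */
function startPolling(int $intervalMs, callable $task): int
{
    $guarded = nonOverlapping($task);
    return Timer::tick($intervalMs, static function () use ($guarded): void {
        Coroutine::create($guarded); // equivalent to go($guarded) with short names enabled
    });
}
```

For example, inside a coroutine-enabled child process: startPolling(5000, static fn () => fetchWithTimeout('api.example.com', 443, '/v1/data', 3.0)), where the host and path are placeholders. Skipping overlapping ticks trades some freshness for a coroutine count that cannot grow with API latency.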

@NathanFreeman
Member

Since only one tester is currently running tests, the growth is unlikely to be caused by user traffic, so the timer code needs to be checked.
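One way to check the timer code against the running server is to group the live coroutines in the affected process (here, the MCP) by the location where each one is suspended; a leak usually shows up as a single file:line whose count keeps growing. The sketch below relies on Swoole's Coroutine::listCoroutines() and Coroutine::getBackTrace(); the function name coroutineHotspots and its injectable parameters (there only so the grouping can be tested without Swoole) are hypothetical.

```php
<?php
use Swoole\Coroutine;

/**
 * Count live coroutines by the innermost frame of their backtrace.
 * By default reads the running process; both arguments are injectable for testing.
 *
 * @return array<string,int> "file:line function" => number of coroutines, largest first
 */
function coroutineHotspots(?iterable $cids = null, ?callable $backtrace = null): array
{
    $cids ??= Coroutine::listCoroutines();
    $backtrace ??= static fn (int $cid) =>
        Coroutine::getBackTrace($cid, DEBUG_BACKTRACE_IGNORE_ARGS, 1);

    $counts = [];
    foreach ($cids as $cid) {
        $trace = $backtrace($cid) ?: []; // false if the coroutine has already finished
        $frame = $trace[0] ?? [];
        $key = sprintf('%s:%s %s', $frame['file'] ?? '?', $frame['line'] ?? '?', $frame['function'] ?? '?');
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts);

    return $counts;
}
```

Logging the result periodically (for example from a Swoole\Timer::tick() callback) next to Coroutine::stats()['coroutine_num'] should point at the call, such as an HTTP request or database query, that never returns.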

@NathanFreeman
Member

As long as the coroutine created by the Timer::tick() callback function exits in a timely manner after completing its task, it will not accumulate. Additionally, there is only one coroutine container in the entire program.
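One way to confirm this in production is to sample Coroutine::stats()['coroutine_num'] periodically. If coroutines exit promptly, the count keeps falling back to a roughly constant baseline between ticks; if they leak, that baseline (the minimum over each sampling window) creeps upward. The sketch below implements that heuristic; the function names risingFloor and watchCoroutineCount, the default of three windows, and logging via error_log() are assumptions to adapt.

```php
<?php
use Swoole\Coroutine;
use Swoole\Timer;

/**
 * Heuristic leak check: split the most recent $windows * $windowSize samples into
 * windows and return true only if each window's minimum is strictly higher than
 * the previous window's minimum (a "rising floor").
 *
 * @param int[] $samples coroutine counts, oldest first
 */
function risingFloor(array $samples, int $windowSize, int $windows = 3): bool
{
    if ($windowSize < 1 || $windows < 2 || count($samples) < $windowSize * $windows) {
        return false;
    }
    $recent = array_slice($samples, -$windowSize * $windows);
    $floors = array_map('min', array_chunk($recent, $windowSize));
    for ($i = 1, $n = count($floors); $i < $n; $i++) {
        if ($floors[$i] <= $floors[$i - 1]) {
            return false;
        }
    }

    return true;
}

/** Hypothetical helper: sample this process's coroutine count every $intervalMs and log a rising floor. */
function watchCoroutineCount(int $intervalMs, int $windowSize, int $windows = 3): int
{
    $samples = [];
    return Timer::tick($intervalMs, static function () use (&$samples, $windowSize, $windows): void {
        $samples[] = Coroutine::stats()['coroutine_num'];
        $samples = array_slice($samples, -$windowSize * $windows); // keep memory bounded
        if (risingFloor($samples, $windowSize, $windows)) {
            error_log('coroutine_num floor is rising: ' . implode(',', $samples));
        }
    });
}
```

Comparing window minimums rather than raw samples keeps short bursts (for example, several fetches in flight at once) from being reported as a leak.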
