Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread Pool Deadlock While Task Wait #1040

Open
COM8 opened this issue Apr 7, 2024 · 1 comment
Open

Thread Pool Deadlock While Task Wait #1040

COM8 opened this issue Apr 7, 2024 · 1 comment

Comments

@COM8
Copy link
Member

COM8 commented Apr 7, 2024

Description

There are potential cases where a cpr::ThreadPool can get stuck in a potential deadlock. They are both caused by a race condition involving task_cond.

It can happen that a thread gets stuck waiting for task_cond to get notified but during this there is no one who can notify him like if the caller invokes cpr::ThreadPool::Wait().

Example/How to Reproduce

Run the currently disabled ThreadPoolTests from within #1035 over and over again. From time to time they will get stuck.

Possible Fix

No response

Where did you get it from?

GitHub (branch e.g. master)

Additional Context/Your Environment

Fedora 38, #1035

@baderouaich
Copy link

The ThreadPool being stuck I encountered, It was never stopping, it seems like the issue has to do with the scope of the status_locks, if you notice the Stop() method:

int ThreadPool::Stop() {
    std::unique_lock status_lock(status_wait_mutex); // <- this will lock the entire scope of Stop(), after notifying, 
// the mutex is still held during the notification which will not give the threads a chance to acquire it and get the notification (in the thread loop)
    
    if (STOP == status) {
      return -1;
    }
    status = STOP;
    status_wait_cond.notify_all();
    task_cond.notify_all();
    
    for (auto& i : threads) {
        if (i.thread->joinable()) {
            i.thread->join();
        }
    }

    threads.clear();
    cur_thread_num = 0;
    idle_thread_num = 0;
    
    return 0;
}

I think it should be:

int ThreadPool::Stop() {
     { 
         // lock only this scope to ensure that the mutex is not held during the notification process, allowing the waiting threads to acquire it immediately after being awakened
          std::unique_lock status_lock(status_wait_mutex);
          if (STOP == status) {
            return -1;
          }
          status = STOP;
     }
    status_wait_cond.notify_all();
    task_cond.notify_all();

 ...
}

I tried that locally and it seems the thread pool is waiting and stopping normally so far, I will test with more use cases and get back to you if there is another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants