You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are potential cases where a cpr::ThreadPool can get stuck in a potential deadlock. They are both caused by a race condition involving task_cond.
It can happen that a thread gets stuck waiting for task_cond to get notified but during this there is no one who can notify him like if the caller invokes cpr::ThreadPool::Wait().
Example/How to Reproduce
Run the currently disabled ThreadPoolTests from within #1035 over and over again. From time to time they will get stuck.
The ThreadPool being stuck I encountered, It was never stopping, it seems like the issue has to do with the scope of the status_locks, if you notice the Stop() method:
intThreadPool::Stop() {
std::unique_lock status_lock(status_wait_mutex); // <- this will lock the entire scope of Stop(), after notifying, // the mutex is still held during the notification which will not give the threads a chance to acquire it and get the notification (in the thread loop)if (STOP == status) {
return -1;
}
status = STOP;
status_wait_cond.notify_all();
task_cond.notify_all();
for (auto& i : threads) {
if (i.thread->joinable()) {
i.thread->join();
}
}
threads.clear();
cur_thread_num = 0;
idle_thread_num = 0;
return0;
}
I think it should be:
intThreadPool::Stop() {
{
// lock only this scope to ensure that the mutex is not held during the notification process, allowing the waiting threads to acquire it immediately after being awakened
std::unique_lock status_lock(status_wait_mutex);
if (STOP == status) {
return -1;
}
status = STOP;
}
status_wait_cond.notify_all();
task_cond.notify_all();
...
}
I tried that locally and it seems the thread pool is waiting and stopping normally so far, I will test with more use cases and get back to you if there is another issue.
Description
There are potential cases where a
cpr::ThreadPool
can get stuck in a potential deadlock. They are both caused by a race condition involvingtask_cond
.It can happen that a thread gets stuck waiting for
task_cond
to get notified but during this there is no one who can notify him like if the caller invokescpr::ThreadPool::Wait()
.Example/How to Reproduce
Run the currently disabled
ThreadPoolTests
from within #1035 over and over again. From time to time they will get stuck.Possible Fix
No response
Where did you get it from?
GitHub (branch e.g. master)
Additional Context/Your Environment
Fedora 38, #1035
The text was updated successfully, but these errors were encountered: