ocrd_tesserocr processors waste CPU performance because of numpy blas threads #157
Comments
I can only see
Are you saying a function that does not even get called most of the time is consuming CPU time because of some multi-threaded library? How is that? Did you measure or bisect that?
That's what
@bertsky, it's not the function. It's the import statement that starts the threads, and those threads burn the CPU time.
Did you cross-check that (deactivating the import statement and measuring again)? (I have a hard time believing an unused module/function can burn CPU time.)
You are right. The function is used for some pages, but even after removing the import statement and the function call there remain 3 threads which use CPU time in my test. One is producing OCR. In GDB I see 6 threads (my CPU supports 6 threads), 5 of them looking like this:
So the problem remains, but my assumption about the cause was wrong.
I now checked thread creation in gdb. Even after removing the numpy code from segment_region.py there still remains a numpy which starts 5 threads. During execution I see 3 threads (always the same PIDs) using the CPU. By attaching gdb to one of them I could confirm that it is a
@stweil this OpenBLAS issue looks related to what you describe. But it was fixed five years ago, so I guess the fix is already deployed on most systems we use today. (I just learned you need to install
The current code imports numpy although it only uses a single function from that library. Importing numpy creates a number of threads for the BLAS algorithms by default. Those threads use a lot of CPU time without doing anything useful.
Setting the environment variable
OMP_THREAD_LIMIT=1
avoids those additional threads. Maybe there is a better solution that does not require an environment variable, for example removing the numpy requirement.
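For illustration, the same limit could also be applied from inside Python instead of the shell. This is only a minimal sketch: the helper name `limit_omp_threads` is hypothetical, and it assumes the variable is read when the threaded library is first loaded, so it has to be set before the first `import numpy`.

```python
import os


def limit_omp_threads(limit=1, env=None):
    """Set OMP_THREAD_LIMIT in `env` (default: os.environ) unless it is already set.

    Hypothetical helper, not part of ocrd_tesserocr. Returns the effective value.
    """
    env = os.environ if env is None else env
    # setdefault keeps a value the user exported explicitly in the shell.
    env.setdefault("OMP_THREAD_LIMIT", str(limit))
    return env["OMP_THREAD_LIMIT"]


# Assumption: this must run before numpy (or any other threaded library)
# is imported for the first time, otherwise the threads may already exist.
limit_omp_threads()
```

Using `setdefault` rather than a plain assignment is a deliberate choice here: anyone who does want more threads can still override the limit from the environment.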