Endless "[Perf]: MatMul reference implementation being executed generated" messages and dramatic inference slowdown upon updating to docker 27.3.1 #1021

Open
lucajovine opened this issue Sep 23, 2024 · 4 comments

lucajovine commented Sep 23, 2024

Hello,

One of my systems, running Ubuntu 22.04, updated Docker to version 27.3.1 (build ce12230). Since then, running AlphaFold2 produces a never-ending series of messages like the following once it reaches the "Running predict with shape" stage:

I0923 14:01:56.716146 129352767932224 run_docker.py:258] 2024-09-23 12:01:56.715574: W external/xla/xla/service/cpu/onednn_matmul.cc:293] [Perf]: MatMul reference implementation being executed
I0923 14:01:56.741649 129352767932224 run_docker.py:258] 2024-09-23 12:01:56.741240: W external/xla/xla/service/cpu/onednn_matmul.cc:293] [Perf]: MatMul reference implementation being executed
(...)
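
To put a number on how often the warning fires, the log can be scanned for the message text. The following is a minimal sketch, not part of AlphaFold: the function name `count_matmul_warnings` is hypothetical, and the timestamp pattern is assumed from the two log lines quoted above.

```python
import re
from collections import Counter

# Warning text copied from the log lines quoted in this issue.
MATMUL_WARNING = "[Perf]: MatMul reference implementation being executed"

# Assumed format of the inner XLA timestamp, e.g. "2024-09-23 12:01:56.715574: W ".
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+: W ")


def count_matmul_warnings(lines):
    """Return (total, per_second) counts of the MatMul fallback warning."""
    per_second = Counter()
    total = 0
    for line in lines:
        if MATMUL_WARNING not in line:
            continue
        total += 1
        match = _TIMESTAMP.search(line)
        if match:
            # Bucket by whole second so bursts are easy to spot.
            per_second[match.group(1)] += 1
    return total, per_second


sample = [
    "I0923 14:01:56.716146 129352767932224 run_docker.py:258] "
    "2024-09-23 12:01:56.715574: W external/xla/xla/service/cpu/onednn_matmul.cc:293] "
    "[Perf]: MatMul reference implementation being executed",
    "I0923 14:01:56.741649 129352767932224 run_docker.py:258] "
    "2024-09-23 12:01:56.741240: W external/xla/xla/service/cpu/onednn_matmul.cc:293] "
    "[Perf]: MatMul reference implementation being executed",
    "some unrelated log line",
]
total, per_second = count_matmul_warnings(sample)
print(total, dict(per_second))
```

Comparing the count before and after a Docker downgrade would show whether the warning disappears entirely or only becomes less frequent.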

This issue dramatically increases runtime. I could fix it by reverting Docker to an earlier version and preventing it from updating again:

> sudo apt-get install docker-ce=5:26.1.4-1~ubuntu.20.04~focal docker-ce-cli=5:26.1.4-1~ubuntu.20.04~focal containerd.io
> sudo apt-mark hold docker-ce docker-ce-cli
> docker --version
	Docker version 26.1.4, build 5650f9b

However, I thought I should nonetheless report it as other users may have the same issue, and you can most likely fix it in a straightforward way.

Thanks,

Luca

lucajovine changed the title from 'Endless [Perf]: MatMul reference implementation being executed generated upon updating to docker 27.3.1' to 'Endless "[Perf]: MatMul reference implementation being executed generated" messages and dramatic inference slowdown upon updating to docker 27.3.1' on Sep 23, 2024

DrRadan commented Oct 12, 2024

Thanks @lucajovine for this solution. I suddenly started having the same issue, with sequences that previously ran without a problem, after I upgraded all outdated packages on my system at the beginning of this month. Downgrading Docker as you suggest fixed it. So, as a new server admin, I learned the importance of pinning (apt-mark hold)! Are there perhaps other AF dependencies that should be pinned?

BTW, a slight variation on your solution in case it is useful: I am also running Ubuntu 22.04, and I have had success running AF with this version of Docker:
VERSION_STRING=5:24.0.0-1~ubuntu.22.04~jammy
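
A small pre-flight check can flag the Docker version before a long run starts. The sketch below rests on the reports in this thread only (26.1.4 and 24.0.0 reported working, 27.3.1 reported slow); it is not an official compatibility list, and the helper names `parse_docker_version`, `classify` and `installed_docker_version` are hypothetical.

```python
import re
import subprocess

# Versions reported in this thread only; an assumption, not a compatibility matrix.
REPORTED_OK = {(26, 1, 4), (24, 0, 0)}
REPORTED_SLOW = {(27, 3, 1)}


def parse_docker_version(text):
    """Extract (major, minor, patch) from `docker --version` output."""
    match = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized docker --version output: {text!r}")
    return tuple(int(part) for part in match.groups())


def classify(version):
    """Label a version according to the reports collected in this thread."""
    if version in REPORTED_SLOW:
        return "reported slow"
    if version in REPORTED_OK:
        return "reported ok"
    return "unknown"


def installed_docker_version():
    """Query the local Docker CLI; requires docker on PATH."""
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return parse_docker_version(result.stdout)


# Example using the output string quoted earlier in this issue.
print(classify(parse_docker_version("Docker version 26.1.4, build 5650f9b")))
```

Versions not mentioned in the thread are labeled "unknown" rather than guessed at, since nobody here tested them.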

@lucajovine (Author)

Hi @DrRadan, glad this was useful! I essentially froze my conda environment for AF2, but since Docker is a system-level tool it got updated anyway, which caused the issue. If you are not already running AF2 in its own environment, that is what I would suggest to avoid similar issues.


hofmank0 commented Nov 3, 2024

Are you sure that the slowdown was caused by the repeated message? I had the same problem, but in our case the messages were caused by Docker not using the GPU, which obviously slows things down a lot. See my problem report in #1035.
However, the workaround with the older docker version fixed things for me, too.
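
One way to tell the two explanations apart is to ask JAX, inside the container, which backend it selected. This is a minimal sketch assuming JAX is importable in that environment; `backend_is_gpu` and `current_jax_backend` are hypothetical helper names, while `jax.default_backend()` is the JAX call that reports the active platform.

```python
def backend_is_gpu(platform):
    """Return True if a JAX platform name denotes a GPU backend."""
    # "gpu" is the usual name; "cuda" and "rocm" are accepted as an assumption
    # to cover platform-specific naming.
    return platform.lower() in {"gpu", "cuda", "rocm"}


def current_jax_backend():
    """Return the platform JAX picked, e.g. "cpu" or "gpu"."""
    import jax  # imported lazily so the helper above works without JAX

    return jax.default_backend()


if __name__ == "__main__":
    platform = current_jax_backend()
    if backend_is_gpu(platform):
        print(f"JAX backend: {platform} (GPU in use)")
    else:
        print(f"JAX backend: {platform} (no GPU; CPU fallback would explain a slowdown)")
```

If this reports "cpu" under the newer Docker version but "gpu" under the older one, the slowdown is more likely a GPU passthrough problem than a cost of the warning itself.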

@lucajovine (Author)

Sorry for the late reply @hofmank0: as far as I could tell, docker was using the GPU.
