Matrix multiplication: Nested parallelism #9
Changing to nested parallelism. Unfortunately, parallelizing a single loop doesn't scale well unless we multiply bigger matrices. Nested parallelism in OpenMP looks quite tricky at first glance: there is a real risk of oversubscription, or of OpenMP not spawning new threads for the second loop if we use a dynamic schedule.
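For reference, one way to expose two loops' worth of parallelism without opening a second parallel region is the `collapse(2)` clause, which sidesteps the oversubscription question because only one thread team exists. The sketch below is a naive triple loop, not the tiled kernel discussed in this issue, and the function name `matmul_collapse2` is made up for illustration. It uses pragmas only, so it also compiles and runs serially when OpenMP is disabled.

```c
#include <stddef.h>

/* Hypothetical helper: C = A * B for row-major n x n matrices.
   collapse(2) merges the i and j loops into a single iteration space
   of n * n items, which one thread team then splits statically.
   No nested parallel region is created. */
void matmul_collapse2(size_t n, const double *a, const double *b, double *c) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;  /* private to each (i, j) iteration */
            for (size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Whether this is preferable to true nested regions depends on how the tiling loops are laid out; `collapse` requires the merged loops to be perfectly nested, which a packed, tiled implementation may not satisfy.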
cc @Laurae2
Benchmark on a dual Xeon Gold 6154 vs MKL:
According to the paper [2] Anatomy of High-Performance Many-Threaded Matrix Multiplication (Smith et al.), parallelism should be done around `jc` (dimension `nc`). Note that `nc` is often 4096, so we might need another distribution scheme.
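To make the concern concrete, here is a rough sketch of how many iterations the `jc` loop has when it walks the N dimension in blocks of `nc`. The helper name `jc_iterations` and the matrix sizes in the usage note are illustrative assumptions; only `nc = 4096` comes from the discussion above.

```c
#include <stddef.h>

/* Hypothetical helper: number of jc iterations when the N dimension
   of size n is partitioned into blocks of nc columns, i.e. ceil(n / nc).
   Assumes nc > 0. */
size_t jc_iterations(size_t n, size_t nc) {
    return (n + nc - 1) / nc;
}
```

With `nc = 4096`, any matrix with N up to 4096 gives a single `jc` iteration, so there is nothing to distribute across threads at that level; even N = 8192 only yields two work items. That is why parallelizing solely around `jc` would likely leave most cores idle for the sizes usually benchmarked.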