AndreFCruz opened this issue on Jul 20, 2022 · 0 comments
Labels: discovery (exploratory task, high uncertainty); enhancement (new feature or request); low priority (nice to have but not crucial); M effort (T-shirt effort weighing: M)
According to our `perf` and `valgrind` benchmarks, a large percentage of CPU time during training is spent synchronizing threads.
The net effect of multi-threading is still positive. However, with `OMP_NUM_THREADS=4` our code consistently uses only 2 threads and seems unable to fully parallelize.
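One way to separate an OpenMP runtime limit from a limit in our own code would be a standalone check like the sketch below. It is not taken from the training code: `requested_threads` and `count_active_threads` are hypothetical helper names introduced only for illustration. The sketch compares the thread count the runtime is willing to provide (which reflects `OMP_NUM_THREADS`) with the number of distinct threads that actually execute iterations of a statically scheduled loop. It assumes a compiler with OpenMP enabled (for example `-fopenmp` on GCC or Clang); without OpenMP it falls back to a single thread.

```cpp
#include <cassert>
#include <set>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the team size the OpenMP runtime
// would use for a parallel region (reflects OMP_NUM_THREADS).
int requested_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;  // built without OpenMP: everything runs on one thread
#endif
}

// Hypothetical helper: number of distinct threads that executed at least
// one iteration of a statically scheduled parallel loop of n_iters steps.
int count_active_threads(int n_iters) {
    assert(n_iters >= 0);
    std::set<int> seen;  // thread ids that did some work

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        bool did_work = false;

        // Static scheduling splits the iteration range across the team.
        #pragma omp for schedule(static)
        for (int i = 0; i < n_iters; ++i) {
            did_work = true;
        }

        if (did_work) {
            // std::set is not thread-safe, so serialize the insert.
            #pragma omp critical
            seen.insert(tid);
        }
    }
    return static_cast<int>(seen.size());
}
```

If `count_active_threads(1000)` matches `requested_threads()` when run with `OMP_NUM_THREADS=4`, the runtime itself is providing the full team, and the 2-thread ceiling would more likely come from how the training loops are structured (for example serial sections or synchronization) than from the OpenMP configuration.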