Why does the z/sec fall drastically with higher core/thread counts? #18

Open
polarathene opened this issue Nov 14, 2020 · 1 comment

Comments

@polarathene

I'm not familiar with LULESH, I've just seen it used as one of many recent benchmarks for the new AMD 5000 series processors.

A 6-core / 12-thread AMD 5600X scores 993 z/sec, while the larger models (the next one up being 8-core / 16-thread) drop to 11 z/sec.

This appears to be the command the benchmarking software is using:

if [ -z \${NUM_CPU_PHYSICAL_CORES_CUBE+x} ]; then NUM_CPU_PHYSICAL_CORES_CUBE=\$NUM_CPU_PHYSICAL_CORES; fi
mpirun --allow-run-as-root -np \$NUM_CPU_PHYSICAL_CORES_CUBE ./lulesh2.0 -s 36 -i 1 > \$LOG_FILE 2>&1
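For readers unfamiliar with the `${VAR+x}` idiom in the first line, here is a minimal sketch of what the fallback does once the `\$` escapes are resolved. It assumes the backslashes are only there because the script is generated by another script; `resolve_ranks` is a hypothetical helper, not part of the benchmark.

```shell
# Hypothetical helper mirroring the fallback in the quoted command.
resolve_ranks() {
    # ${VAR+x} expands to "x" when VAR is set (even if empty) and to nothing when unset,
    # so this test is true only when NUM_CPU_PHYSICAL_CORES_CUBE is unset.
    if [ -z "${NUM_CPU_PHYSICAL_CORES_CUBE+x}" ]; then
        NUM_CPU_PHYSICAL_CORES_CUBE=$NUM_CPU_PHYSICAL_CORES
    fi
    echo "$NUM_CPU_PHYSICAL_CORES_CUBE"
}

# Example values, chosen for illustration only.
unset NUM_CPU_PHYSICAL_CORES_CUBE
NUM_CPU_PHYSICAL_CORES=12
echo "ranks passed to -np: $(resolve_ranks)"
```

If the cube variable is never set, the value passed to `-np` would simply be the physical core count.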

The linked page shows that many CPUs with higher core counts have the same dramatic drop in performance. Is it caused by hitting a bottleneck when scaling across cores? CPU cache or system memory, perhaps?

@ikarlin
Collaborator

ikarlin commented Nov 14, 2020

Looking at the results, there is probably a bug in their code for testing processors with core counts of 8 or higher. I'm not sure how NUM_CPU_PHYSICAL_CORES_CUBE and the other variables are being defined. This is likely the source of the bug.

Personally, I think they are trying to run multiple MPI ranks and getting something wrong, but this is a hunch.

For single-socket tests they should probably stick to the pure OpenMP code. Otherwise the tests will leave resources stranded for most processor counts above 8. However, that does not address the bug.
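To illustrate the stranded-resources point: as I understand it, the MPI build of LULESH expects the number of ranks to be a perfect cube (1, 8, 27, 64, ...). A rough sketch, assuming the benchmark rounds the core count down to the nearest cube (the helper name `largest_cube` is hypothetical, and the core counts are just examples):

```shell
# Hypothetical helper: largest perfect cube that does not exceed the given core count.
largest_cube() {
    limit=$1
    root=1
    # Grow root while the next cube still fits within the limit.
    while [ $(( (root + 1) * (root + 1) * (root + 1) )) -le "$limit" ]; do
        root=$(( root + 1 ))
    done
    echo $(( root * root * root ))
}

# Show how many cores would sit idle for a few example core counts.
for cores in 6 8 12 16 32 64; do
    ranks=$(largest_cube "$cores")
    echo "$cores cores -> $ranks ranks, $(( cores - ranks )) cores idle"
done
```

Under that assumption, only core counts that are themselves cubes (8, 64) use every core; everything in between leaves some idle.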

In addition, I would take all the tests with a grain of salt, since running only 1 iteration does not produce reliable results. At least 10 iterations should be run, and preferably 100, to avoid jitter and smooth out cache warmup effects.
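A sketch of what a longer, OpenMP-only run could look like. It assumes `./lulesh2.0` was built without MPI; the binary name and the `-s` / `-i` flags are taken from the command quoted above, `OMP_NUM_THREADS` is the standard OpenMP thread-count variable, and `lulesh_omp_cmd` is a hypothetical wrapper that only prints the command instead of running it.

```shell
# Hypothetical wrapper: builds an OpenMP-only LULESH command line.
# Arguments: thread count, iteration count (defaults to 100).
lulesh_omp_cmd() {
    threads=$1
    iterations=${2:-100}
    echo "OMP_NUM_THREADS=$threads ./lulesh2.0 -s 36 -i $iterations"
}

# Example: a 12-core part, 100 iterations instead of 1.
lulesh_omp_cmd 12 100
```

Printing the command first makes it easy to check the thread and iteration counts before committing to a long run.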


2 participants