This is part of the Performance study and the single-node-benchmark analysis. The analyses generated a lot of intermediate data and a web interface, so they were moved here for better organization.
Have you ever been to the supermarket and ordered white fish? You may be getting tilapia, flounder, branzino, catfish, cod, haddock, hake, halibut, pollock, sea bass, sole, or whiting. The same is true for cloud CPU architectures. You may know that you are getting some flavor of Intel, but it's unclear if it's Skylake, Ice Lake, Sandy Bridge, or something else. We did a large performance study in August 2024 that looked across many different environments, clouds, and instance types, and can now reflect on what we found. When a single environment turns out to contain a potpourri of architectures, we call this the supermarket fish problem.
Under development: data processing is underway, and a table will be added to each view!
- summary data file for each output file
- table that summarizes each environment (with counts)
- flags and bugs should have some kind of venn diagram that crosses spaces
- sysbench metrics should be plots (not tables)
- cpuinfo -> cpu MHz and bogomips also needs plots (values are all over the place)
- Not sure if this is interesting, but data/azure/cyclecloud/cpu/256/node-0/raw/dmidecode has Core Enabled for each of 32 and 64.
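For the cpuinfo item in the list above, a minimal sketch of how the `cpu MHz` and `bogomips` fields could be collected before plotting. It assumes the raw files follow the usual `/proc/cpuinfo` layout of `key : value` lines. The function names and the sample text are hypothetical and are not taken from the study's scripts.

```python
from statistics import mean


def parse_cpuinfo(text):
    """Collect 'cpu MHz' and 'bogomips' values from /proc/cpuinfo-style text."""
    values = {"cpu MHz": [], "bogomips": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon and fields we are not tracking.
        if sep and key in values:
            values[key].append(float(value.strip()))
    return values


def summarize(values):
    """Reduce each non-empty field to count, min, max, and mean."""
    return {
        key: {"count": len(v), "min": min(v), "max": max(v), "mean": mean(v)}
        for key, v in values.items()
        if v
    }


# Hypothetical two-processor sample (illustrative numbers, not study data).
sample = """processor : 0
cpu MHz : 2649.998
bogomips : 5299.99
processor : 1
cpu MHz : 3502.114
bogomips : 5299.99
"""

if __name__ == "__main__":
    for field, stats in summarize(parse_cpuinfo(sample)).items():
        print(field, stats)
```

A wide gap between min and max for one environment is the kind of spread that a plot would show more clearly than a table.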
Make some PNGs (they render better in React):
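# Convert each machine.svg to a machine.png in the same directory.
# Note: the $(find ...) form assumes paths without spaces.
for filename in $(find . -name machine.svg)
do
    echo "$filename"
    directory=$(dirname "$filename")
    outpng="$directory/machine.png"
    echo inkscape "$filename" -o "$outpng"
    inkscape "$filename" -o "$outpng"
done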
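The same conversion can be sketched in Python, which sidesteps shell word splitting when a path contains spaces. This is only an illustration: `convert_all` and its `dry_run` parameter are hypothetical names, and the sketch assumes an `inkscape` binary on the PATH that accepts `-o`, as in the loop above.

```python
import subprocess
from pathlib import Path


def convert_all(root=".", dry_run=False):
    """Convert every machine.svg under root to a machine.png beside it.

    Returns the list of commands, so a dry run can be inspected first.
    """
    commands = []
    for svg in sorted(Path(root).rglob("machine.svg")):
        png = svg.with_suffix(".png")
        command = ["inkscape", str(svg), "-o", str(png)]
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError if inkscape fails.
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_all(dry_run=True)` lists what would be run without invoking inkscape.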
To generate data for the gallery:
python 1-generate-gallery.py
Note that I manually added the index.html and script.js to each directory, and tweaked the titles and dimensions for each.
This generates the table (requires pip install pandas):
python 2-generate-table.py
Again, I copied and pasted the same table snippet into each UI so that it reads the data generated by the script.
Here are some one-off result images:
We can see that there is a hidden supermarket problem for AWS and clock speed. When a group doesn't show up (e.g., Google and Azure for many) it's because the values are all the same. I think these are the lines we see in the graph without color - they are histograms for one value.
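The difference between a group with one value and a group with several can be checked directly before plotting. Below is a minimal sketch, assuming one clock speed reading per node. The function names and the readings are hypothetical and only illustrate the idea; they are not data from the study.

```python
from collections import Counter, defaultdict


def speed_distribution(readings):
    """Map each environment to a Counter of its observed clock speeds."""
    by_env = defaultdict(Counter)
    for env, mhz in readings:
        by_env[env][mhz] += 1
    return dict(by_env)


def mixed_environments(readings):
    """Return environments that report more than one distinct speed."""
    distribution = speed_distribution(readings)
    return sorted(env for env, counts in distribution.items() if len(counts) > 1)


# Hypothetical (environment, MHz) readings, one per node.
readings = [
    ("google-gke-cpu", 2000.0),
    ("google-gke-cpu", 2000.0),
    ("aws-eks-cpu", 2650.0),
    ("aws-eks-cpu", 3500.0),
    ("azure-aks-cpu", 1850.0),
]

if __name__ == "__main__":
    print(mixed_environments(readings))
```

An environment with a single distinct value produces a one-bin histogram, which would be consistent with the uncolored lines described above.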
CPU Size: 32
Max speed: 2000.0 for google-gke-cpu
Max speed: 3725.0 for aws-eks-cpu
Max speed: 3725.0 for aws-parallel-cluster-cpu
Max speed: 3525.0 for azure-cyclecloud-cpu
Max speed: 3525.0 for azure-aks-cpu
CPU Size: 64
Max speed: 2000.0 for google-gke-cpu
Max speed: 3725.0 for aws-eks-cpu
Max speed: 3725.0 for aws-parallel-cluster-cpu
Max speed: 3525.0 for azure-cyclecloud-cpu
Max speed: 3525.0 for azure-aks-cpu
CPU Size: 128
Max speed: 2000.0 for google-gke-cpu
Max speed: 3725.0 for aws-eks-cpu
Max speed: 3525.0 for azure-cyclecloud-cpu
Max speed: 3525.0 for azure-aks-cpu
CPU Size: 256
Max speed: 2000.0 for google-gke-cpu
Max speed: 3725.0 for aws-eks-cpu
Max speed: 3525.0 for azure-cyclecloud-cpu
Max speed: 3525.0 for azure-aks-cpu
CPU Size: 32
Current speed: 2000.0 for google-gke-cpu
Current speed: 2650.0 for aws-eks-cpu
Current speed: 2650.0 for aws-parallel-cluster-cpu
Current speed: 1850.0 for azure-cyclecloud-cpu
Current speed: 1850.0 for azure-aks-cpu
CPU Size: 64
Current speed: 2000.0 for google-gke-cpu
Current speed: 2650.0 for aws-eks-cpu
Current speed: 2650.0 for aws-parallel-cluster-cpu
Current speed: 1850.0 for azure-cyclecloud-cpu
Current speed: 1850.0 for azure-aks-cpu
CPU Size: 128
Current speed: 2000.0 for google-gke-cpu
Current speed: 2650.0 for aws-eks-cpu
Current speed: 1850.0 for azure-cyclecloud-cpu
Current speed: 1850.0 for azure-aks-cpu
CPU Size: 256
Current speed: 2000.0 for google-gke-cpu
Current speed: 2650.0 for aws-eks-cpu
Current speed: 1850.0 for azure-cyclecloud-cpu
Current speed: 1850.0 for azure-aks-cpu
GPU Size: 4
Current speed: 2000.0 for google-gke-gpu
Current speed: 2000.0 for google-compute-engine-gpu
Current speed: 3700.0 for azure-cyclecloud-gpu
Current speed: 3700.0 for azure-aks-gpu
GPU Size: 8
Current speed: 2000.0 for google-gke-gpu
Current speed: 2000.0 for google-compute-engine-gpu
Current speed: 3500.0 for aws-eks-gpu
Current speed: 3700.0 for azure-cyclecloud-gpu
Current speed: 3700.0 for azure-aks-gpu
GPU Size: 16
Current speed: 2000.0 for google-gke-gpu
Current speed: 2000.0 for google-compute-engine-gpu
Current speed: 3500.0 for aws-eks-gpu
Current speed: 3700.0 for azure-cyclecloud-gpu
GPU Size: 32
Current speed: 2000.0 for google-gke-gpu
Current speed: 2000.0 for google-compute-engine-gpu
Current speed: 3700.0 for azure-cyclecloud-gpu
Current speed: 3700.0 2300.0 for azure-aks-gpu
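The listing above is regular enough to load back into records for further analysis. The sketch below is one possible parser, not the script that produced the output. The regular expressions and the record layout are assumptions, and reading a line with two numbers as two distinct speeds for one environment is also an assumption.

```python
import re

HEADER = re.compile(r"^(?P<device>CPU|GPU) Size: (?P<size>\d+)$")
SPEED = re.compile(r"^(?P<kind>Max|Current) speed: (?P<values>[0-9. ]+) for (?P<env>\S+)$")


def parse_summary(text):
    """Turn 'Size' headers and 'speed' lines into a list of records."""
    records = []
    device = size = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # Remember the current section for the speed lines that follow.
            device, size = header["device"], int(header["size"])
            continue
        speed = SPEED.match(line)
        if speed:
            records.append({
                "device": device,
                "size": size,
                "kind": speed["kind"],
                "env": speed["env"],
                "values": [float(v) for v in speed["values"].split()],
            })
    return records


# The last section of the listing above, copied verbatim.
sample = """GPU Size: 32
Current speed: 2000.0 for google-gke-gpu
Current speed: 2000.0 for google-compute-engine-gpu
Current speed: 3700.0 for azure-cyclecloud-gpu
Current speed: 3700.0 2300.0 for azure-aks-gpu
"""

if __name__ == "__main__":
    for record in parse_summary(sample):
        if len(record["values"]) > 1:
            print(record["env"], record["values"])
```

Filtering for records with more than one value gives a quick list of environments to inspect for the supermarket fish problem.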
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE- 842614