forked from RRZE-HPC/likwid
* Add rocm GPU topology
* First rocmon implementation
  Basic implementation for monitoring AMD GPUs with rocprofiler
* Move rocm call from addEventSet to setupCounters
* Include short_name in topology
* Implement more rocmon functions
  This version does still not produce consistent results?
* Implement functions for rocmon marker api
* Add macros for rocmon marker api
* Add test for rocmon marker api
* Add ROCm SMI Backend to Rocmon
  ROCm SMI provides additional counters and information to rocprofiler.
* Fix cut off event names
* Add temporary build instructions
* Fix markerfile format documentation
* Fix device id device index mixup
* Fix same variable name for rocmon and nvmon topology
  Change rocmon topology variable name to avoid conflicts with nvmon and allow builds with both nvmon and rocmon.
* Integrate nvml library into nvmon
  The NVIDIA Management Library (NVML) allows measuring of more statistics like power usage. In addition to the existing events, events from the NVML library can now be measured with LIKWID.
* Fix return types
  nvml_getResult and nvml_getLastResult incorrectly returned int instead of double.
* Fix gpu markers for nvml
  Markerfile now contains average value for nvml events.
* Refactor result update
  Put updating of result struct after measurement in dedicated function.
* Fix wrong function call
  Called nvml_getResult instead of nvml_getLastResult in nvmon_getLastResult.
* Simplify SMI event wrappers
* Fix filter for rocmon in Makefile
* Add timeline mode for GPUs using AppDaemon
* Fix appDaemon linker errors
* Add last value to output file
* Fix marker API for SMI events
  Return accumulated values for ROCm SMI events, not accumulated difference.
* Fix disparities between rocmon marker
  Let user calculate average
* Adjust tests for benchmarking
* Fix dllink issues in rocmon_init
* Add macros for ROCM Debugging
* Add function to resolve GPUstr for ROCM
* Update ROCMon and ROCMon marker
* Changes to the likwid header
* Add ROCmon to likwid-perfctr
* Add example groups
* Rename symbol HSA_VEN_AMD_AQLPROFILE_LEGACY_PM4_PACKET_SIZE to avoid collision.
* adjusted for ROCm 5.4
* fixed AMD multi gpu issues
* Change rocm metrics.xml path to new directory spec
* Include likwid libs in LD_LIBRARY_PATH at runtime
* Add more groups for AMD GPUs
* Fix AMD rocm performance group metrics
* Adjust ROCm library path to new path structure
* Enable appDaemon to print timeline measurements to stderr
* Add GPU timeline support to likwid-perfctr
* Leave event_string_list empty if cpu perf group is not defined
* Fix typo in appDaemon environment variable
* Handle permission error for Rocm Marker API file
* Fix library environments for Rocm
* Modify conditions to allow for Rocm timeline support
* Fix make config to allow App Daemon Build for Rocm
* appDaemon: search for libraries in build directory
* Delete previous Smi events in rocmon_setupCounters
* Add power group to AMD GPU
* Set previous numSmiEvents to 0 in rocmon_setupCounters
* Fix amd_gpu POWER group
* Add likwid library to library path for nvidia GPUs
* Fix perfworks API for cuda versions >=11.2
* likwid-perfctr: Fix list events and counters for Nvidia GPUs
* Add backwards compatibility for ROCm metrics path
* access-daemon Makefile: Only include liblikwid in appDaemon target
* likwid-perfctr: remove version number from rocprofiler64 library
* rocmon: implement workaround for rocprofiler_iterate_info bug in ROCm 5.4.0
* likwid-perfctr: Use INSTALLED_LIBPREFIX for library path
* make more than 1 metric usable in timeline rocm
* fix wrong time readings in timeline mode
* update doxygen for AMD GPU support
* fix import order for PciDeviceId errors while compiling

---------
Co-authored-by: Marcel Marquardt <[email protected]>
Co-authored-by: Karlo Kraljic <[email protected]>
Co-authored-by: Thomas Roehl <[email protected]>
Co-authored-by: Sebastian Schnorbus <[email protected]>
Co-authored-by: Thomas Gruber <[email protected]>
1 parent 5945ca1, commit 696e1fa
Showing 44 changed files with 14,527 additions and 6,124 deletions.
@@ -0,0 +1,28 @@
## Build & Install

```bash
export ROCM_HOME=/opt/rocm
make
make install
```

## Test

Build

```bash
cd test
# make clean
make test-topology-gpu-rocm
make test-rocmon-triad
make test-rocmon-triad-marker
```

Run

```bash
export LD_LIBRARY_PATH=/home/users/kraljic/likwid-rocmon/install/lib:/opt/rocm/hip/lib:/opt/rocm/hsa/lib:/opt/rocm/rocprofiler/lib:$LD_LIBRARY_PATH
export ROCP_METRICS=/opt/rocm/rocprofiler/lib/metrics.xml # for rocmon test
export HSA_TOOLS_LIB=librocprofiler64.so.1 # allows rocmon to intercept hsa commands
./gpu-test-topology-gpu-rocm
```
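
The run environment above hard-codes one user's install directory. As a minimal sketch, assuming the same library layout under the LIKWID install prefix and the ROCm prefix, the three exports could be derived from two arguments. `setup_rocmon_env` and its parameters are hypothetical names used only for illustration; they are not part of LIKWID or ROCm.

```bash
#!/usr/bin/env bash
# Sketch only: setup_rocmon_env is a hypothetical helper, not a LIKWID tool.
# $1: LIKWID install prefix (assumed to contain lib/)
# $2: ROCm prefix (optional; falls back to $ROCM_HOME, then /opt/rocm)
setup_rocmon_env() {
    local likwid_prefix="${1:?usage: setup_rocmon_env LIKWID_PREFIX [ROCM_PREFIX]}"
    local rocm_prefix="${2:-${ROCM_HOME:-/opt/rocm}}"

    # Prepend the library directories; keep an existing LD_LIBRARY_PATH if set.
    export LD_LIBRARY_PATH="${likwid_prefix}/lib:${rocm_prefix}/hip/lib:${rocm_prefix}/hsa/lib:${rocm_prefix}/rocprofiler/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    # Metrics definition file read by rocprofiler (path as used in the README).
    export ROCP_METRICS="${rocm_prefix}/rocprofiler/lib/metrics.xml"
    # Lets rocmon intercept HSA commands through rocprofiler.
    export HSA_TOOLS_LIB="librocprofiler64.so.1"
}
```

After sourcing the function, calling it with the local install prefix and then running `./gpu-test-topology-gpu-rocm` would reproduce the README's steps without editing the path by hand.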
@@ -0,0 +1,13 @@
SHORT GDS Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_GDS
ROCM1 ROCP_SQ_WAVES

METRICS
GPU GDS rw insts per work-item ROCM0/ROCM1

LONG
--
The average number of GDS read or GDS write instructions executed
per work item (affected by flow control).
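
The metric in this group is a plain ratio of two counters, and the SALU and SFetch groups use the same `ROCM0/ROCM1` form. The following sketch evaluates that ratio for made-up counter values; `insts_per_wave` is a hypothetical helper, and the zero-divisor guard is an addition that is not part of the group file.

```bash
#!/usr/bin/env bash
# Sketch only: evaluates the group formula ROCM0/ROCM1 outside of LIKWID.
# $1: instruction counter value (ROCM0), $2: ROCP_SQ_WAVES value (ROCM1)
insts_per_wave() {
    awk -v insts="$1" -v waves="$2" 'BEGIN {
        if (waves == 0) { print "nan"; exit }   # guard added for this sketch
        printf "%.3f\n", insts / waves
    }'
}

# Illustrative values, not measured data.
insts_per_wave 1200 400
```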
@@ -0,0 +1,16 @@
SHORT Memory utilization

EVENTSET
ROCM0 ROCP_TA_TA_BUSY
ROCM1 ROCP_GRBM_GUI_ACTIVE
ROCM2 ROCP_SE_NUM

METRICS
GPU memory utilization 100*max(ROCM0,16)/ROCM1/ROCM2

LONG
--
The percentage of GPUTime the memory unit is active. The result includes
the stall time (MemUnitStalled). This is measured with all extra fetches
and writes and any cache or memory effects taken into account.
Value range: 0% to 100% (fetch-bound).
@@ -0,0 +1,18 @@
SHORT PCI Transfers

EVENTSET
ROCM0 RSMI_PCI_THROUGHPUT_SENT
ROCM1 RSMI_PCI_THROUGHPUT_RECEIVED


METRICS
Runtime time
PCI sent ROCM0
PCI received ROCM1
PCI send bandwidth 1E-6*ROCM0/time
PCI recv bandwidth 1E-6*ROCM1/time

LONG
--
Currently not usable since the RSMI_PCI_THROUGHPUT_* events require
one second per call, so 2 seconds for both of them.
@@ -0,0 +1,17 @@
SHORT Power, temperature and voltage

EVENTSET
ROCM0 RSMI_POWER_AVE[0]
ROCM1 RSMI_TEMP_EDGE
ROCM2 RSMI_VOLT_VDDGFX


METRICS
Power average 1E-6*ROCM0
Edge temperature 1E-3*ROCM1
Voltage 1E-3*ROCM2

LONG
--
Gets the current average power consumption in watts, the
temperature in degrees Celsius and the voltage in volts.
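
The three metrics only rescale the raw SMI readings. The sketch below applies the same factors to made-up values, assuming the raw units are microwatts, millidegrees Celsius and millivolts, which is what the `1E-6` and `1E-3` factors together with the description imply. `power_group_metrics` is a hypothetical helper, not a LIKWID command.

```bash
#!/usr/bin/env bash
# Sketch only: applies the scaling factors from the METRICS section.
# $1: RSMI_POWER_AVE[0] (assumed microwatts)
# $2: RSMI_TEMP_EDGE    (assumed millidegrees Celsius)
# $3: RSMI_VOLT_VDDGFX  (assumed millivolts)
power_group_metrics() {
    awk -v p="$1" -v t="$2" -v v="$3" 'BEGIN {
        printf "Power average [W]: %.1f\n", 1E-6 * p
        printf "Edge temperature [C]: %.1f\n", 1E-3 * t
        printf "Voltage [V]: %.3f\n", 1E-3 * v
    }'
}

# Illustrative values, not measured data.
power_group_metrics 150000000 45000 900
```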
@@ -0,0 +1,13 @@
SHORT SALU Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SALU
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SALU insts per work-item ROCM0/ROCM1

LONG
--
The average number of scalar ALU instructions executed per work-item
(affected by flow control).
@@ -0,0 +1,13 @@
SHORT SFetch Instructions

EVENTSET
ROCM0 ROCP_SQ_INSTS_SMEM
ROCM1 ROCP_SQ_WAVES

METRICS
GPU SFETCH insts per work-item ROCM0/ROCM1

LONG
--
The average number of scalar fetch instructions from the video memory
executed per work-item (affected by flow control).
@@ -0,0 +1,17 @@
SHORT ALU stalled by LDS

EVENTSET
ROCM0 ROCP_SQ_WAIT_INST_LDS
ROCM1 ROCP_SQ_WAVES
ROCM2 ROCP_GRBM_GUI_ACTIVE

METRICS
GPU ALU stalled 100*ROCM0*4/ROCM1/ROCM2

LONG
--
The percentage of GPUTime ALU units are stalled by the LDS input queue
being full or the output queue being not ready. If there are LDS bank
conflicts, reduce them. Otherwise, try reducing the number of LDS
accesses if possible.
Value range: 0% (optimal) to 100% (bad).
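
To make the formula `100*ROCM0*4/ROCM1/ROCM2` concrete, the sketch below evaluates it for made-up counter values. `lds_stall_percent` is a hypothetical helper, and the zero-divisor guard is an addition for this sketch only.

```bash
#!/usr/bin/env bash
# Sketch only: evaluates 100*ROCM0*4/ROCM1/ROCM2 outside of LIKWID.
# $1: ROCP_SQ_WAIT_INST_LDS, $2: ROCP_SQ_WAVES, $3: ROCP_GRBM_GUI_ACTIVE
lds_stall_percent() {
    awk -v stall="$1" -v waves="$2" -v active="$3" 'BEGIN {
        if (waves == 0 || active == 0) { print "nan"; exit }   # guard added here
        printf "%.2f\n", 100 * stall * 4 / waves / active
    }'
}

# Illustrative values, not measured data.
lds_stall_percent 5000 100 1000
```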
@@ -0,0 +1,16 @@
SHORT GPU utilization

EVENTSET
ROCM0 ROCP_GRBM_COUNT
ROCM1 ROCP_GRBM_GUI_ACTIVE


METRICS
GPU utilization 100*ROCM1/ROCM0


LONG
--
This group reproduces the 'GPUBusy' metric provided by RocProfiler.
The GPUBusy metric can also be selected directly; the calculation is
then done internally, which covers the case that the metric formula changes.
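
As a small worked instance of `100*ROCM1/ROCM0`, the sketch below computes the utilization from made-up counter values. `gpu_utilization` is a hypothetical helper, and the zero-divisor guard is not part of the group file.

```bash
#!/usr/bin/env bash
# Sketch only: evaluates 100*ROCM1/ROCM0 outside of LIKWID.
# $1: ROCP_GRBM_COUNT (ROCM0), $2: ROCP_GRBM_GUI_ACTIVE (ROCM1)
gpu_utilization() {
    awk -v count="$1" -v active="$2" 'BEGIN {
        if (count == 0) { print "nan"; exit }   # guard added for this sketch
        printf "%.2f\n", 100 * active / count
    }'
}

# Illustrative values, not measured data.
gpu_utilization 1000 250
```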