Add support for Intel Granite Rapids and Sierra Forrest (RRZE-HPC#639)
* Add support for Intel Granite Rapids and Sierra Forrest

* Fix RAPL DRAM energy unit for SPR

* Improve error handling in likwid-sysfeatures

likwid-sysfeatures now also queries the categories properly, so
different categories with the same feature name do not conflict.

* Add missing includes to sysFeatures_common_rapl

Due to include order the missing includes did not cause problems, but in
future files and commits, this must be fixed.

* Allow hexadecimal numbers in sysFeatures

To give the user more flexibility when specifying numbers.

* Add missing (void) in sysFeatures_types.h

* Add AMD HSMP sysFeatures support

* Fix APIC mapping in sysFeatures_amd_hsmp

Likwid and hwloc currently have an incomplete/wrong understanding of
APIC IDs and blindly assume a mapping of linux processor number to APIC
ID. This is wrong. For example AMD EPYC 9354 reports gaps and jumps in
its ID order. This commit makes sure to explicitly query the APIC ID via
CPUID in order to correctly map LikwidDevice_t to APIC IDs.

* Explicitly set DRAM energy unit on Sapphire Rapids

While the power unit MSR appears to match the value specified in the
Intel SDM, the SDM specifies that the DRAM energy unit is always 61 uJ.
There is no mention of reading it from an MSR, so we always assume it is 61 uJ.

* Restore old likwid-sysfeatures hwthread behavior

Commit 8c49e8a introduced device type prefixes, which broke the old
device/cpu list behavior of just specifying a range of hardware threads
(e.g. 0-12). This commit restores this behavior when no device type
prefix is specified. The only remaining difference is that higher level
devices (e.g. cores, sockets, etc.) are not implicitly created.

* Fix missing include in sysFeatures_amd

* Finalize Intel Granite Rapids support

* Fix for Intel SPR TMA metrics

* New counter list for SPR

* Remove unrequired read in finalize of Intel SPR and GNR

* Add groups for GNR

* Add support for Intel Sierra Forrest (core, uncore, energy)

* Add update of device location for SPR UPI and M3UPI units

See https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/uncore_snbep.c#L6591-L6640

* Add way to use unnamed perf uncore devices

---------

Co-authored-by: chriswasser <[email protected]>
Co-authored-by: Michael Panzlaff <[email protected]>
3 people authored Nov 8, 2024
1 parent 9e14e9b commit 638191a
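
Note (illustrative, not part of the commit): the commit message states that the DRAM energy unit on Sapphire Rapids is treated as a fixed 61 uJ instead of being read from an MSR. The sketch below shows what such a conversion from raw counter readings to joules could look like. The function name, the constant names, and the 32-bit counter width are assumptions made for this example and are not taken from the LIKWID sources.

```python
# Hypothetical helper, not LIKWID code: convert raw RAPL DRAM energy counter
# readings to joules using the fixed 61 uJ unit named in the commit message.
DRAM_ENERGY_UNIT_J = 61e-6  # fixed unit from the commit message
COUNTER_BITS = 32           # assumption: the energy counter is 32 bits wide


def dram_energy_joules(raw_start: int, raw_stop: int) -> float:
    """Return the energy consumed between two raw counter readings."""
    # Modular subtraction tolerates a single counter wraparound.
    delta = (raw_stop - raw_start) % (1 << COUNTER_BITS)
    return delta * DRAM_ENERGY_UNIT_J


if __name__ == "__main__":
    print(dram_energy_joules(0, 1_000_000))
```

Using a fixed constant keeps the conversion independent of whatever the unit MSR reports, which is the behavior the commit message describes.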
Showing 55 changed files with 12,573 additions and 1,991 deletions.
32 changes: 32 additions & 0 deletions groups/GNR/BRANCH.txt
@@ -0,0 +1,32 @@
SHORT Branch prediction miss rate/ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 BR_INST_RETIRED_ALL_BRANCHES
PMC1 BR_MISP_RETIRED_ALL_BRANCHES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Branch rate PMC0/FIXC0
Branch misprediction rate PMC1/FIXC0
Branch misprediction ratio PMC1/PMC0
Instructions per branch FIXC0/PMC0

LONG
Formulas:
Branch rate = BR_INST_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY
Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES/BR_INST_RETIRED_ALL_BRANCHES
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ALL_BRANCHES
-
The rates state how often on average a branch or a mispredicted branch occurred
per instruction retired in total. The branch misprediction ratio states what
fraction of all branch instructions was mispredicted.
Instructions per branch is 1/branch rate.
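
Note (illustrative, not part of the commit): the formulas of the BRANCH group can be checked by hand with a small script. The sketch below reimplements the four derived metrics from raw event counts. The function name and the sample counts are made up for this example; only the formulas come from the group file.

```python
# Hypothetical re-implementation of the BRANCH group formulas for sanity checks.
def branch_metrics(instr_retired: float, branches: float, mispredicted: float) -> dict:
    """Derive the BRANCH group metrics from raw event counts.

    instr_retired -> INSTR_RETIRED_ANY (FIXC0)
    branches      -> BR_INST_RETIRED_ALL_BRANCHES (PMC0)
    mispredicted  -> BR_MISP_RETIRED_ALL_BRANCHES (PMC1)
    """
    return {
        "Branch rate": branches / instr_retired,
        "Branch misprediction rate": mispredicted / instr_retired,
        "Branch misprediction ratio": mispredicted / branches,
        "Instructions per branch": instr_retired / branches,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in branch_metrics(1_000_000, 200_000, 4_000).items():
        print(f"{name}: {value}")
```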

27 changes: 27 additions & 0 deletions groups/GNR/CLOCK.txt
@@ -0,0 +1,27 @@
SHORT Power and Energy consumption

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PWR0 PWR_PKG_ENERGY
UBOX0 UNCORE_CLOCKTICKS

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
Uncore Clock [MHz] 1.E-06*UBOX0/time
CPI FIXC1/FIXC0
Energy [J] PWR0
Power [W] PWR0/time

LONG
Formulas:
Power = PWR_PKG_ENERGY / time
Uncore Clock [MHz] = 1.E-06 * UNCORE_CLOCKTICKS / time
-
Granite Rapids implements the RAPL interface. This interface allows monitoring
the consumed energy at the package (socket) level.
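
Note (illustrative, not part of the commit): the clock formulas used throughout these groups are easier to follow with numbers. The sketch below assumes that `inverseClock` is the reciprocal of the nominal (base) clock frequency in Hz, which is how the formula reads but is not stated in this file. The function name and sample values are hypothetical.

```python
# Hypothetical re-implementation of the clock formulas from the CLOCK group.
def clock_metrics(fixc1: float, fixc2: float, ubox0: float,
                  time_s: float, base_hz: float) -> dict:
    """Compute core and uncore clock in MHz.

    fixc1 -> CPU_CLK_UNHALTED_CORE, fixc2 -> CPU_CLK_UNHALTED_REF,
    ubox0 -> UNCORE_CLOCKTICKS, time_s -> measured runtime in seconds.
    """
    inverse_clock = 1.0 / base_hz  # assumption: inverseClock = 1 / base frequency
    return {
        "Runtime unhalted [s]": fixc1 * inverse_clock,
        "Clock [MHz]": 1.0e-06 * (fixc1 / fixc2) / inverse_clock,
        "Uncore Clock [MHz]": 1.0e-06 * ubox0 / time_s,
    }


if __name__ == "__main__":
    # Made-up counts: the core ran 1.5x faster than the 2 GHz reference clock.
    print(clock_metrics(3.0e9, 2.0e9, 2.4e9, 1.0, 2.0e9))
```

The ratio of core cycles to reference cycles scales the base frequency, so the result reflects the average frequency while the core was not halted.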

38 changes: 38 additions & 0 deletions groups/GNR/CYCLE_ACTIVITY.txt
@@ -0,0 +1,38 @@
SHORT Cycle Activities

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 CYCLE_ACTIVITY_CYCLES_L2_MISS
PMC1 CYCLE_ACTIVITY_CYCLES_MEM_ANY
PMC2 CYCLE_ACTIVITY_CYCLES_L1D_MISS
PMC3 CYCLE_ACTIVITY_STALLS_TOTAL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Cycles without execution [%] (PMC3/FIXC1)*100
Cycles without execution due to L1D [%] (PMC2/FIXC1)*100
Cycles without execution due to L2 [%] (PMC0/FIXC1)*100
Cycles without execution due to memory loads [%] (PMC1/FIXC1)*100

LONG
Formulas:
Cycles without execution [%] = CYCLE_ACTIVITY_STALLS_TOTAL/CPU_CLK_UNHALTED_CORE*100
Cycles without execution due to L1D [%] = CYCLE_ACTIVITY_CYCLES_L1D_MISS/CPU_CLK_UNHALTED_CORE*100
Cycles without execution due to L2 [%] = CYCLE_ACTIVITY_CYCLES_L2_MISS/CPU_CLK_UNHALTED_CORE*100
Cycles without execution due to memory loads [%] = CYCLE_ACTIVITY_CYCLES_MEM_ANY/CPU_CLK_UNHALTED_CORE*100
--
This performance group measures the cycles while waiting for data from the cache
and memory hierarchy.
CYCLE_ACTIVITY_STALLS_TOTAL: Total execution stalls.
CYCLE_ACTIVITY_CYCLES_L1D_MISS: Cycles while L1 cache miss demand load is
outstanding.
CYCLE_ACTIVITY_CYCLES_L2_MISS: Cycles while L2 cache miss demand load is
outstanding.
CYCLE_ACTIVITY_CYCLES_MEM_ANY: Cycles while memory subsystem has an
outstanding load.
39 changes: 39 additions & 0 deletions groups/GNR/CYCLE_STALLS.txt
@@ -0,0 +1,39 @@
SHORT Cycle Activities (Stalls)

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 CYCLE_ACTIVITY_STALLS_L2_MISS
PMC2 CYCLE_ACTIVITY_STALLS_L1D_MISS
PMC3 CYCLE_ACTIVITY_STALLS_TOTAL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Total execution stalls PMC3
Stalls caused by L1D misses [%] (PMC2/PMC3)*100
Stalls caused by L2 misses [%] (PMC0/PMC3)*100
Execution stall rate [%] (PMC3/FIXC1)*100
Stalls caused by L1D misses rate [%] (PMC2/FIXC1)*100
Stalls caused by L2 misses rate [%] (PMC0/FIXC1)*100

LONG
Formulas:
Total execution stalls = CYCLE_ACTIVITY_STALLS_TOTAL
Stalls caused by L1D misses [%] = (CYCLE_ACTIVITY_STALLS_L1D_MISS/CYCLE_ACTIVITY_STALLS_TOTAL)*100
Stalls caused by L2 misses [%] = (CYCLE_ACTIVITY_STALLS_L2_MISS/CYCLE_ACTIVITY_STALLS_TOTAL)*100
Execution stall rate [%] = (CYCLE_ACTIVITY_STALLS_TOTAL/CPU_CLK_UNHALTED_CORE)*100
Stalls caused by L1D misses rate [%] = (CYCLE_ACTIVITY_STALLS_L1D_MISS/CPU_CLK_UNHALTED_CORE)*100
Stalls caused by L2 misses rate [%] = (CYCLE_ACTIVITY_STALLS_L2_MISS/CPU_CLK_UNHALTED_CORE)*100
--
This performance group measures the stalls caused by data traffic in the cache
hierarchy.
CYCLE_ACTIVITY_STALLS_TOTAL: Total execution stalls.
CYCLE_ACTIVITY_STALLS_L1D_MISS: Execution stalls while L1 cache miss demand
load is outstanding.
CYCLE_ACTIVITY_STALLS_L2_MISS: Execution stalls while L2 cache miss demand
load is outstanding.
23 changes: 23 additions & 0 deletions groups/GNR/DATA.txt
@@ -0,0 +1,23 @@
SHORT Load to store ratio

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 MEM_INST_RETIRED_ALL_LOADS
PMC1 MEM_INST_RETIRED_ALL_STORES

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Load to store ratio PMC0/PMC1

LONG
Formulas:
Load to store ratio = MEM_INST_RETIRED_ALL_LOADS/MEM_INST_RETIRED_ALL_STORES
-
This is a metric to determine your load to store ratio.

25 changes: 25 additions & 0 deletions groups/GNR/DIVIDE.txt
@@ -0,0 +1,25 @@
SHORT Divide unit information

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 ARITH_DIV_COUNT
PMC1 ARITH_DIV_ACTIVE


METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Number of divide ops PMC0
Avg. divide unit usage duration PMC1/PMC0

LONG
Formulas:
Number of divide ops = ARITH_DIV_COUNT
Avg. divide unit usage duration = ARITH_DIV_ACTIVE/ARITH_DIV_COUNT
-
This performance group measures the average latency of divide operations.
43 changes: 43 additions & 0 deletions groups/GNR/ENERGY.txt
@@ -0,0 +1,43 @@
SHORT Power and Energy consumption

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
TMP0 TEMP_CORE
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
PWR3 PWR_DRAM_ENERGY
PWR4 PWR_PLATFORM_ENERGY
UBOX0 UNCORE_CLOCKTICKS



METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
Uncore Clock [MHz] 1.E-06*UBOX0/time
CPI FIXC1/FIXC0
Temperature [C] TMP0
Energy [J] PWR0
Power [W] PWR0/time
Energy PP0 [J] PWR1
Power PP0 [W] PWR1/time
Energy DRAM [J] PWR3
Power DRAM [W] PWR3/time
Energy PLATFORM [J] PWR4
Power PLATFORM [W] PWR4/time

LONG
Formulas:
Power = PWR_PKG_ENERGY / time
Power PP0 = PWR_PP0_ENERGY / time
Power DRAM = PWR_DRAM_ENERGY / time
Power PLATFORM = PWR_PLATFORM_ENERGY / time
-
Granite Rapids implements the RAPL interface. This interface allows monitoring
the consumed energy at the package (socket), DRAM and
platform level.

26 changes: 26 additions & 0 deletions groups/GNR/FLOPS_AVX.txt
@@ -0,0 +1,26 @@
SHORT Packed AVX MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
Packed SP [MFLOP/s] 1.0E-06*(PMC0*8.0+PMC2*16.0)/time
Packed DP [MFLOP/s] 1.0E-06*(PMC1*4.0+PMC3*8.0)/time

LONG
Formulas:
Packed SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
Packed DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
-
Packed single and double precision AVX and AVX-512 FLOP rates.
35 changes: 35 additions & 0 deletions groups/GNR/FLOPS_DP.txt
@@ -0,0 +1,35 @@
SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0+PMC3*8.0)/time
AVX DP [MFLOP/s] 1.0E-06*(PMC2*4.0+PMC3*8.0)/time
AVX512 DP [MFLOP/s] 1.0E-06*(PMC3*8.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio [%] 100*(PMC0+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3)

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
AVX DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
AVX512 DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_DOUBLE/runtime
Vectorization ratio [%] = 100*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/(FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)
-
Scalar and packed (SSE, AVX, AVX-512) double precision FLOP rates.
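
Note (illustrative, not part of the commit): the weights in the FLOPS_DP formulas are the number of double precision elements per vector (2 for 128 bit, 4 for 256 bit, 8 for 512 bit, 1 for scalar). The sketch below applies the same formulas to raw counts. The function name and the sample counts are made up for this example.

```python
# Hypothetical re-implementation of the FLOPS_DP group formulas.
def flops_dp_metrics(scalar: float, p128: float, p256: float, p512: float,
                     time_s: float) -> dict:
    """Derive the FLOPS_DP metrics from FP_ARITH_INST_RETIRED_* counts.

    Each packed count is weighted by the number of 64-bit lanes per vector.
    """
    packed = p128 + p256 + p512
    total = packed + scalar
    return {
        "DP [MFLOP/s]": 1.0e-06 * (p128 * 2.0 + scalar + p256 * 4.0 + p512 * 8.0) / time_s,
        "AVX DP [MFLOP/s]": 1.0e-06 * (p256 * 4.0 + p512 * 8.0) / time_s,
        "AVX512 DP [MFLOP/s]": 1.0e-06 * (p512 * 8.0) / time_s,
        "Packed [MUOPS/s]": 1.0e-06 * packed / time_s,
        "Scalar [MUOPS/s]": 1.0e-06 * scalar / time_s,
        "Vectorization ratio [%]": 100.0 * packed / total,
    }


if __name__ == "__main__":
    # Made-up counts: equal numbers of scalar and 512-bit packed instructions.
    print(flops_dp_metrics(1.0e6, 0.0, 0.0, 1.0e6, 1.0))
```

Note that the vectorization ratio counts instructions, not FLOPs, so a code with half of its FP instructions vectorized can still get most of its FLOPs from the packed instructions.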

35 changes: 35 additions & 0 deletions groups/GNR/FLOPS_SP.txt
@@ -0,0 +1,35 @@
SHORT Single Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_SINGLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
SP [MFLOP/s] 1.0E-06*(PMC0*4.0+PMC1+PMC2*8.0+PMC3*16.0)/time
AVX SP [MFLOP/s] 1.0E-06*(PMC2*8.0+PMC3*16.0)/time
AVX512 SP [MFLOP/s] 1.0E-06*(PMC3*16.0)/time
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio [%] 100*(PMC0+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3)

LONG
Formulas:
SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE*4+FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
AVX SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
AVX512 SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)/runtime
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_SINGLE/runtime
Vectorization ratio [%] = 100*(FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)/(FP_ARITH_INST_RETIRED_SCALAR_SINGLE+FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE)
-
Scalar and packed (SSE, AVX, AVX-512) single precision FLOP rates.

38 changes: 38 additions & 0 deletions groups/GNR/L2.txt
@@ -0,0 +1,38 @@
SHORT L2 cache bandwidth in MBytes/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
FIXC3 TOPDOWN_SLOTS
PMC0 L1D_REPLACEMENT
PMC1 L2_TRANS_L1D_WB

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
CPI FIXC1/FIXC0
L2D load bandwidth [MBytes/s] 1.0E-06*PMC0*64.0/time
L2D load data volume [GBytes] 1.0E-09*PMC0*64.0
L2D evict bandwidth [MBytes/s] 1.0E-06*PMC1*64.0/time
L2D evict data volume [GBytes] 1.0E-09*PMC1*64.0
L2 bandwidth [MBytes/s] 1.0E-06*(PMC0+PMC1)*64.0/time
L2 data volume [GBytes] 1.0E-09*(PMC0+PMC1)*64.0

LONG
Formulas:
L2D load bandwidth [MBytes/s] = 1.0E-06*L1D_REPLACEMENT*64.0/time
L2D load data volume [GBytes] = 1.0E-09*L1D_REPLACEMENT*64.0
L2D evict bandwidth [MBytes/s] = 1.0E-06*L2_TRANS_L1D_WB*64.0/time
L2D evict data volume [GBytes] = 1.0E-09*L2_TRANS_L1D_WB*64.0
L2 bandwidth [MBytes/s] = 1.0E-06*(L1D_REPLACEMENT+L2_TRANS_L1D_WB)*64/time
L2 data volume [GBytes] = 1.0E-09*(L1D_REPLACEMENT+L2_TRANS_L1D_WB)*64
-
Profiling group to measure L2 cache bandwidth. The bandwidth is computed from the
number of cache lines allocated in the L1 and the number of modified cache lines
evicted from the L1. The group also outputs the total data volume transferred between
L2 and L1. Note that this bandwidth also includes data transfers due to a write
allocate load on a store miss in L1. It does not include data loaded into the L1
instruction cache.
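
Note (illustrative, not part of the commit): every event counted in the L2 group corresponds to one 64-byte cache line, which is where the factor 64.0 in the formulas comes from. The sketch below applies the formulas to raw counts. The function name and sample values are made up for this example.

```python
# Hypothetical re-implementation of the L2 group formulas.
CACHE_LINE_BYTES = 64.0  # factor used in the group formulas


def l2_metrics(l1d_replacement: float, l2_trans_l1d_wb: float, time_s: float) -> dict:
    """Derive L2 bandwidth and data volume from raw event counts.

    l1d_replacement -> L1D_REPLACEMENT (lines loaded into L1 from L2)
    l2_trans_l1d_wb -> L2_TRANS_L1D_WB (modified lines evicted from L1 to L2)
    """
    load_bytes = l1d_replacement * CACHE_LINE_BYTES
    evict_bytes = l2_trans_l1d_wb * CACHE_LINE_BYTES
    total_bytes = load_bytes + evict_bytes
    return {
        "L2D load bandwidth [MBytes/s]": 1.0e-06 * load_bytes / time_s,
        "L2D load data volume [GBytes]": 1.0e-09 * load_bytes,
        "L2D evict bandwidth [MBytes/s]": 1.0e-06 * evict_bytes / time_s,
        "L2D evict data volume [GBytes]": 1.0e-09 * evict_bytes,
        "L2 bandwidth [MBytes/s]": 1.0e-06 * total_bytes / time_s,
        "L2 data volume [GBytes]": 1.0e-09 * total_bytes,
    }


if __name__ == "__main__":
    # Made-up counts measured over half a second.
    print(l2_metrics(1_000_000, 500_000, 0.5))
```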
