forked from RRZE-HPC/likwid
1 parent 49eb72d, commit d827ec6
Showing 25 changed files with 1,978 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
SHORT Branch prediction miss rate/ratio | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 RETIRED_BRANCH_INSTR | ||
PMC3 RETIRED_MISP_BRANCH_INSTR | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
Branch rate PMC2/PMC0 | ||
Branch misprediction rate PMC3/PMC0 | ||
Branch misprediction ratio PMC3/PMC2 | ||
Instructions per branch PMC0/PMC2 | ||
|
||
LONG | ||
Formulas: | ||
Branch rate = RETIRED_BRANCH_INSTR/RETIRED_INSTRUCTIONS | ||
Branch misprediction rate = RETIRED_MISP_BRANCH_INSTR/RETIRED_INSTRUCTIONS | ||
Branch misprediction ratio = RETIRED_MISP_BRANCH_INSTR/RETIRED_BRANCH_INSTR | ||
Instructions per branch = RETIRED_INSTRUCTIONS/RETIRED_BRANCH_INSTR | ||
- | ||
The rates state how often, on average, a branch or a mispredicted branch occurred | ||
per retired instruction. The branch misprediction ratio states directly | ||
what fraction of all branch instructions were mispredicted. | ||
Instructions per branch is 1/branch rate. | ||
|
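As a quick sanity check of the BRANCH formulas, the following Python sketch derives the four metrics from raw counter values. The function name and the sample counts are illustrative assumptions and are not part of LIKWID.

```python
def branch_metrics(instructions, branches, mispredicted):
    """Derive the BRANCH group metrics from raw counter values."""
    return {
        "branch_rate": branches / instructions,
        "misprediction_rate": mispredicted / instructions,
        "misprediction_ratio": mispredicted / branches,
        "instructions_per_branch": instructions / branches,
    }


# Made-up sample counts, for illustration only.
m = branch_metrics(instructions=1_000_000, branches=200_000, mispredicted=10_000)
for name, value in m.items():
    print(f"{name}: {value}")
```

Note that the misprediction rate is normalized by all retired instructions, while the misprediction ratio is normalized by branches only, so the ratio is always at least as large as the rate.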
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
SHORT Data cache miss rate/ratio | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 DATA_CACHE_ACCESSES | ||
PMC3 ANY_DATA_CACHE_FILLS_ALL | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
data cache requests PMC2 | ||
data cache request rate PMC2/PMC0 | ||
data cache misses PMC3 | ||
data cache miss rate PMC3/PMC0 | ||
data cache miss ratio PMC3/PMC2 | ||
|
||
LONG | ||
Formulas: | ||
data cache requests = DATA_CACHE_ACCESSES | ||
data cache request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS | ||
data cache misses = ANY_DATA_CACHE_FILLS_ALL | ||
data cache miss rate = ANY_DATA_CACHE_FILLS_ALL / RETIRED_INSTRUCTIONS | ||
data cache miss ratio = ANY_DATA_CACHE_FILLS_ALL / DATA_CACHE_ACCESSES | ||
- | ||
This group measures the locality of your data accesses with regard to the | ||
L1 cache. The data cache request rate tells you how data intensive your code is, | ||
i.e., how many data accesses you have on average per instruction. | ||
The data cache miss rate measures how often it was necessary to get | ||
cache lines from higher levels of the memory hierarchy. Finally, the | ||
data cache miss ratio tells you how many of your memory references required | ||
a cache line to be loaded from a higher level. While the data cache miss rate | ||
might be given by your algorithm, you should try to get the data cache miss ratio | ||
as low as possible by increasing your cache reuse. | ||
|
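The relationship between the three data cache metrics can be illustrated with a small Python sketch. The helper name and the counter values are hypothetical; only the formulas come from the group file.

```python
def dcache_metrics(instructions, accesses, fills):
    """Return (request rate, miss rate, miss ratio) for the L1 data cache."""
    request_rate = accesses / instructions  # DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
    miss_rate = fills / instructions        # ANY_DATA_CACHE_FILLS_ALL / RETIRED_INSTRUCTIONS
    miss_ratio = fills / accesses           # ANY_DATA_CACHE_FILLS_ALL / DATA_CACHE_ACCESSES
    return request_rate, miss_rate, miss_ratio


# Made-up counts: 4 of every 10 instructions access data, 5% of accesses miss.
request_rate, miss_rate, miss_ratio = dcache_metrics(1_000_000, 400_000, 20_000)
print(request_rate, miss_rate, miss_ratio)
```

By construction, the miss rate equals the request rate multiplied by the miss ratio, which is why improving cache reuse lowers the ratio even when the number of accesses per instruction is fixed by the algorithm.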
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
SHORT Cycles per instruction | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PWR1 RAPL_PKG_ENERGY | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] PMC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
Energy [J] PWR1 | ||
Power [W] PWR1/time | ||
|
||
LONG | ||
Formulas: | ||
CPI = CPU_CLOCKS_UNHALTED/RETIRED_INSTRUCTIONS | ||
Power [W] = RAPL_PKG_ENERGY / time | ||
- |
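The Clock [MHz] metric used by this and the other groups can be sketched in Python as follows. This assumes that `inverseClock` is the reciprocal of the nominal (base) clock frequency in Hz; the function name and the sample numbers are illustrative only.

```python
def clock_mhz(actual_cycles, max_cycles, inverse_clock):
    """Clock [MHz] = 1.E-06*(FIXC1/FIXC2)/inverseClock."""
    return 1.0e-06 * (actual_cycles / max_cycles) / inverse_clock


base_hz = 3.0e9  # assumed nominal frequency of the machine
mhz = clock_mhz(actual_cycles=3.6e9, max_cycles=3.0e9, inverse_clock=1.0 / base_hz)
print(f"average clock: {mhz:.1f} MHz")
```

Under that assumption, the ratio FIXC1/FIXC2 scales the nominal frequency, so a ratio above 1 indicates that the core ran above its base clock during the measurement.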
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
SHORT Cycles per instruction | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 RETIRED_UOPS | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] PMC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
CPI (based on uops) PMC1/PMC2 | ||
IPC PMC0/PMC1 | ||
|
||
|
||
LONG | ||
Formulas: | ||
CPI = CPU_CLOCKS_UNHALTED/RETIRED_INSTRUCTIONS | ||
CPI (based on uops) = CPU_CLOCKS_UNHALTED/RETIRED_UOPS | ||
IPC = RETIRED_INSTRUCTIONS/CPU_CLOCKS_UNHALTED | ||
- | ||
This group measures how efficiently the processor works with | ||
regard to instruction throughput. Also important as a standalone | ||
metric is RETIRED_INSTRUCTIONS, as it tells you how many instructions | ||
you need to execute for a task. An optimization might show very | ||
low CPI values but execute many more instructions to achieve them. | ||
|
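The caveat about CPI can be made concrete with a small Python sketch. The two code variants and their counter values are invented for illustration.

```python
def cpi(cycles, instructions):
    """CPI = CPU_CLOCKS_UNHALTED / RETIRED_INSTRUCTIONS."""
    return cycles / instructions


def ipc(cycles, instructions):
    """IPC = RETIRED_INSTRUCTIONS / CPU_CLOCKS_UNHALTED."""
    return instructions / cycles


# Hypothetical measurements of the same task in two implementations.
baseline = {"cycles": 2.0e9, "instructions": 1.0e9}
variant = {"cycles": 3.0e9, "instructions": 3.0e9}

for name, run in (("baseline", baseline), ("variant", variant)):
    print(name, "CPI:", cpi(**run), "IPC:", ipc(**run), "cycles:", run["cycles"])
```

Here the variant has the better CPI (1.0 versus 2.0) but needs more cycles in total, so at the same clock frequency it would take longer to finish.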
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
SHORT Load to store ratio | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 LS_DISPATCH_LOADS | ||
PMC3 LS_DISPATCH_STORES | ||
PMC4 LS_DISPATCH_LOAD_OP_STORES | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
Load to store ratio (PMC2+PMC4)/(PMC3+PMC4) | ||
|
||
LONG | ||
Formulas: | ||
Load to store ratio = (LS_DISPATCH_LOADS+LS_DISPATCH_LOAD_OP_STORES)/(LS_DISPATCH_STORES+LS_DISPATCH_LOAD_OP_STORES) | ||
- | ||
This is a simple metric to determine your load to store ratio. | ||
|
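A minimal Python sketch of the load to store ratio follows. In the formula, LS_DISPATCH_LOAD_OP_STORES is added to both the numerator and the denominator. The function name and counts are assumptions for illustration.

```python
def load_to_store_ratio(loads, stores, load_op_stores):
    """(LOADS + LOAD_OP_STORES) / (STORES + LOAD_OP_STORES)."""
    return (loads + load_op_stores) / (stores + load_op_stores)


# Made-up dispatch counts.
ratio = load_to_store_ratio(loads=300, stores=100, load_op_stores=100)
print(f"load to store ratio: {ratio}")
```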
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
SHORT Divide unit information | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 DIV_OP_COUNT | ||
PMC3 DIV_BUSY_CYCLES | ||
|
||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
Number of divide ops PMC2 | ||
Avg. divide unit usage duration PMC3/PMC2 | ||
|
||
LONG | ||
Formulas: | ||
Number of divide ops = DIV_OP_COUNT | ||
Avg. divide unit usage duration = DIV_BUSY_CYCLES/DIV_OP_COUNT | ||
-- | ||
This performance group measures the average latency of divide operations |
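The average divide unit usage duration is a simple quotient of the two counters. The Python sketch below uses a hypothetical helper; returning `None` when no divide operation was counted is a choice made here, not LIKWID behavior.

```python
def avg_divide_duration(busy_cycles, op_count):
    """DIV_BUSY_CYCLES / DIV_OP_COUNT, or None if no divides were counted."""
    if op_count == 0:
        return None  # avoid division by zero for divide-free code
    return busy_cycles / op_count


# Made-up counter values.
print(avg_divide_duration(busy_cycles=4000, op_count=200))
```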
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
SHORT Power and Energy consumption | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PWR0 RAPL_CORE_ENERGY | ||
PWR2 RAPL_L3_ENERGY | ||
|
||
|
||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
Energy Core [J] PWR0 | ||
Power Core [W] PWR0/time | ||
Energy L3 [J] PWR2 | ||
Power L3 [W] PWR2/time | ||
|
||
LONG | ||
Formulas: | ||
Power Core [W] = RAPL_CORE_ENERGY/time | ||
Power L3 [W] = RAPL_L3_ENERGY/time | ||
- | ||
Ryzen implements the RAPL interface previously introduced by Intel. | ||
This interface allows monitoring the energy consumed in the core and L3 | ||
domains. | ||
It is not documented by AMD which parts of the CPU are in which domain. | ||
|
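Both power metrics divide an energy reading by the measured runtime. A minimal Python sketch, with invented energy values and a hypothetical function name:

```python
def power_watts(energy_joules, runtime_s):
    """Average power over the measurement: energy / time."""
    return energy_joules / runtime_s


runtime_s = 8.0  # made-up runtime
core_power = power_watts(energy_joules=120.0, runtime_s=runtime_s)  # RAPL_CORE_ENERGY
l3_power = power_watts(energy_joules=20.0, runtime_s=runtime_s)     # RAPL_L3_ENERGY
print(f"core: {core_power} W, L3: {l3_power} W")
```

The result is an average over the whole measured region, so short power spikes are not visible in these metrics.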
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
SHORT Double Precision MFLOP/s | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 RETIRED_SSE_AVX_FLOPS_ALL | ||
PMC3 MERGE | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
DP [MFLOP/s] 1.0E-06*(PMC2)/time | ||
|
||
LONG | ||
Formulas: | ||
CPI = CPU_CLOCKS_UNHALTED/RETIRED_INSTRUCTIONS | ||
DP [MFLOP/s] = 1.0E-06*(RETIRED_SSE_AVX_FLOPS_ALL)/time | ||
- | ||
Profiling group to measure (double-precision) FLOP rate. The event might | ||
have a higher per-cycle increment than 15, so the MERGE event is required. In | ||
contrast to AMD Zen, the Zen2 microarchitecture does not provide events to | ||
differentiate between single- and double-precision. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
SHORT Single Precision MFLOP/s | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 RETIRED_SSE_AVX_FLOPS_ALL | ||
PMC3 MERGE | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
SP [MFLOP/s] 1.0E-06*(PMC2)/time | ||
|
||
LONG | ||
Formulas: | ||
CPI = CPU_CLOCKS_UNHALTED/RETIRED_INSTRUCTIONS | ||
SP [MFLOP/s] = 1.0E-06*(RETIRED_SSE_AVX_FLOPS_ALL)/time | ||
- | ||
Profiling group to measure (single-precision) FLOP rate. The event might | ||
have a higher per-cycle increment than 15, so the MERGE event is required. In | ||
contrast to AMD Zen, the Zen2 microarchitecture does not provide events to | ||
differentiate between single- and double-precision. | ||
|
||
|
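Both FLOP groups use the same event and the same formula, so the reported value only matches the labeled precision if the measured code uses that precision exclusively. A Python sketch of the computation, with a hypothetical function name and made-up numbers:

```python
def mflops(flop_count, runtime_s):
    """[MFLOP/s] = 1.0E-06 * RETIRED_SSE_AVX_FLOPS_ALL / time."""
    return 1.0e-06 * flop_count / runtime_s


# Made-up values: 4e9 floating-point operations in 2 seconds.
rate = mflops(flop_count=4.0e9, runtime_s=2.0)
print(f"{rate:.1f} MFLOP/s")
```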
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
SHORT Instruction cache miss rate/ratio | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 ICACHE_FETCHES | ||
PMC2 ICACHE_L2_REFILLS | ||
PMC3 ICACHE_SYSTEM_REFILLS | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/PMC0 | ||
L1I request rate PMC1/PMC0 | ||
L1I miss rate (PMC2+PMC3)/PMC0 | ||
L1I miss ratio (PMC2+PMC3)/PMC1 | ||
|
||
LONG | ||
Formulas: | ||
L1I request rate = ICACHE_FETCHES / RETIRED_INSTRUCTIONS | ||
L1I miss rate = (ICACHE_L2_REFILLS + ICACHE_SYSTEM_REFILLS)/RETIRED_INSTRUCTIONS | ||
L1I miss ratio = (ICACHE_L2_REFILLS + ICACHE_SYSTEM_REFILLS)/ICACHE_FETCHES | ||
- | ||
This group measures the locality of your instruction code with regard to the | ||
L1 I-Cache. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
SHORT L2 cache bandwidth in MBytes/s (experimental) | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 REQUESTS_TO_L2_GRP1_ALL_NO_PF | ||
PMC3 L2_PF_HIT_IN_L2_ALLPREF | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
L2 bandwidth [MBytes/s] 1.0E-06*(PMC2)*64.0/time | ||
L2 data volume [GBytes] 1.0E-09*(PMC2)*64.0 | ||
Prefetch bandwidth [MBytes/s] 1.0E-06*(PMC3)*64.0/time | ||
Prefetch data volume [GBytes] 1.0E-09*(PMC3)*64.0 | ||
|
||
LONG | ||
Formulas: | ||
L2 bandwidth [MBytes/s] = 1.0E-06*(REQUESTS_TO_L2_GRP1_ALL_NO_PF)*64/time | ||
L2 data volume [GBytes] = 1.0E-09*(REQUESTS_TO_L2_GRP1_ALL_NO_PF)*64 | ||
Prefetch bandwidth [MBytes/s] = 1.0E-06*(L2_PF_HIT_IN_L2_ALLPREF)*64/time | ||
Prefetch data volume [GBytes] = 1.0E-09*(L2_PF_HIT_IN_L2_ALLPREF)*64 | ||
- | ||
Profiling group to measure L2 cache load bandwidth including prefetchers. | ||
There are no events to count stores. |
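The bandwidth and data volume formulas multiply a request count by the factor 64 from the formulas, taken here to be the cache line size in bytes. The following Python sketch shows the arithmetic; the names and counter values are illustrative assumptions.

```python
CACHE_LINE_BYTES = 64.0  # factor used in the group formulas


def data_volume_gbytes(requests):
    """[GBytes] = 1.0E-09 * requests * 64."""
    return 1.0e-09 * requests * CACHE_LINE_BYTES


def bandwidth_mbytes_per_s(requests, runtime_s):
    """[MBytes/s] = 1.0E-06 * requests * 64 / time."""
    return 1.0e-06 * requests * CACHE_LINE_BYTES / runtime_s


# Made-up values: 1e9 L2 requests in 2 seconds.
print(data_volume_gbytes(1.0e9), "GBytes")
print(bandwidth_mbytes_per_s(1.0e9, 2.0), "MBytes/s")
```

The same helpers apply to the prefetch metrics by passing the L2_PF_HIT_IN_L2_ALLPREF count instead of the demand request count.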
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
SHORT L2 cache miss rate/ratio (experimental) | ||
|
||
EVENTSET | ||
PMC0 REQUESTS_TO_L2_GRP1_ALL_NO_PF | ||
PMC1 L2_PF_HIT_IN_L2 | ||
PMC2 L2_PF_HIT_IN_L3 | ||
PMC3 L2_PF_MISS_IN_L3 | ||
PMC4 CORE_TO_L2_CACHE_REQUESTS_HITS | ||
PMC5 RETIRED_INSTRUCTIONS | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
L2 request rate (PMC0+PMC1+PMC2+PMC3)/PMC5 | ||
L2 miss rate ((PMC0+PMC1+PMC2+PMC3)-(PMC4+PMC1))/PMC5 | ||
L2 miss ratio ((PMC0+PMC1+PMC2+PMC3)-(PMC4+PMC1))/(PMC0+PMC1+PMC2+PMC3) | ||
L2 accesses (PMC0+PMC1+PMC2+PMC3) | ||
L2 hits (PMC4+PMC1) | ||
L2 misses (PMC0+PMC1+PMC2+PMC3)-(PMC4+PMC1) | ||
|
||
LONG | ||
Formulas: | ||
L2 request rate = (REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3)/RETIRED_INSTRUCTIONS | ||
L2 miss rate = ((REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3)-(CORE_TO_L2_CACHE_REQUESTS_HITS+L2_PF_HIT_IN_L2))/RETIRED_INSTRUCTIONS | ||
L2 miss ratio = ((REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3)-(CORE_TO_L2_CACHE_REQUESTS_HITS+L2_PF_HIT_IN_L2))/(REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3) | ||
L2 accesses = (REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3) | ||
L2 hits = CORE_TO_L2_CACHE_REQUESTS_HITS+L2_PF_HIT_IN_L2 | ||
L2 misses = (REQUESTS_TO_L2_GRP1_ALL_NO_PF+L2_PF_HIT_IN_L2+L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3)-(CORE_TO_L2_CACHE_REQUESTS_HITS+L2_PF_HIT_IN_L2) | ||
- | ||
This group measures the locality of your data accesses with regard to the | ||
L2 cache. The L2 request rate tells you how data intensive your code is, | ||
i.e., how many L2 requests you have on average per instruction. | ||
The L2 miss rate measures how often it was necessary to get | ||
cache lines from a higher level (L3 or memory). Finally, the L2 miss ratio tells you how many of your | ||
L2 requests required a cache line to be loaded from a higher level. | ||
While the L2 miss rate might be given by your algorithm, you should | ||
try to get the L2 miss ratio as low as possible by increasing your cache reuse. | ||
|
||
|
||
|
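Because the L2 metrics combine several events, a worked example may help. The Python sketch below follows the formulas of this group; the function name and all counter values are invented for illustration.

```python
def l2_metrics(requests, pf_hit_l2, pf_hit_l3, pf_miss_l3, core_hits, instructions):
    """Compute the L2CACHE group metrics from raw counter values."""
    accesses = requests + pf_hit_l2 + pf_hit_l3 + pf_miss_l3
    hits = core_hits + pf_hit_l2
    misses = accesses - hits
    return {
        "accesses": accesses,
        "hits": hits,
        "misses": misses,
        "request_rate": accesses / instructions,
        "miss_rate": misses / instructions,
        "miss_ratio": misses / accesses,
    }


# Made-up counter values.
m = l2_metrics(requests=1000, pf_hit_l2=200, pf_hit_l3=100, pf_miss_l3=50,
               core_hits=800, instructions=10_000)
for name, value in m.items():
    print(f"{name}: {value}")
```

Prefetches that hit in L2 appear in both the access count and the hit count, so they cancel out of the miss count.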
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
SHORT L3 cache bandwidth in MBytes/s | ||
|
||
EVENTSET | ||
FIXC1 ACTUAL_CPU_CLOCK | ||
FIXC2 MAX_CPU_CLOCK | ||
PMC0 RETIRED_INSTRUCTIONS | ||
PMC1 CPU_CLOCKS_UNHALTED | ||
PMC2 L2_PF_HIT_IN_L3 | ||
PMC3 L2_PF_MISS_IN_L3 | ||
PMC4 L2_CACHE_MISS_AFTER_L1_MISS | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI PMC1/PMC0 | ||
L3 bandwidth [MBytes/s] 1.0E-06*(PMC2+PMC3+PMC4)*64.0/time | ||
L3 data volume [GBytes] 1.0E-09*(PMC2+PMC3+PMC4)*64.0 | ||
|
||
LONG | ||
Formulas: | ||
L3 bandwidth [MBytes/s] = 1.0E-06*(L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3+L2_CACHE_MISS_AFTER_L1_MISS)*64.0/time | ||
L3 data volume [GBytes] = 1.0E-09*(L2_PF_HIT_IN_L3+L2_PF_MISS_IN_L3+L2_CACHE_MISS_AFTER_L1_MISS)*64.0 | ||
-- | ||
Profiling group to measure L3 cache bandwidth. It measures only loads from L3. | ||
There is no performance event to measure the stores to L3. |
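The L3 metrics sum three events before applying the factor 64 from the formulas. A short Python sketch under that reading, with hypothetical names and made-up counts:

```python
def l3_load_metrics(pf_hit_l3, pf_miss_l3, l2_miss_after_l1_miss, runtime_s):
    """Return (bandwidth in MBytes/s, data volume in GBytes) for L3 loads."""
    lines = pf_hit_l3 + pf_miss_l3 + l2_miss_after_l1_miss
    bandwidth = 1.0e-06 * lines * 64.0 / runtime_s
    volume = 1.0e-09 * lines * 64.0
    return bandwidth, volume


# Made-up counter values over a 4 second run.
bw, vol = l3_load_metrics(2.0e8, 1.0e8, 2.0e8, runtime_s=4.0)
print(f"{bw:.1f} MBytes/s, {vol:.1f} GBytes")
```

Since stores to L3 cannot be counted, these values are a lower bound on the total traffic between L2 and L3.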