
Sandy Bridge EP

Thomas.Roehl edited this page Oct 16, 2015 · 11 revisions

Architecture specific notes for Intel® SandyBridge EP/EN

Performance groups

Intel® SandyBridge EP/EN Performance groups

Events

The input file for the events on Intel® SandyBridge EP/EN can be found here.

Counters

Core-local counters

Fixed-purpose counters

Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.

Counters

| Counter name | Event name |
|---|---|
| FIXC0 | INSTR_RETIRED_ANY |
| FIXC1 | CPU_CLK_UNHALTED_CORE |
| FIXC2 | CPU_CLK_UNHALTED_REF |

##### Available Options

| Option | Argument | Description | Comment |
|---|---|---|---|
| anythread | N | Set bit 2+(index*4) in config register | |
| kernel | N | Set bit (index*4) in config register | |
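
The bit positions in the table above depend on the counter index. The following minimal sketch computes only those option bits; the function name `fixed_ctr_option_bits` is hypothetical and not part of LIKWID, and it assumes that `index` is the number in the counter name (0 for FIXC0, 1 for FIXC1, 2 for FIXC2).

```python
def fixed_ctr_option_bits(index, kernel=False, anythread=False):
    """Return the option bits for one fixed-purpose counter.

    Hypothetical helper for illustration only. It covers just the two
    options listed in the table, not any enable bits.
    """
    if index not in (0, 1, 2):
        raise ValueError("fixed counter index must be 0, 1 or 2")
    value = 0
    if kernel:
        value |= 1 << (index * 4)        # kernel: bit (index*4)
    if anythread:
        value |= 1 << (2 + index * 4)    # anythread: bit 2+(index*4)
    return value


if __name__ == "__main__":
    for idx in range(3):
        bits = fixed_ctr_option_bits(idx, kernel=True, anythread=True)
        print(f"FIXC{idx}: {bits:#06x}")
```

Each fixed counter owns a 4-bit slice of the shared config register, which is why both positions scale with `index*4`.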

#### General-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides 4 general-purpose counters consisting of a config and a counter register.

##### Counters

| Counter name | Event name |
|---|---|
| PMC0 | * |
| PMC1 | * |
| PMC2 | * |
| PMC3 | * |

##### Available Options

| Option | Argument | Description | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| kernel | N | Set bit 17 in config register | |
| anythread | N | Set bit 21 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
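
To make the option table concrete, here is a small sketch that combines the listed options into the bits they set in a PMC config register. The helper name `pmc_option_bits` is an assumption for illustration; the sketch deliberately leaves out the event and umask fields, which the table does not describe.

```python
def pmc_option_bits(edgedetect=False, kernel=False, anythread=False,
                    invert=False, threshold=0):
    """Return the config-register bits set by the general-purpose options.

    Hypothetical helper; bit positions are taken from the option table.
    """
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold must fit into 8 bits")
    value = 0
    if kernel:
        value |= 1 << 17          # kernel: bit 17
    if edgedetect:
        value |= 1 << 18          # edgedetect: bit 18
    if anythread:
        value |= 1 << 21          # anythread: bit 21
    if invert:
        value |= 1 << 23          # invert: bit 23
    value |= threshold << 24      # threshold: bits 24-31
    return value


if __name__ == "__main__":
    print(hex(pmc_option_bits(edgedetect=True, threshold=0x10)))
```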

##### Special handling for events

The Intel® SandyBridge EP/EN microarchitecture supports measuring offcore events with the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers. The Intel® SandyBridge microarchitecture has two of those registers. LIKWID defines some events that perform the filtering according to the event name. Although many bitmasks are possible, LIKWID natively provides only the ones with response type ANY. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS and OFFCORE_RESPONSE_1_OPTIONS events. Two more counter options are available for those events only:

| Option | Argument | Description | Comment |
|---|---|---|---|
| match0 | 16 bit hex value | Input value masked with 0x8FFF and written to bits 0-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/JKT. |
| match1 | 22 bit hex value | Input value is written to bits 16-37 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/JKT. |
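
The two option rows above describe how a 16 bit value and a 22 bit value end up in one OFFCORE_RESPONSE register. A minimal sketch of that packing, with the hypothetical name `offcore_response_value` and the parameter names `low16` and `high22` chosen here for illustration:

```python
def offcore_response_value(low16=0, high22=0):
    """Pack the two option values into one OFFCORE_RESPONSE value.

    Hypothetical helper. low16 is masked with 0x8FFF and placed in
    bits 0-15, high22 is placed in bits 16-37, as stated in the table.
    """
    if not 0 <= low16 <= 0xFFFF:
        raise ValueError("low16 must fit into 16 bits")
    if not 0 <= high22 < (1 << 22):
        raise ValueError("high22 must fit into 22 bits")
    return (low16 & 0x8FFF) | (high22 << 16)


if __name__ == "__main__":
    print(hex(offcore_response_value(0x0001, 0x1)))
```

Note that the 0x8FFF mask drops bits 12-14 of the first value, so not every 16 bit input survives unchanged.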

Thermal counter

The Intel® SandyBridge EP/EN microarchitecture provides one register for the current core temperature.

Counters

| Counter name | Event name |
|---|---|
| TMP0 | TEMP_CORE |

Socket-wide counters

Energy counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the current energy consumption through the RAPL interface.

Counters

| Counter name | Event name |
|---|---|
| PWR0 | PWR_PKG_ENERGY |
| PWR1 | PWR_PP0_ENERGY |
| PWR2 | PWR_PP1_ENERGY |
| PWR3 | PWR_DRAM_ENERGY |

Memory controller fixed-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The integrated Memory Controller provides the interface to DRAM and communicates to the rest of the uncore through the Home Agent (i.e. the iMC does not connect to the Ring).
In conjunction with the HA, the memory controller also provides a variety of RAS features, such as ECC, lockstep, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.

The uncore management performance counters are exposed to the operating system through PCI interfaces. All SandyBridge based systems have one memory controller. There are 4 different PCI devices per memory controller, each covering one memory channel. Each channel has one fixed counter for the DRAM clock. The name MBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| MBOX<0-3>FIX | DRAM_CLOCKTICKS |

#### Memory controller general-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The integrated Memory Controller provides the interface to DRAM and communicates to the rest of the uncore through the Home Agent (i.e. the iMC does not connect to the Ring).
In conjunction with the HA, the memory controller also provides a variety of RAS features, such as ECC, lockstep, memory access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.

The uncore management performance counters are exposed to the operating system through PCI interfaces. All SandyBridge based systems have one memory controller. There are 4 different PCI devices per memory controller, each covering one memory channel. Each channel has 4 different general-purpose counters. The name MBOX originates from the Nehalem EX uncore monitoring.

##### Counters

| Counter name | Event name |
|---|---|
| MBOX<0-3>C0 | * |
| MBOX<0-3>C1 | * |
| MBOX<0-3>C2 | * |
| MBOX<0-3>C3 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |

Last Level cache counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the LLC coherency engine in the uncore. The description from Intel®:
The LLC coherence engine (CBo) manages the interface between the core and the last level cache (LLC). All core transactions that access the LLC are directed from the core to a CBo via the ring interconnect. The CBo is responsible for managing data delivery from the LLC to the requesting core. It is also responsible for maintaining coherence between the cores within the socket that share the LLC; generating snoops and collecting snoop responses from the local cores when the MESIF protocol requires it.
The Last Level cache performance counters are exposed to the operating system through the MSR interface. SandyBridge EN/EP systems have at most 8 CBOXes, each with 4 general-purpose counters. The name CBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| CBOX<0-7>C0 | * |
| CBOX<0-7>C1 | * |
| CBOX<0-7>C2 | * |
| CBOX<0-7>C3 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
| opcode | 9 bit opcode identifier, see uncore performance monitoring guide for SandyBridge | Set bits 23-31 in CBOX filter register MSR_UNC_C<0-7>_PMON_BOX_FILTER | LIKWID checks whether the given value is a valid opcode. A list of all valid opcodes can be found in the Intel® E5-2600 uncore monitoring guide |
| state | 5 bit state representation | Set bits 18-22 in CBOX filter register MSR_UNC_C<0-7>_PMON_BOX_FILTER | F: 0x10, M: 0x08, E: 0x04, S: 0x02, I: 0x01 |
| nid | 8 bit node ID | Set bits 10-17 in CBOX filter register MSR_UNC_C<0-7>_PMON_BOX_FILTER | Note that for Node ID 0 the hex value should be 0x01. |
| tid | 5 bit thread ID value | Set bits 0-4 in CBOX filter register MSR_UNC_C<0-7>_PMON_BOX_FILTER | Bit 0 means physical or logical thread, bits 1-3 the core ID |

##### Special handling for events

The Intel® SandyBridge EP/EN microarchitecture provides an event LLC_LOOKUP which can be filtered with the 'state' option. If no 'state' is set, LIKWID sets the state to 0x1F, the default value to measure all lookups.
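
The filter options above all land in one CBOX filter register. The sketch below shows how the fields could be packed and how the 0x1F default for LLC_LOOKUP follows from the five state bits. The names `cbox_filter_value` and `llc_lookup_state` are hypothetical, and the sketch does not check whether an opcode is valid, which LIKWID does.

```python
# State bit values as listed in the comment column of the 'state' option.
STATE_BITS = {"F": 0x10, "M": 0x08, "E": 0x04, "S": 0x02, "I": 0x01}


def llc_lookup_state(states=""):
    """Combine state letters; fall back to 0x1F (all states) if none given."""
    if not states:
        return 0x1F
    value = 0
    for letter in states:
        value |= STATE_BITS[letter]
    return value


def cbox_filter_value(opcode=0, state=0, nid=0, tid=0):
    """Pack the filter fields at the bit positions given in the table."""
    if not 0 <= opcode < (1 << 9):
        raise ValueError("opcode must fit into 9 bits")
    if not 0 <= state < (1 << 5):
        raise ValueError("state must fit into 5 bits")
    if not 0 <= nid < (1 << 8):
        raise ValueError("nid must fit into 8 bits")
    if not 0 <= tid < (1 << 5):
        raise ValueError("tid must fit into 5 bits")
    # opcode: bits 23-31, state: bits 18-22, nid: bits 10-17, tid: bits 0-4
    return (opcode << 23) | (state << 18) | (nid << 10) | tid


if __name__ == "__main__":
    print(hex(cbox_filter_value(state=llc_lookup_state())))
```

OR-ing all five state values (0x10, 0x08, 0x04, 0x02, 0x01) gives 0x1F, which is why that value counts lookups in every state.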

Uncore management fixed-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for the Intel® Xeon Processor E5-2600 family uncore.
In this capacity, the UBox acts as the central unit for a variety of functions:

  • The master for reading and writing physically distributed registers across the uncore using the Message Channel.
  • The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
  • The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® QPI bus lock).

The UBOX offers one fixed-purpose counter that measures the clock frequency of the clock source of the uncore. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| UBOXFIX | UBOX_CLOCKTICKS |

Uncore management general-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for the Intel® Xeon Processor E5-2600 family uncore.
In this capacity, the UBox acts as the central unit for a variety of functions:

  • The master for reading and writing physically distributed registers across the uncore using the Message Channel.
  • The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
  • The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® QPI bus lock).

The UBOX offers two general-purpose counters. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| UBOX0 | * |
| UBOX1 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |
| invert | N | Set bit 23 in config register | |

QPI Link Layer fixed-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the QPI Link layer (QPI) in the uncore. The description from Intel®:
The Intel® QPI Link Layer is responsible for packetizing requests from the caching agent on the way out to the system interface. As such, it shares responsibility with the CBo(s) as the Intel® QPI caching agent(s). It is responsible for converting CBo requests to Intel® QPI messages (i.e. snoop generation and data response messages from the snoop response) as well as converting/forwarding ring messages to Intel® QPI packets and vice versa.
The Intel® QPI is split into two separate layers. The Intel® QPI LL (link layer) is responsible for generating, transmitting, and receiving packets with the Intel® QPI link.

The QPI hardware performance counters are exposed to the operating system through PCI interfaces. There are two of those interfaces for the QPI. If your system does not provide all interfaces and interface 0 does not work, try the other one. Each interface offers a fixed counter that exposes the QPI speed in GT/s. The name SBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| SBOX<0,1>FIX | QPI_RATE, QPI_SLOW_MODE |

QPI Link Layer general-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the QPI Link layer (QPI) in the uncore. The description from Intel®:
The Intel® QPI Link Layer is responsible for packetizing requests from the caching agent on the way out to the system interface. As such, it shares responsibility with the CBo(s) as the Intel® QPI caching agent(s). It is responsible for converting CBo requests to Intel® QPI messages (i.e. snoop generation and data response messages from the snoop response) as well as converting/forwarding ring messages to Intel® QPI packets and vice versa.
The Intel® QPI is split into two separate layers. The Intel® QPI LL (link layer) is responsible for generating, transmitting, and receiving packets with the Intel® QPI link.

The QPI hardware performance counters are exposed to the operating system through PCI interfaces. There are two of those interfaces for the QPI. If your system does not provide all interfaces and interface 0 does not work, try the other one. Each interface offers four general-purpose counters. The name SBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| SBOX<0,1>C0 | * |
| SBOX<0,1>C1 | * |
| SBOX<0,1>C2 | * |
| SBOX<0,1>C3 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
| match0 | 32 bit hex address | Input value masked with 0x8003FFF8 and written to bits 0-31 in the PCI_UNC_QPI_PMON_MATCH_0 register of PCI device | Only if corresponding device available. See Intel® E5-2600 uncore monitoring guide for fields in PCI_UNC_QPI_PMON_MATCH_0 |
| match1 | 20 bit hex address | Input value masked with 0x000F000F and written to bits 0-19 in the PCI_UNC_QPI_PMON_MATCH_1 register of PCI device | Only if corresponding device available. See Intel® E5-2600 uncore monitoring guide for fields in PCI_UNC_QPI_PMON_MATCH_1 |
| mask0 | 32 bit hex address | Input value masked with 0x8003FFF8 and written to bits 0-31 in the PCI_UNC_QPI_PMON_MASK_0 register of PCI device | Only if corresponding device available. See Intel® E5-2600 uncore monitoring guide for fields in PCI_UNC_QPI_PMON_MASK_0 |
| mask1 | 20 bit hex address | Input value masked with 0x000F000F and written to bits 0-19 in the PCI_UNC_QPI_PMON_MASK_1 register of PCI device | Only if corresponding device available. See Intel® E5-2600 uncore monitoring guide for fields in PCI_UNC_QPI_PMON_MASK_1 |
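
The table states that input values are masked before they are written to the QPI match and mask registers. As a rough illustration, the sketch below applies the two masks from the table; the function name `qpi_filter_values` is hypothetical, and the same masks are assumed to apply to both the MATCH and the MASK register pair, as the table indicates.

```python
QPI_REG0_MASK = 0x8003FFF8   # applied to values for the *_0 registers
QPI_REG1_MASK = 0x000F000F   # applied to values for the *_1 registers


def qpi_filter_values(value0, value1):
    """Return the values that would reach the *_0 and *_1 registers.

    Hypothetical helper: bits outside the masks are silently dropped.
    """
    return value0 & QPI_REG0_MASK, value1 & QPI_REG1_MASK


if __name__ == "__main__":
    reg0, reg1 = qpi_filter_values(0xFFFFFFFF, 0xFFFFFFFF)
    print(hex(reg0), hex(reg1))
```

Because bits outside the masks are dropped, an input such as 0x7 for the first register results in 0x0 being written.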

Home Agent counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the Home Agent (HA) in the uncore. The description from Intel®:
The HA is responsible for the protocol side of memory interactions, including coherent and non-coherent home agent protocols (as defined in the Intel® QuickPath Interconnect Specification). Additionally, the HA is responsible for ordering memory reads/writes, coming in from the modular Ring, to a given address such that the iMC (memory controller).
In other words, it is the coherency agent responsible for guarding the memory controller. All requests for memory attached to the coupled iMC must first be ordered through the HA.

The HA hardware performance counters are exposed to the operating system through PCI interfaces. The name BBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| BBOX0 | * |
| BBOX1 | * |
| BBOX2 | * |
| BBOX3 | * |

##### Available Options

| Option | Argument | Description | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |
| opcode | 6 bit hex value | Set bits 0-5 in PCI_UNC_HA_PMON_OPCODEMATCH register of PCI device | A table of all valid opcodes can be found in the Intel® E5-2600 uncore monitoring guide. |
| match0 | 46 bit hex address | Extract bits 6-31 and set bits 6-31 in PCI_UNC_HA_PMON_ADDRMATCH0 register of PCI device; extract bits 32-45 and set bits 0-13 in PCI_UNC_HA_PMON_ADDRMATCH1 register of PCI device | |
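
The address option is split across two registers. A minimal sketch of that split, assuming the bit ranges given in the table; the function name `ha_addr_match` is hypothetical:

```python
def ha_addr_match(addr):
    """Split a 46 bit address into the two HA address match values.

    Hypothetical helper. Bits 6-31 keep their position in the first
    value, bits 32-45 move down to bits 0-13 of the second value.
    """
    if not 0 <= addr < (1 << 46):
        raise ValueError("address must fit into 46 bits")
    addrmatch0 = addr & 0xFFFFFFC0        # keep bits 6-31 in place
    addrmatch1 = (addr >> 32) & 0x3FFF    # bits 32-45 -> bits 0-13
    return addrmatch0, addrmatch1


if __name__ == "__main__":
    low, high = ha_addr_match(0x123456789A)
    print(hex(low), hex(high))
```

The lowest 6 bits of the address are dropped, so matching works at a 64 byte granularity.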

Power control unit fixed-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the physical processor package.
The uncore implements a power control unit acting as a core/uncore power and thermal manager. It runs its firmware on an internal micro-controller and coordinates the socket’s power states.

The Intel® SandyBridge EP/EN microarchitecture offers two fixed-purpose counters to measure the CPU cores in state C6 and C3. The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| WBOX0FIX | CORES_IN_C3 |
| WBOX1FIX | CORES_IN_C6 |

Power control unit general-purpose counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the physical processor package.
The uncore implements a power control unit acting as a core/uncore power and thermal manager. It runs its firmware on an internal micro-controller and coordinates the socket’s power states.

The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| WBOX0 | * |
| WBOX1 | * |
| WBOX2 | * |
| WBOX3 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |
| invert | N | Set bit 23 in config register | |
| match0 | 32 bit hex value | Set bits 0-31 in MSR_UNC_PCU_PMON_BOX_FILTER register | Band0: bits 0-7, Band1: bits 8-15, Band2: bits 16-23, Band3: bits 24-31 |
| occupancy | 2 bit hex value | Set bits 14-15 in config register | Cores in C0: 0x1, in C3: 0x2, in C6: 0x3 |
| occ_edgedetect | N | Set bit 31 in config register | |
| occ_invert | N | Set bit 30 in config register | |
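
The PCU options pack four bands into the filter register and encode the occupancy selection in the config register. The sketch below illustrates both under the bit positions listed in the table; the names `pcu_filter_value` and `pcu_occupancy_bits` are hypothetical and not LIKWID functions.

```python
# Occupancy encodings as listed in the comment column of the table.
OCCUPANCY = {"C0": 0x1, "C3": 0x2, "C6": 0x3}


def pcu_filter_value(band0=0, band1=0, band2=0, band3=0):
    """Pack four 8 bit bands into the 32 bit PCU filter value."""
    bands = (band0, band1, band2, band3)
    if any(not 0 <= band <= 0xFF for band in bands):
        raise ValueError("each band must fit into 8 bits")
    # Band0: bits 0-7, Band1: bits 8-15, Band2: bits 16-23, Band3: bits 24-31
    return band0 | (band1 << 8) | (band2 << 16) | (band3 << 24)


def pcu_occupancy_bits(state, occ_edgedetect=False, occ_invert=False):
    """Return the config-register bits for the occupancy options."""
    value = OCCUPANCY[state] << 14    # occupancy: bits 14-15
    if occ_invert:
        value |= 1 << 30              # occ_invert: bit 30
    if occ_edgedetect:
        value |= 1 << 31              # occ_edgedetect: bit 31
    return value


if __name__ == "__main__":
    print(hex(pcu_filter_value(0x10, 0x20, 0x30, 0x40)))
    print(hex(pcu_occupancy_bits("C6", occ_edgedetect=True)))
```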

Ring-to-QPI counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the Ring-to-QPI (R3QPI) interface in the uncore. The description from Intel®:
R3QPI is the interface between the Intel® QPI Link Layer, which packetizes requests, and the Ring.
R3QPI is the interface between the ring and the Intel® QPI Link Layer. It is responsible for translating between ring protocol packets and flits that are used for transmitting data across the Intel® QPI interface. It performs credit checking between the local Intel® QPI LL, the remote Intel® QPI LL and other agents on the local ring.

The R3QPI performance counters are exposed to the operating system through PCI interfaces. Since the RBOXes manage the traffic between the LLC-connecting ring interface on the socket and the QPI interfaces (SBOXes), the number of RBOXes matches the number of SBOXes. See the SBOX sections for how many are available in which system configuration. The name RBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| RBOX<0,1>C0 | * |
| RBOX<0,1>C1 | * |
| RBOX<0,1>C2 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |

Ring-to-PCIe counters

The Intel® SandyBridge EP/EN microarchitecture provides measurements of the Ring-to-PCIe (R2PCIe) interface in the uncore. The description from Intel®:
R2PCIe represents the interface between the Ring and IIO traffic to/from PCIe.
The R2PCIe performance counters are exposed to the operating system through a PCI interface. Independent of the system's configuration, there is only one Ring-to-PCIe interface. The name PBOX originates from the Nehalem EX uncore monitoring.

Counters

| Counter name | Event name |
|---|---|
| PBOX0 | * |
| PBOX1 | * |
| PBOX2 | * |
| PBOX3 | * |

##### Available Options

| Option | Argument | Operation | Comment |
|---|---|---|---|
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |