CCCR | Counter configuration control register (Intel specific) |
CPL | Current privilege level |
DEAR | Data event address register (Intel specific). This register records the program counter of the instruction which causes the most recent data cache (or data TLB cache) miss. |
DS | Debug storage (Intel specific) |
ESCR | Event selection control register (Intel specific) |
FMA | Fused multiply-add |
FMAC | Fused multiply-accumulate |
IBS | Instruction-based sampling (AMD specific), an idea very similar to PEBS |
IEAR | Instruction event address register (Intel specific). This register records the program counter of the instruction which causes the most recent instruction cache (or instruction TLB cache) miss. |
IEBS | Imprecise event based sampling (Intel specific) |
ISR | Interrupt service routine |
MSR | Model specific register |
PEBS | Precise event based sampling (Intel specific) |
PMC | Performance monitoring counter |
PMI | Performance monitoring interrupt (Intel specific). This interrupt is generated when a counter overflows and has been programmed to generate an interrupt, or when the PEBS interrupt threshold has been reached. |
PMU | Performance monitoring unit |
QEAR | QuickPath Interconnect event address register (Intel specific) |
SSE | Streaming SIMD extension |
SMM | System management mode |
TSC | Time-stamp counter |
VMX | Virtual machine extension (Intel specific) |
Useful resources for Intel x86 chips: Intel 64 & IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2, and Appendix B of the Intel 64 & IA-32 Architectures Optimization Reference Manual
Useful resources for AMD chips: BIOS & Kernel Developer's Guide for AMD Athlon 64 & AMD Opteron Processors, and AMD64 Architecture Programmer's Manual, Volume 2: System Programming
syscall(PFM_pfm_create, ...)
Since Linux 2.6.32 and PerfMon version 4.x, the system call perf_event_open is available. It returns a file descriptor, which can be controlled with ioctl and read with read.
For example, to enable/disable counters, use

    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)

where fd is the file descriptor.
PerfMon has some environment variables to control its runtime behavior: LIBPFM_VERBOSE, LIBPFM_DEBUG, LIBPFM_DEBUG_STDOUT, LIBPFM_FORCE_PMU, LIBPFM_ENCODE_INACTIVE
lsmod | grep perfctr (to test for PerfCtr)
ls /dev/perfctr (to test for PerfCtr)
ls /sys/class/perfctr (to test for PerfCtr)
ls /sys/kernel/perfmon (to test for PerfMon [PFM] version 3)
ls /proc/perfmon (to test for PerfMon [PFM] version 3)
cat /proc/sys/kernel/perf_event_paranoid (to test for PerfMon [PFM] version 4)
dmesg | grep PMU (to test for PCL)
ls /sys/devices/system/cpu/perf_events/* (to test for PCL)
grep arch_mon /proc/cpuinfo
    MAKEVER = linux-perfctr-em64t

then Makefile.linux-perfctr-em64t will be used. Makefile.linux-perfctr-em64t dictates which events are available for monitoring.
In this case, the main entry function that sets up the initial monitoring environment is setup_p4_presets. It calls _papi_pfm_setup_presets, which scans the file perfmon_events.csv (this file maps preset events to native events) for the line CPU,Intel Pentium4 and loads all the preset events. It then calls _papi_hwd_fixup_fp and _papi_hwd_fixup_vec to load additional preset events (from the same perfmon_events.csv file) related to floating point operations. The latter two functions depend on the EVENTFLAGS line in Makefile.linux-perfctr-em64t. In PAPI 3.7.2, by default all preset events which are filed under

    CPU,Intel Pentium4

and

    CPU,Intel Pentium4 FPU X87 SSE_DP

are loaded.
MAKEVER will be set to one of $OS-pe, $OS-perfmon2, or $OS-pfm-$CPU, where $OS is either CLE (Cray Linux Environment) or linux, and $CPU can be p3, p4, core, core2, atom, i7, opteron, or athlon.
Depending on the value of MAKEVER, FILENAME is set to Rules.$LIB, where $LIB is one of (on x86) perfctr-pfm, pfm_pe (which uses Linux PerfMon [PFM] version 3), or pfm4_pe (which uses Linux PerfMon [PFM] version 4).
The generated Makefile will load the Rules.$LIB specified by the FILENAME variable.
In addition to Makefile, the configure script will also generate papi_events_table.h, which is essentially the content of papi_events.csv.
To see what is loaded from papi_events_table.h, compile PAPI with -DDEBUG, then at runtime set the environment variable PAPI_DEBUG to SUBSTRATE and run any PAPI program, e.g. papi_avail. One should then see output like:
    ....
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1254:9447 -1
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1258:9447 51 perf perf_events generic PMU 3
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1258:9447 53 wsm_dp Intel Westmere DP 1
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1267:9447 wsm_dp is default
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1254:9447 -1
    ....
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:250:9447 CPU token found on line 8
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:268:9447 Examining CPU (AMD64 (K7)) vs. (wsm_dp)
    ....
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:268:9447 Examining CPU (wsm_dp) vs. (wsm_dp)
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:274:9447 Found CPU wsm_dp at line 341 of builtin papi_events_table.
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:280:9447 No additional qualifier found, matching on string.
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:312:13785 Examining preset PAPI_TOT_CYC
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:321:13785 Found 0x8000003b for PAPI_TOT_CYC
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:331:13785 Examining derived NOT_DERIVED
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:340:13785 Found 0 for NOT_DERIVED
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:342:13785 Adding 0x8000003b,0 to preset search table.
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:373:13785 Adding term (0) UNHALTED_CORE_CYCLES to preset event 0x8000003b.
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:405:13785 # events inserted: --1--
    ....

Here, wsm_dp means Intel Westmere dual sockets/processors, and 0x8000003b is the bitwise OR of PAPI_PRESET_MASK and PAPI_TOT_CYC_idx (both defined in papiStdEventDefs.h).
Cross-checking the papi_events.csv file, one can see
    ....
    CPU,Intel Nehalem
    CPU,Intel Westmere
    CPU,nhm
    CPU,nhm_ex
    CPU,wsm
    CPU,wsm_dp
    #
    PRESET,PAPI_TOT_CYC,NOT_DERIVED,UNHALTED_CORE_CYCLES
    PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTRUCTION_RETIRED
    PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I:MISSES
    ...

and UNHALTED_CORE_CYCLES is actually a literal in the Linux PerfMon [PFM] library (in this case, the literal can be found in libpfm4/lib/events/intel_wsm_events.h).
To see how PAPI identifies your CPU, compile PAPI with -DDEBUG, then at runtime set the environment variable PAPI_DEBUG to SUBSTRATE and run any PAPI program, e.g. papi_avail. One should then see output like:
    ....
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1258:1164 8 netburst_p Pentium4 (Prescott) 1
    SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_init:1267:1164 netburst_p is default
    ....
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:268:1164 Examining CPU (netburst) vs. (netburst_p)
    SUBSTRATE:papi_libpfm_presets.c:load_preset_table:250:1164 CPU token found on line 756
    ...

So PAPI uses PerfMon (PFM) version 4 to recognize that your CPU is netburst_p, but the papi_events.csv file does not have any match for it.
The fix is simple: just tweak papi_events.csv by adding netburst_p.
Which counter is seized depends on the architecture. For example, for the Pentium 4/Netburst, it is IQ_COUNTER0:
    /*
     * Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
     * CRU_ESCR0 (with any non-null event selector) through a complemented
     * max threshold. [IA32-Vol3, Section 14.9.9]
     */
    static int setup_p4_watchdog(unsigned nmi_hz)
    {
        ....
    }

(full source code here)
Newer Linux (e.g. version 3.2) has a function to handle this:
    #ifdef CONFIG_HARDLOCKUP_DETECTOR
    static int watchdog_nmi_enable(int cpu)
    {
        ...
        /* Try to register using hardware perf events */
        event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
        ...
    }
To see whether a hardware performance counter has been seized for this purpose, inspect the kernel log (e.g. with dmesg) for messages like:
    Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
    ... version: 3
    ... bit width: 48
    ... generic registers: 4
    ... value mask: 0000ffffffffffff
    ... max period: 000000007fffffff
    ... fixed-purpose events: 3
    ... event mask: 000000070000000f
    NMI watchdog enabled, takes one hw-pmu counter.
To disable this lockup watchdog, execute

    echo 0 > /proc/sys/kernel/nmi_watchdog

as root.
If you see PAPI Error in PAPI_library_init: PAPI_ENOSUPP or Pentium 4 not supported on kernels before 2.6.35, this is the cause. (One can also run PAPI example code ctests/nmi_watchdog and check its output.)
In PAPI, there are two seemingly identical events: PAPI_FP_INS and PAPI_FP_OPS. The exact meanings of these events are architecture-dependent (see PAPI FAQ), but in general:
In PAPI 4.2, they are defined as
Architecture | PAPI_FP_INS | PAPI_FP_OPS |
---|---|---|
Pentium 4 / Netburst | x87_FP_uop | scalar_DP_uop (default; can be configured at compile time to scalar_SP_uop, see PAPI FAQ) |
Nehalem / Westmere | FP_COMP_OPS_EXE:SSE_FP | FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87 |
Sandy Bridge | FP_COMP_OPS_EXE:X87 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE | (same as left) |