Commit

Merge branch 'master' of github.com:RRZE-HPC/likwid
TomTheBear committed Dec 12, 2022
2 parents 88cf44d + 4f548f4 commit cbb06d8
Showing 61 changed files with 988 additions and 229 deletions.
31 changes: 30 additions & 1 deletion .gitlab-ci.yml
@@ -94,7 +94,6 @@ build-arm8-perf:
tags:
- testcluster


check-event-files:
stage: .pre
tags:
@@ -104,6 +103,16 @@ check-event-files:
- test/check_data_files.py events
# - test/check_data_files.py groups

notify-github-pending:
stage: .pre
tags:
- testcluster
variables:
NO_SLURM_SUBMIT: 1
when: always
script:
- test/gitlab-ci/notify_github.sh pending

arch-gen:
stage: build
tags:
@@ -147,3 +156,23 @@ cuda-pipeline:
strategy: depend
variables:
PARENT_PIPELINE_ID: $CI_PIPELINE_ID

notify-github-success:
stage: .post
tags:
- testcluster
variables:
NO_SLURM_SUBMIT: 1
when: on_success
script:
- test/gitlab-ci/notify_github.sh success

notify-github-failure:
stage: .post
tags:
- testcluster
variables:
NO_SLURM_SUBMIT: 1
when: on_failure
script:
- test/gitlab-ci/notify_github.sh failure
23 changes: 23 additions & 0 deletions CHANGELOG
@@ -1,3 +1,26 @@
# Changelog 5.2.2
- Add mutex to pinning library
- Fix pin string parsing in pinning library
- Make SBIN path configurable in build system
- Add PKGBUILD for ArchLinux package builds
- Remove accessDaemon double-fork in systemd environments
- Group updates for L2/L3 (mainly AMD Zen)
- Fix multi-initialization in MarkerAPI
- Add energy event scaling for Fujitsu A64FX
- Nvmon: Use Cupti error string to get better warning/error messages
- Nvmon: Store events internally to re-use event strings in stopCounters
- AccessLayer: Catch SIGCHLD to stop sending requests to accessDaemon if it was killed
- likwid-genTopoCfg: Update writing and reading of topology file
- Add INST_RETIRED_NOP event for Intel Icelake (desktop & server)
- Removed some memory leaks
- Improved checks for RDPMC availability
- Add TOPDOWN_SLOTS for perf_event
- Fix for systems with CPU sockets without hwthreads (A64FX FX1000)
- Fix if HOME environment variable is not set (systemd)
- Reader function for perf_event_paranoid in Lua to get state early
- likwid-mpirun: Sanitize np and ppn values to avoid crashes


# Changelog 5.2.1
- Add support for Intel Rocketlake and AMD Zen3 variant (Family 19, Model 0x50)
- Fix for perf_event multiplexing (important!)
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ our hands to test them properly.

[LIKWID Playlist (YouTube)](https://www.youtube.com/playlist?list=PLxVedhmuwLq2CqJpAABDMbZG8Whi7pKsk)

- [![Build Status](https://gitlab.rrze.fau.de/ub55yzis/likwid/badges/master/pipeline.svg)](https://gitlab.rrze.fau.de/ub55yzis/likwid/-/commits/master) [![General LIKWID DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4275676.svg)](https://doi.org/10.5281/zenodo.4275676)
+ [![Build Status](https://gitos.rrze.fau.de/ub55yzis/likwid/badges/master/pipeline.svg)](https://gitos.rrze.fau.de/ub55yzis/likwid/-/commits/master) [![General LIKWID DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4275676.svg)](https://doi.org/10.5281/zenodo.4275676)

It consists of:

67 changes: 67 additions & 0 deletions bench/armv8/peakflops_neon.ptt
@@ -0,0 +1,67 @@
STREAMS 1
TYPE DOUBLE
FLOPS 28
BYTES 8
DESC Double-precision multiplications and additions with a single load, optimized for NEON FMAs
LOADS 1
STORES 0
INSTR_LOOP 29
UOPS 29
ldr q1, [STR0]
ldr q2, [STR0]
ldr q3, [STR0]
ldr q4, [STR0]
ldr q5, [STR0]
ldr q6, [STR0]
ldr q7, [STR0]
ldr q8, [STR0]
ldr q9, [STR0]
ldr q10, [STR0]
ldr q11, [STR0]
ldr q12, [STR0]
ldr q13, [STR0]
ldr q14, [STR0]
ldr q15, [STR0]
ldr q16, [STR0]
ldr q17, [STR0]
ldr q18, [STR0]
ldr q19, [STR0]
ldr q20, [STR0]
ldr q21, [STR0]
ldr q22, [STR0]
ldr q23, [STR0]
ldr q24, [STR0]
ldr q25, [STR0]
ldr q26, [STR0]
ldr q27, [STR0]
ldr q28, [STR0]
LOOP 2
ldr q16, [STR0], #8
fadd v1.2d, v1.2d, v1.2d
fadd v2.2d, v2.2d, v2.2d
fmul v3.2d, v3.2d, v3.2d
fmul v4.2d, v4.2d, v4.2d
fadd v5.2d, v5.2d, v5.2d
fadd v6.2d, v6.2d, v6.2d
fmul v7.2d, v7.2d, v7.2d
fmul v8.2d, v8.2d, v8.2d
fadd v9.2d, v9.2d, v9.2d
fadd v10.2d, v10.2d, v10.2d
fmul v11.2d, v11.2d, v11.2d
fmul v12.2d, v12.2d, v12.2d
fadd v13.2d, v13.2d, v13.2d
fadd v14.2d, v14.2d, v14.2d
fmul v15.2d, v15.2d, v15.2d
fmul v16.2d, v16.2d, v16.2d
fadd v17.2d, v17.2d, v17.2d
fadd v18.2d, v18.2d, v18.2d
fmul v19.2d, v19.2d, v19.2d
fmul v20.2d, v20.2d, v20.2d
fadd v21.2d, v21.2d, v21.2d
fadd v22.2d, v22.2d, v22.2d
fmul v23.2d, v23.2d, v23.2d
fmul v24.2d, v24.2d, v24.2d
fadd v25.2d, v25.2d, v25.2d
fadd v26.2d, v26.2d, v26.2d
fmul v27.2d, v27.2d, v27.2d
fmul v28.2d, v28.2d, v28.2d
67 changes: 67 additions & 0 deletions bench/armv8/peakflops_neon_fma.ptt
@@ -0,0 +1,67 @@
STREAMS 1
TYPE DOUBLE
FLOPS 56
BYTES 8
DESC Double-precision multiplications and additions with a single load, optimized for NEON FMAs
LOADS 1
STORES 0
INSTR_LOOP 29
UOPS 29
ldr q1, [STR0]
ldr q2, [STR0]
ldr q3, [STR0]
ldr q4, [STR0]
ldr q5, [STR0]
ldr q6, [STR0]
ldr q7, [STR0]
ldr q8, [STR0]
ldr q9, [STR0]
ldr q10, [STR0]
ldr q11, [STR0]
ldr q12, [STR0]
ldr q13, [STR0]
ldr q14, [STR0]
ldr q15, [STR0]
ldr q16, [STR0]
ldr q17, [STR0]
ldr q18, [STR0]
ldr q19, [STR0]
ldr q20, [STR0]
ldr q21, [STR0]
ldr q22, [STR0]
ldr q23, [STR0]
ldr q24, [STR0]
ldr q25, [STR0]
ldr q26, [STR0]
ldr q27, [STR0]
ldr q28, [STR0]
LOOP 2
ldr q16, [STR0], #8
fmla v1.2d, v1.2d, v1.2d
fmla v2.2d, v2.2d, v2.2d
fmla v3.2d, v3.2d, v3.2d
fmla v4.2d, v4.2d, v4.2d
fmla v5.2d, v5.2d, v5.2d
fmla v6.2d, v6.2d, v6.2d
fmla v7.2d, v7.2d, v7.2d
fmla v8.2d, v8.2d, v8.2d
fmla v9.2d, v9.2d, v9.2d
fmla v10.2d, v10.2d, v10.2d
fmla v11.2d, v11.2d, v11.2d
fmla v12.2d, v12.2d, v12.2d
fmla v13.2d, v13.2d, v13.2d
fmla v14.2d, v14.2d, v14.2d
fmla v15.2d, v15.2d, v15.2d
fmla v16.2d, v16.2d, v16.2d
fmla v17.2d, v17.2d, v17.2d
fmla v18.2d, v18.2d, v18.2d
fmla v19.2d, v19.2d, v19.2d
fmla v20.2d, v20.2d, v20.2d
fmla v21.2d, v21.2d, v21.2d
fmla v22.2d, v22.2d, v22.2d
fmla v23.2d, v23.2d, v23.2d
fmla v24.2d, v24.2d, v24.2d
fmla v25.2d, v25.2d, v25.2d
fmla v26.2d, v26.2d, v26.2d
fmla v27.2d, v27.2d, v27.2d
fmla v28.2d, v28.2d, v28.2d
67 changes: 67 additions & 0 deletions bench/armv8/peakflops_sp_neon.ptt
@@ -0,0 +1,67 @@
STREAMS 1
TYPE SINGLE
FLOPS 28
BYTES 4
DESC Single-precision multiplications and additions with a single load, optimized for NEON FMAs
LOADS 1
STORES 0
INSTR_LOOP 29
UOPS 29
ldr q1, [STR0]
ldr q2, [STR0]
ldr q3, [STR0]
ldr q4, [STR0]
ldr q5, [STR0]
ldr q6, [STR0]
ldr q7, [STR0]
ldr q8, [STR0]
ldr q9, [STR0]
ldr q10, [STR0]
ldr q11, [STR0]
ldr q12, [STR0]
ldr q13, [STR0]
ldr q14, [STR0]
ldr q15, [STR0]
ldr q16, [STR0]
ldr q17, [STR0]
ldr q18, [STR0]
ldr q19, [STR0]
ldr q20, [STR0]
ldr q21, [STR0]
ldr q22, [STR0]
ldr q23, [STR0]
ldr q24, [STR0]
ldr q25, [STR0]
ldr q26, [STR0]
ldr q27, [STR0]
ldr q28, [STR0]
LOOP 4
ldr q16, [STR0], #8
fadd v1.4s, v1.4s, v1.4s
fadd v2.4s, v2.4s, v2.4s
fmul v3.4s, v3.4s, v3.4s
fmul v4.4s, v4.4s, v4.4s
fadd v5.4s, v5.4s, v5.4s
fadd v6.4s, v6.4s, v6.4s
fmul v7.4s, v7.4s, v7.4s
fmul v8.4s, v8.4s, v8.4s
fadd v9.4s, v9.4s, v9.4s
fadd v10.4s, v10.4s, v10.4s
fmul v11.4s, v11.4s, v11.4s
fmul v12.4s, v12.4s, v12.4s
fadd v13.4s, v13.4s, v13.4s
fadd v14.4s, v14.4s, v14.4s
fmul v15.4s, v15.4s, v15.4s
fmul v16.4s, v16.4s, v16.4s
fadd v17.4s, v17.4s, v17.4s
fadd v18.4s, v18.4s, v18.4s
fmul v19.4s, v19.4s, v19.4s
fmul v20.4s, v20.4s, v20.4s
fadd v21.4s, v21.4s, v21.4s
fadd v22.4s, v22.4s, v22.4s
fmul v23.4s, v23.4s, v23.4s
fmul v24.4s, v24.4s, v24.4s
fadd v25.4s, v25.4s, v25.4s
fadd v26.4s, v26.4s, v26.4s
fmul v27.4s, v27.4s, v27.4s
fmul v28.4s, v28.4s, v28.4s
67 changes: 67 additions & 0 deletions bench/armv8/peakflops_sp_neon_fma.ptt
@@ -0,0 +1,67 @@
STREAMS 1
TYPE SINGLE
FLOPS 56
BYTES 4
DESC Single-precision multiplications and additions with a single load, optimized for NEON FMAs
LOADS 1
STORES 0
INSTR_LOOP 29
UOPS 29
ldr q1, [STR0]
ldr q2, [STR0]
ldr q3, [STR0]
ldr q4, [STR0]
ldr q5, [STR0]
ldr q6, [STR0]
ldr q7, [STR0]
ldr q8, [STR0]
ldr q9, [STR0]
ldr q10, [STR0]
ldr q11, [STR0]
ldr q12, [STR0]
ldr q13, [STR0]
ldr q14, [STR0]
ldr q15, [STR0]
ldr q16, [STR0]
ldr q17, [STR0]
ldr q18, [STR0]
ldr q19, [STR0]
ldr q20, [STR0]
ldr q21, [STR0]
ldr q22, [STR0]
ldr q23, [STR0]
ldr q24, [STR0]
ldr q25, [STR0]
ldr q26, [STR0]
ldr q27, [STR0]
ldr q28, [STR0]
LOOP 4
ldr q16, [STR0], #8
fmla v1.4s, v1.4s, v1.4s
fmla v2.4s, v2.4s, v2.4s
fmla v3.4s, v3.4s, v3.4s
fmla v4.4s, v4.4s, v4.4s
fmla v5.4s, v5.4s, v5.4s
fmla v6.4s, v6.4s, v6.4s
fmla v7.4s, v7.4s, v7.4s
fmla v8.4s, v8.4s, v8.4s
fmla v9.4s, v9.4s, v9.4s
fmla v10.4s, v10.4s, v10.4s
fmla v11.4s, v11.4s, v11.4s
fmla v12.4s, v12.4s, v12.4s
fmla v13.4s, v13.4s, v13.4s
fmla v14.4s, v14.4s, v14.4s
fmla v15.4s, v15.4s, v15.4s
fmla v16.4s, v16.4s, v16.4s
fmla v17.4s, v17.4s, v17.4s
fmla v18.4s, v18.4s, v18.4s
fmla v19.4s, v19.4s, v19.4s
fmla v20.4s, v20.4s, v20.4s
fmla v21.4s, v21.4s, v21.4s
fmla v22.4s, v22.4s, v22.4s
fmla v23.4s, v23.4s, v23.4s
fmla v24.4s, v24.4s, v24.4s
fmla v25.4s, v25.4s, v25.4s
fmla v26.4s, v26.4s, v26.4s
fmla v27.4s, v27.4s, v27.4s
fmla v28.4s, v28.4s, v28.4s
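
Note on the FLOPS headers of the four new kernels: the values 28 and 56 are consistent with counting floating-point operations per element processed, assuming (this is an interpretation, not something the diff states) that LOOP gives the number of elements consumed per loop iteration. The sketch below is only an arithmetic check; `flops_per_element` and its parameters are hypothetical names and not part of likwid-bench.

```c
#include <assert.h>

/* Hypothetical helper: floating-point operations per element for a kernel
 * whose loop body contains `vec_instr` vector arithmetic instructions.
 *   lanes          elements per 128-bit NEON register (2 for .2d, 4 for .4s)
 *   flops_per_lane 1 for fadd/fmul, 2 for fmla (multiply plus add)
 *   loop_step      assumed meaning of the LOOP header: elements per iteration
 */
static unsigned flops_per_element(unsigned vec_instr, unsigned lanes,
                                  unsigned flops_per_lane, unsigned loop_step)
{
    /* A zero step would make the ratio meaningless, so report 0. */
    if (loop_step == 0)
        return 0;
    return (vec_instr * lanes * flops_per_lane) / loop_step;
}
```

With 28 arithmetic instructions per loop body this gives 28 for `peakflops_neon` (2 lanes, LOOP 2) and `peakflops_sp_neon` (4 lanes, LOOP 4), and 56 for the two `fmla` variants, which matches the headers above.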
4 changes: 2 additions & 2 deletions bench/likwid-bench.c
@@ -491,10 +491,10 @@ int main(int argc, char** argv)
if ((int)(floor(orig_size/currentWorkgroup->numberOfThreads)) % test->stride)
{
int typesize = allocator_dataTypeLength(test->type);
- newsize = (((int)(floor(orig_size/nrThreads))/stride)*(stride))*nrThreads;
+ newsize = (((size_t)(floor(orig_size/nrThreads))/stride)*(stride))*nrThreads;
if (newsize > 0 && warn_once)
{
- fprintf (stdout, "Warning: Sanitizing vector length to a multiple of the loop stride %d and thread count %d from %d elements (%d bytes) to %d elements (%d bytes)\n",stride, nrThreads, orig_size, orig_size*typesize, newsize, newsize*typesize);
+ fprintf (stdout, "Warning: Sanitizing vector length to a multiple of the loop stride %d and thread count %d from %ld elements (%ld bytes) to %ld elements (%ld bytes)\n",stride, nrThreads, orig_size, orig_size*typesize, newsize, newsize*typesize);
warn_once = 0;
}
else if (newsize == 0)
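
Note on the hunk above: the change widens the cast from `int` to `size_t` so that large vector lengths no longer overflow while being rounded down to a multiple of the loop stride and the thread count. A minimal sketch of the same rounding in pure integer arithmetic follows. It is an illustration under assumptions, not the code in likwid-bench: `sanitize_length` and `print_sanitize_warning` are hypothetical names, and `%zu` is used because it is the standard C99 conversion for `size_t`, whereas `%ld` matches only on platforms where `size_t` has the width of `long`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: round `size` down so that every thread receives a
 * whole number of `stride`-sized chunks. Integer division already floors,
 * so no floating-point round trip is needed. */
static size_t sanitize_length(size_t size, size_t stride, size_t nr_threads)
{
    if (stride == 0 || nr_threads == 0)
        return 0;
    size_t per_thread = size / nr_threads;
    return (per_thread / stride) * stride * nr_threads;
}

/* Hypothetical helper: report the adjustment with size_t-safe format
 * specifiers. `typesize` is the size of one element in bytes. */
static void print_sanitize_warning(FILE *out, size_t stride, size_t nr_threads,
                                   size_t orig_size, size_t new_size,
                                   size_t typesize)
{
    fprintf(out,
            "Warning: Sanitizing vector length to a multiple of the loop "
            "stride %zu and thread count %zu from %zu elements (%zu bytes) "
            "to %zu elements (%zu bytes)\n",
            stride, nr_threads, orig_size, orig_size * typesize,
            new_size, new_size * typesize);
}
```

For example, `sanitize_length(1001, 8, 4)` yields 992: each of the 4 threads gets 250 elements, which rounds down to 248 (31 chunks of 8). A result of 0 corresponds to the `newsize == 0` branch in the hunk.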
2 changes: 1 addition & 1 deletion bench/src/strUtil.c
@@ -154,7 +154,7 @@ parse_workgroup(Workgroup* group, const_bstring str, DataType type)
}
else
{
- fprintf(stderr, "Unknown affinity domain %s\n", bdata(tokens->entry[2]));
+ fprintf(stderr, "Unknown affinity domain %s\n", bdata(tokens->entry[0]));
bstrListDestroy(tokens);
return NULL;
}
3 changes: 2 additions & 1 deletion groups/CLX/ENERGY.txt
@@ -8,13 +8,14 @@ TMP0 TEMP_CORE
PWR0 PWR_PKG_ENERGY
PWR1 PWR_PP0_ENERGY
PWR3 PWR_DRAM_ENERGY

UBOXFIX UNCORE_CLOCK


METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock
Uncore Clock [MHz] 1.E-06*UBOXFIX/time
CPI FIXC1/FIXC0
Temperature [C] TMP0
Energy [J] PWR0