likwid-pin appears to silently fail when using more than one thread, judging by the fact that the command exits almost immediately, and nothing is written to standard output.
LIKWID version and download source (GitHub, FTP, package manager, ...): likwid-pin -- Version 5.3.0 (commit: 0123456789)
Operating system: Linux maxwell 5.15.0-100-generic #110~20.04.1-Ubuntu SMP
Does your application use libraries like MPI, OpenMP or Pthreads? Yes, OpenMP.
Are you using the MarkerAPI (CPU code instrumentation)? No.
To Reproduce with a LIKWID command
Please supply the output of the command with -V 3 added to the command:
(base) ivan@maxwell:~/lrz/rbfxlbm/build$ likwid-pin -V 3 -c 0,1 ./albm
DEBUG - [hwloc_init_cpuInfo:359] HWLOC CpuInfo Family 6 Model 167 Stepping 1 Vendor 0x0 Part 0x0 isIntel 1 numHWThreads 16 activeHWThreads 16
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 8 Thread 1 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 9 Thread 1 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 2 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 10 Thread 1 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 3 Thread 0 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 11 Thread 1 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 4 Thread 0 Core 4 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 12 Thread 1 Core 4 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 5 Thread 0 Core 5 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 13 Thread 1 Core 5 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 6 Thread 0 Core 6 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 14 Thread 1 Core 6 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 7 Thread 0 Core 7 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 15 Thread 1 Core 7 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 0 Level 1 Size 49152 Threads 2
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 1 Level 2 Size 524288 Threads 2
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 2 Level 3 Size 16777216 Threads 16
DEBUG - [affinity_init:547] Affinity: Socket domains 1
DEBUG - [affinity_init:549] Affinity: CPU die domains 1
DEBUG - [affinity_init:554] Affinity: CPU cores per LLC 8
DEBUG - [affinity_init:557] Affinity: Cache domains 1
DEBUG - [affinity_init:561] Affinity: NUMA domains 1
DEBUG - [affinity_init:562] Affinity: All domains 5
DEBUG - [affinity_addNodeDomain:370] Affinity domain N: 16 HW threads on 8 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S0: 16 HW threads on 8 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D0: 16 HW threads on 8 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 16 HW threads on 8 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M0: 16 HW threads on 8 cores
DEBUG - [create_lookups:290] T 0 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 1 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 2 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 3 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 4 T2C 4 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 5 T2C 5 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 6 T2C 6 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 7 T2C 7 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 8 T2C 0 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 9 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 10 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 11 T2C 3 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 12 T2C 4 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 13 T2C 5 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 14 T2C 6 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:290] T 15 T2C 7 T2S 0 T2D 0 T2LLC 0 T2M 0
Evaluated CPU string to CPUs: 0,1
Running: ./albm
Using 2 thread(s) (cpuset: 0x3)
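For readers checking the log above, here is a minimal sketch that extracts the PU-to-core mapping from the `hwloc_init_nodeTopology` lines and decodes the reported cpuset. It assumes that `cpuset: 0x3` is a plain bitmask in which bit i stands for CPU i; the helper names `pu_to_core` and `cpus_in_mask` are made up for this illustration and are not part of LIKWID.

```python
import re

# Four topology lines copied from the debug log above; in practice, feed in
# the full `likwid-pin -V 3` output instead.
LOG = """\
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 8 Thread 1 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 9 Thread 1 Core 1 Die 0 Socket 0 inCpuSet 1
"""

# Matches e.g. "PU 8 Thread 1 Core 0" inside a topology line.
PU_LINE = re.compile(r"PU (\d+) Thread (\d+) Core (\d+)")


def pu_to_core(log_text):
    """Map each processing unit (PU) ID to its physical core ID."""
    return {int(m.group(1)): int(m.group(3)) for m in PU_LINE.finditer(log_text)}


def cpus_in_mask(mask):
    """Return the CPU IDs whose bits are set in a cpuset bitmask (assumed layout)."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


cores = pu_to_core(LOG)
pinned = cpus_in_mask(0x3)
print("CPUs in cpuset 0x3:", pinned)
print("Physical cores used:", [cores[cpu] for cpu in pinned])
```

Under that assumption, the mask 0x3 corresponds to CPUs 0 and 1, which the log places on two different physical cores (the SMT sibling of PU 0 is PU 8). The CPU selection itself therefore looks consistent with `-c 0,1`.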
In contrast, with a single thread I get:
...
Evaluated CPU string to CPUs: 0
[likwid-pin] Main PID -> hwthread 0 - OK
Running: ./albm
Using 1 thread(s) (cpuset: 0x1)
num_steps = 1000
tau / dt ratio = 2.0000000E-02
CFL = 0.6270693
U0 = 1.1547005E-02
Mach = 2.0000000E-02
Re = 1000.000
Everything okay
51486 1081185
In assembly routine:
n = 51485
nnz = 1081185
rownnz_max = 21
rhs_max = 9
Attempting to allocate memory
n = 51485 , nz = 21 , q = 9
sysclock (s) 3.43853497505188
mlups 14.9729455041758
ompwtime (s) 3.43853306770325
mlups 14.9729538096440
Total time (s) 3.43853306770325
Collision time ratio 1.559326410374301E-002
Streaming time ratio 0.984065665745844
If I run the application directly, it works as expected:
(base) ivan@maxwell:~/lrz/rbfxlbm/build$ OMP_NUM_THREADS=2 ./albm
num_steps = 1000
tau / dt ratio = 2.0000000E-02
CFL = 0.6270693
U0 = 1.1547005E-02
Mach = 2.0000000E-02
Re = 1000.000
Everything okay
51486 1081185
In assembly routine:
n = 51485
nnz = 1081185
rownnz_max = 21
rhs_max = 9
Attempting to allocate memory
n = 51485 , nz = 21 , q = 9
sysclock (s) 1.81032705307007
mlups 28.4396107920625
ompwtime (s) 1.81032490730286
mlups 28.4396445013620
Total time (s) 1.81032490730286
Collision time ratio 1.993282543925349E-002
Streaming time ratio 0.979440022346742
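As a quick sanity check on the two successful runs, the sketch below recomputes the throughput and the two-thread speedup from the timings printed above. It assumes that `mlups` means million lattice updates per second, computed as n * num_steps / time; the application does not state this, but the reported values are consistent with it.

```python
def mlups(n_sites, num_steps, seconds):
    """Million lattice-site updates per second (assumed definition)."""
    return n_sites * num_steps / seconds / 1.0e6


n, steps = 51485, 1000   # "n" and "num_steps" from the program output
t1 = 3.43853497505188    # sysclock (s): likwid-pin -c 0, one thread
t2 = 1.81032705307007    # sysclock (s): OMP_NUM_THREADS=2, no likwid-pin

speedup = t1 / t2
efficiency = speedup / 2  # two threads

print(f"MLUPS, 1 thread : {mlups(n, steps, t1):.4f}")
print(f"MLUPS, 2 threads: {mlups(n, steps, t2):.4f}")
print(f"speedup         : {speedup:.3f}")
print(f"efficiency      : {efficiency:.3f}")
```

The direct two-thread run is close to twice as fast as the pinned single-thread run, so the application itself scales as expected; only the combination of likwid-pin with more than one thread fails.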
Thanks for reporting. I have never seen such behavior.
Does it work with other applications and multiple threads? Are you using some computing library like TBB, Cilk+, SYCL, ...? If it is OpenMP, is it one of the common implementations (GCC, LLVM, Intel)?
I was only testing GCC and Intel compilers. Potentially TBB via MKL Sparse BLAS, but I'd need to double check this. I'll try again with a simpler application.
Thanks for your response. If you used OpenMP (GCC or Intel), we should try to find the error. My question regarding threading solutions like TBB, Cilk+ or SYCL was just to ensure we are not talking about something exotic.
Can you please try the following with your failing code:
# Rebuild LIKWID with DEBUG=true
$ cd likwid-src
$ make distclean
$ make PREFIX=$LIKWID_INSTALL_DIR DEBUG=true
$ make PREFIX=$LIKWID_INSTALL_DIR DEBUG=true install
$ gdb $LIKWID_INSTALL_DIR/bin/likwid-lua
gdb > run $LIKWID_INSTALL_DIR/bin/likwid-pin -V 3 -c 0,1 ./albm
<fails>
gdb > backtrace
With this, I should be able to locate the exact error.
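Before the gdb session, a lighter check that may narrow things down is to capture the exit status of the failing command, since a "silent" exit can still carry a non-zero status or a signal. The following is a minimal sketch using only the Python standard library; `describe_exit` and `run_and_report` are hypothetical helper names, and the stand-in command should be replaced with the failing one, e.g. `["likwid-pin", "-V", "3", "-c", "0,1", "./albm"]`.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally (status 0)"
    if returncode < 0:
        # subprocess reports death by signal N as return code -N (POSIX).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd, capture its output, and print how it terminated."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    print(describe_exit(proc.returncode))
    print(f"stdout: {len(proc.stdout)} characters, "
          f"stderr: {len(proc.stderr)} characters")
    return proc.returncode


# Stand-in command so the sketch runs anywhere; replace it with the
# failing likwid-pin command line to inspect the real exit status.
run_and_report([sys.executable, "-c", "print('ok')"])
```

A "killed by SIGSEGV" result would point towards a crash, whereas a non-zero status with empty output would suggest an early error exit. Either way, the backtrace requested above remains the more precise diagnostic.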