Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with Cuda 9.0 #13

Open
uvilla opened this issue Jan 22, 2020 · 2 comments
Open

Issues with Cuda 9.0 #13

uvilla opened this issue Jan 22, 2020 · 2 comments

Comments

@uvilla
Copy link

uvilla commented Jan 22, 2020

Hi there,

I am having a very strange issue running the CUDA variant of LULESH (release of 2.0.2).

I'm compiling using Cuda compilation tools, release 9.0, V9.0.176 and setting either the flag -arch=sm_35 or, to avoid compilation warnings, the flag -arch=sm_70.

When running the code on a Tesla V100-SXM2-32GB, the program crash as follows:

$ ./lulesh -s 10
Host compute1-exec-206.ris.wustl.edu using GPU 0: Tesla V100-SXM2-32GB
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: invalid argument
[compute1-exec-206:00204] *** Process received signal ***
[compute1-exec-206:00204] Signal: Aborted (6)
[compute1-exec-206:00204] Signal code:  (-6)
[compute1-exec-206:00204] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f7e38bda390]
[compute1-exec-206:00204] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f7e37d8f428]
[compute1-exec-206:00204] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f7e37d9102a]
[compute1-exec-206:00204] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7f7e386d284d]
[compute1-exec-206:00204] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7f7e386d06b6]
[compute1-exec-206:00204] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7f7e386d0701]
[compute1-exec-206:00204] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7f7e386d0919]
[compute1-exec-206:00204] [ 7] ./lulesh[0x41f252]
[compute1-exec-206:00204] [ 8] ./lulesh[0x417330]
[compute1-exec-206:00204] [ 9] ./lulesh[0x41ade5]
[compute1-exec-206:00204] [10] ./lulesh[0x405cff]
[compute1-exec-206:00204] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7e37d7a830]
[compute1-exec-206:00204] [12] ./lulesh[0x409bf9]
[compute1-exec-206:00204] *** End of error message ***
Aborted (core dumped)

As anyone else observed or reported something similar?
What version of CUDA do you usually use to compile LULESH?

Thank you in advance,

Umberto

@miharulidze
Copy link

@uvilla I may be a little bit late here, but I observed exactly the same problem on exactly the same hardware (V100-SXM2).

It looks like this code is not maintained for a very long time.

Good news are that disabling MPI support in Makefile helps.

@ikarlin
Copy link
Collaborator

ikarlin commented May 14, 2020

Yes the CUDA version is not maintained. It was a Nvidia port. and Nvidia has not been updating it. The mainline code is maintained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants