
Geometry optimization fails in multi-GPU version #281

Open
Madu86 opened this issue Feb 19, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@Madu86
Collaborator

Madu86 commented Feb 19, 2023

Geometry optimization fails in the multi-GPU version of the latest release; the cause is not yet known. See the attached .zip file for the CUDA serial and multi-GPU output files of a test case.
1077.out.zip

@Madu86 Madu86 added the bug Something isn't working label Feb 19, 2023
@akhilshajan
Collaborator

akhilshajan commented Feb 23, 2023

Hi @Madu86, I tried the calculations with quick.MPI and the problem still persists. You can find the outputs from all the cuda, cuda.MPI, and MPI calculations here.

@Madu86
Collaborator Author

Madu86 commented Mar 23, 2023

Hi @akhilshajan, can you please provide a smaller example that reproduces this issue? The one I have (attached above) is too big for debugging.

@akhilshajan
Collaborator

Hi @Madu86, I have tried a few examples that take ~40 iterations with the serial, MPI, and cuda.MPI builds, and they all run fine and give the same results. I was not able to find anything that would help with debugging. I am still working on a few other molecules; if I find a discrepancy, I will update you.

@Madu86
Collaborator Author

Madu86 commented Mar 28, 2023

@akhilshajan Any update on this?

@akhilshajan
Collaborator

Hi @Madu86, I apologize for my delayed response. I have tried several molecules, and the issue with MPI appears to arise only when the D3BJ keyword is used. I did not encounter any errors when running the calculations without this keyword. Attached is the input file I used to test a benzene molecule, for which the SCF calculation failed.

@akhilshajan
Collaborator

Hi @Madu86, I ran this test case with the MPI modifications we made for DL-Find to check whether MPI is the cause. The results are inconsistent across builds: a single CPU took 53 iterations, CUDA took 122 iterations, and multi-GPU still fails. I have attached my results along with the old results you shared, as well as the Slurm output file for the multi-GPU calculation.

1077.out.zip
