
Negative FOM when Running Large MPI Counts #12

Open
nmhamster opened this issue Mar 27, 2019 · 5 comments

@nmhamster
Contributor

We are scaling LULESH to large node counts (around 2000 nodes) with 8 MPI ranks per node and a problem size of 90. The result is that the reported FOM goes negative.

Elapsed time         =      55.50 (s)
Grind time (us/z/c)  = 0.38064213 (per dom)  (-0.00015125329 overall)
FOM                  = -6611426.6 (z/s)

This probably shouldn't happen.
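
For context, here is a minimal sketch of the likely mechanism, under the assumption that the overall grind-time denominator is nx*nx*nx*numRanks evaluated in 32-bit integer arithmetic (as in the 2.0.3 release) and that int is 32 bits: at these scales the product exceeds 2^31, so the denominator wraps to a negative value, and both the overall grind time and the FOM derived from it change sign. The rank count below is hypothetical (roughly 2000 nodes at 8 ranks per node):

   #include <cstdio>

   int main() {
      // Hypothetical run parameters: ~2000 nodes x 8 MPI ranks per node, -s 90.
      int nx = 90, numRanks = 16000;

      // Same product evaluated in 64 bits and then narrowed back to 32 bits.
      long long exact = (long long)nx * nx * nx * numRanks;  // 11,664,000,000
      int narrowed    = (int)exact;                          // wraps negative on two's-complement platforms

      printf("exact: %lld  narrowed to 32 bits: %d\n", exact, narrowed);
      return 0;
   }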

@ikarlin
Collaborator

ikarlin commented Mar 27, 2019

@nmhamster are you using master? If so, I'm not sure how this output makes sense, since the overall number and the elapsed time are the same values being printed out in the current code. The FOM is calculated off these, but it's not clear where the bug would come from.

Note that there was an overflow-related bug in an older version of LULESH. If this is what you are hitting, I can help you fix it if you need to use that version for some reason.

If you are using master, can you give me the inputs you are running and the full output? I can try to recreate it if I can get access to a big enough resource quickly, or try to recreate it on a smaller node count with the same global problem size.

@nmhamster
Contributor Author

OK, I am using the 2.0.3 release. I changed the output code at the end and it corrects the FOM (but note: I didn't change the total-number-of-elements printout earlier, which is also overflowing).

Fixed output example:

Total number of elements: -1838665592

To run other sizes, use -s <integer>.
To run a fixed number of iterations, use -i <integer>.
To run a more or less balanced region set, use -b <integer>.
To change the relative costs of regions, use -c <integer>.
To print out progress, use -p
To write an output file for VisIt, use -v
See help (-h) for more options

Run completed:
   Problem size        =  90
   MPI tasks           =  9261
   Iteration count     =  200
   Final Origin Energy = 2.026863e+11
   Testing Plane 0 of Energy Array on rank 0:
        MaxAbsDiff   = 4.196167e-05
        TotalAbsDiff = 2.186766e-04
        MaxRelDiff   = 1.140498e-10


Elapsed time         =     105.67 (s)
Grind time (us/z/c)  = 0.72475375 (per dom)  (7.8258693e-05 overall)
FOM                  =   12778133 (z/s)

Changed the grindTime2 calculation to the following:

   // Form the nx^3 * numRanks denominator in Real_t so the multiplication
   // by numRanks cannot overflow a 32-bit integer at large rank counts.
   Real_t local_grid = nx*nx*nx;
   Real_t local_grid_ranks = local_grid * (Real_t) numRanks;

   // grindTime1 (per domain) still divides by nx^3, which fits in an int;
   // grindTime2 (overall) now uses the floating-point denominator.
   Real_t grindTime1 = ((elapsed_time*1e6)/locDom.cycle())/(nx*nx*nx);
   Real_t grindTime2 = ((elapsed_time*1e6)/locDom.cycle())/(local_grid_ranks);
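
The remaining negative element count in the output above is consistent with the same kind of 32-bit overflow: 9261 ranks x 90^3 elements per domain is 6,751,269,000, which does not fit in a signed 32-bit integer and, on a two's-complement machine, truncates to -1838665592, exactly the value printed. A small standalone check (assuming the total is still accumulated in a 32-bit signed type such as LULESH's Index_t, which the printout suggests):

   #include <cstdio>

   int main() {
      // Values from the run above: -s 90 on 9261 MPI ranks.
      long long nx = 90, numRanks = 9261;

      long long exact = nx * nx * nx * numRanks;  // 6,751,269,000
      int truncated   = (int)exact;               // keeps only the low 32 bits

      // On typical two's-complement platforms this prints:
      //   exact: 6751269000  truncated: -1838665592
      printf("exact: %lld  truncated: %d\n", exact, truncated);
      return 0;
   }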

@nmhamster
Contributor Author

I can submit a pull request once I get these fixed.

@ikarlin
Collaborator

ikarlin commented Mar 27, 2019

@nmhamster these are fixed in master. Do you need a tagged release? If so, I think the easiest solution would be for me to tag a new release, since there are no open issues, and for you to move to it.

I'm open to other ideas, but it's not worth your time fixing this.

@ikarlin
Collaborator

ikarlin commented Mar 28, 2019

@nmhamster after some thought, I think a new tagged release is overdue. I need to do a bit of performance testing to confirm there is no significant regression, but otherwise the code should be fine.
