Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve build speed for stdlib #908

Open
jalvesz opened this issue Dec 12, 2024 · 2 comments
Open

Improve build speed for stdlib #908

jalvesz opened this issue Dec 12, 2024 · 2 comments
Assignees
Labels
build: cmake Issue with stdlib's CMake build files

Comments

@jalvesz
Copy link
Contributor

jalvesz commented Dec 12, 2024

Ever since adding BLAS/LAPACK to stdlib the build times have exploded.

I have observed in my local machine (Windows + gfortran + CMake + Ninja with 16 threads) that the build takes 199 seconds. Using this tool https://ui.perfetto.dev/ as suggested by @ivan-pi it is possible to load the .ninja_log file and observe where are the bottle necks:
Image

In the figure we observe the build chocking up at the lapack per kind files.

I started a major refactoring of stdlib's BLAS/LAPACK here with help from @ivan-pi and @perazz : https://github.com/jalvesz/stdlib/tree/split_lapack using a python script to rewrite the blas/lapack-per-kind files into submodule files per groups.

Doing that I managed to get down the build time to 38 seconds (x5 faster):

Image

There are still some spots in which the build stalls but it is quite evident that by splitting into several submodule files enables better parallel build.

Before opening a PR, I would like to gather some broad opinions on this refactoring.

Some observations/open questions:

  1. In order to achieve this splitting, the refactor_blaslapack_subm.py script contains two dictionaries blas_groups and lapack_groups. They define the procedures to be included in each submodule file. This splitting started by following the lapack reference manual, then I had to arbitrarily split further. I'm sure that with a proper dependency graph it could be possible to achieve a better grouping. If anyone feels like proposing a mechanism to find such grouping that would be very nice. Here a couple of references I've found interesting about such topic: https://paulcavallaro.com/blog/optimizing-function-placement-with-call-chain-clustering/ & https://medium.com/@avidaneran/using-a-graph-representation-to-analyze-python-dependencies-a57cd681fa09

  2. In the draft branch, the per-kind files were moved to a legacy folder. When opening the PR I think these should not be there as there is no point in having so many duplicates of the sources. Nonetheless, the splitting script can be improved and it is based on those files. Should these files be saved in a separate repo for reference?

  3. The submodules approach is proving very neat to organize and increase build performance. Now, this implies a very large header module file in order to have all interfaces for lapack functions. A better work on (1) could probably lead to finding clearer macro dependencies which could enable splitting also this header module.

  4. While performing this refactoring I noticed that there are some procedures which I don't understand why should they be kept such as lamc3, which simply does c = a + b ... is there a reason to actually keep such a function?

cc: @perazz @jvdp1 @fortran-lang/stdlib

@jalvesz jalvesz self-assigned this Dec 13, 2024
@jalvesz jalvesz added the build: cmake Issue with stdlib's CMake build files label Dec 13, 2024
@perazz
Copy link
Contributor

perazz commented Dec 14, 2024

Thank you @jalvesz @ivan-pi, this would be a nice qol upgrade for developers that build the library often (such as myself lol). I support this feature. I believe the 64-bit version should be also taken into account, as on-going at #888.

@ivan-pi
Copy link
Member

ivan-pi commented Dec 26, 2024

These are very illuminating findings. When it comes to stdlib, I think we need to keep in mind that:

  • (public) modules should remain stable if possible, and only expose as much as needed
  • sub-modules are essentially just implementation details

Currently, the modules provided are stdlib_blas and stdlib_lapack. Since BLAS is small, I think it is okay to provide all procedures. The Reference LAPACK on the other hand has about ~2000 procedures. If possible, I would only make public the driver and computational routines. Auxiliary routines routines would remain an implementation detail, or can be made public upon request.

I think it is okay to have stdlib_lapack export all drivers and computational routine interfaces. To address problem 3, we can always split into smaller modules later, in a backward compatible way.

! current way
module stdlib_lapack
   interface
      ! ... all the drivers and computational procedures ...
      ! ... but not auxiliary routines ...
   end interface
end module

In the future this can be split by topic, e.g.

module stdlib_lapack_lu
   interface
      ! ... LU-related procedures ...
   end interface
end module

module stdlib_lapack
   use stdlib_lapack_lu
   ! ...

   interface
     ! ... procedures which fit no where else
   end interface
end module

I think the auxiliary routines should remain private, because they are not described in the LAPACK user's guide. We do not want to expose more public API surface than necessary.

Concerning remark 4 in your list, I think sometimes such procedure are introduced to counteract unwanted compiler optimizations, say by hiding an expression behind an external procedure, to make sure it doesn't get optimized away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build: cmake Issue with stdlib's CMake build files
Projects
None yet
Development

No branches or pull requests

3 participants