[PETSc] - Problem with _jll binaries after 3.15.2 #7242

Open
JordiManyer opened this issue Aug 23, 2023 · 6 comments

Comments

@JordiManyer

Hi all,

I am a developer for GridapPETSc.jl, a package bridging PETSc_jll for the Gridap ecosystem.

While doing some unrelated maintenance, we noticed that our tests stopped working for PETSc_jll versions above v3.15.2. We also run our tests against manually compiled versions of PETSc, and those pass for all versions up to v3.19.4. This points to a bug in the build of the _jll package. Since I am not an expert on artifact building in Julia, I would like some help on the matter.

Here is the issue and the PR where I am exploring this matter.
As you can see in the PR, tests run fine for manually compiled PETSc (i.e., the CI_EXTRA jobs) but fail for the _jll tests (i.e., the CI jobs), which use the latest PETSc_jll v3.18.6. The failure goes away when using PETSc_jll v3.13.x or v3.15.x instead, and reappears with newer versions.

@boriskaus I understand you have been taking care of most of the releases for PETSc_jll (thank you, btw). Would you be able to have a look at this?
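
As a stopgap while the cause is unknown, the last working series mentioned above can be installed and pinned through Julia's package manager. The following is only a sketch: the helper name `use_known_good_petsc_jll` is hypothetical, the default version string is taken from the issue title, and it assumes that version is registered and compatible with the rest of the active environment.

```julia
using Pkg

# Hypothetical helper: install a PETSc_jll version reported to work in this thread
# and pin it so that later updates do not move it.
function use_known_good_petsc_jll(version::AbstractString = "3.15.2")
    spec = Pkg.PackageSpec(name = "PETSc_jll", version = version)
    Pkg.add(spec)          # resolve and install the requested version
    Pkg.pin("PETSc_jll")   # keep `Pkg.update()` from upgrading it
    return spec
end
```

Calling `use_known_good_petsc_jll()` needs network access to the registry. `Pkg.free("PETSc_jll")` undoes the pin once a fixed build is available.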

@boriskaus
Contributor

Ai, that is going to be a tricky one to find. It seems to crash in the middle of a computation.
One thing I found with our PETSc-based codes is that the multithreading of the Julia BLAS libraries causes problems/crashes (and makes the calculations very slow). Multithreading is enabled by default; you can switch it off by setting the environment variable OMP_NUM_THREADS=1, as done here and here.
How is that dealt with in GridapPETSc.jl?
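
One way to make sure `OMP_NUM_THREADS=1` is in place before any library is loaded is to start the tests in a child process that already has the variable set. This is a minimal sketch, not necessarily what the linked code does: it uses Julia's `addenv` (available since Julia 1.6), and the one-line `-e` script only echoes the variable back to show that the child sees it.

```julia
# Launch a child Julia process with OpenMP threading limited to one thread.
# The variable is part of the child's environment from the start, so any
# library that reads it at load time will see it.
julia_exe = Base.julia_cmd()

# `-e` runs a one-line script; a real run would pass the test file instead
# (for example a hypothetical test/runtests.jl).
cmd = addenv(`$julia_exe -e 'print(ENV["OMP_NUM_THREADS"])'`, "OMP_NUM_THREADS" => "1")

seen_by_child = read(cmd, String)
println("OMP_NUM_THREADS in child process: ", seen_by_child)
```

Setting `ENV["OMP_NUM_THREADS"] = "1"` at the top of a script may also work, but only if it runs before the threaded library initializes, which is harder to guarantee.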

@JordiManyer
Author

JordiManyer commented Aug 24, 2023

> How is that dealt with in GridapPETSc.jl?

I believe it just isn't. I may look into it, although we compile our own PETSc libraries for all our important runs. However, I've run a couple of tests with the environment variable set manually, and it does not seem to be what causes the crash...

@boriskaus
Contributor

I doubt that you use multithreaded BLAS for your local PETSc build.
The facts that your tests hang for several hours and that the problem only occurs in parallel are both consistent with this.

@ViralBShah
Member

You probably want to use BLAS.set_num_threads(1) to disable OpenBLAS threading, and avoid its threading issues.
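
A short sketch of that suggestion, using the `BLAS` submodule of Julia's `LinearAlgebra` standard library. It controls the BLAS backend that Julia itself loads; whether the PETSc_jll binaries share that backend is an assumption that would need checking.

```julia
using LinearAlgebra

# Remember the current setting so it can be restored later if needed.
previous = BLAS.get_num_threads()

# Restrict the BLAS backend used by Julia (OpenBLAS by default) to one thread.
BLAS.set_num_threads(1)

# Read the setting back to confirm that it took effect.
nthreads = BLAS.get_num_threads()
println("BLAS threads: ", previous, " -> ", nthreads)
```

The call would go near the top of the test script, before any PETSc solver is invoked.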

@amartinhuertas

Hi! Any update on this?

@boriskaus
Contributor

Not from my side; I'm more than happy to receive help in compiling new versions of PETSc_jll, though...
