🐛[BUG]: Modulus hangs on FNO training #152

Open
gioviciconte opened this issue May 7, 2024 · 0 comments
Labels
? - Needs Triage: Need team to review and classify
bug: Something isn't working
external: Issues/PR filed by people outside the core team

gioviciconte commented May 7, 2024

Version

1.4.0

On which installation method(s) does this occur?

Pip

Describe the issue

I adapted the FNO Darcy example to train an FNO on a shock-tube case (shockTube). Modulus hangs after the .solve() method is called. The only output I see is:

python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[14:13:14] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[14:13:14] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[14:13:17] - attempting to restore from: outputs/shockTube_FNO_lazy
[14:13:17] - optimizer checkpoint not found
[14:13:17] - model fno.0.pth not found

After that there is no further output and no error message.
The case is attached: shockTube_FNO.zip
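
One possible way to narrow down where the run stalls, sketched here with Python's standard `faulthandler` module. This is not part of the attached case: `run_with_hang_dump` is a hypothetical helper, the timeout is arbitrary, and `slv` is assumed to be the solver object whose .solve() call hangs.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s=60.0, log_file=sys.stderr):
    """Call fn(); while it runs, dump all thread stacks every timeout_s seconds.

    Hypothetical helper for illustration; not part of Modulus.
    log_file must be backed by a real file descriptor
    (sys.stderr or a file opened on disk).
    """
    # Arm a watchdog that writes the Python tracebacks of all threads to
    # log_file each time timeout_s elapses without being cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn returns or raises.
        faulthandler.cancel_dump_traceback_later()


# Assumed usage in the adapted training script, where `slv` is the solver
# object whose .solve() call appears to hang:
# run_with_hang_dump(slv.solve, timeout_s=120.0)
```

The dump contains Python frames only. If the process is stuck inside native code, the topmost Python frame should still show which call entered it, which may help tell a stalled data pipeline apart from a stalled training step.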

Minimum reproducible example

The case is attached to this issue (shockTube_FNO.zip).

Relevant log output

python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[14:13:14] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[14:13:14] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[14:13:17] - attempting to restore from: outputs/shockTube_FNO_lazy
[14:13:17] - optimizer checkpoint not found
[14:13:17] - model fno.0.pth not found

Environment details

No response

Other/Misc.

No response

@gioviciconte added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels May 7, 2024
@prem-krishnan added the "external" (Issues/PR filed by people outside the core team) label May 14, 2024
2 participants