Darcy Flow Config Issues #48

Open
arthurfeeney opened this issue Sep 6, 2023 · 1 comment

Comments

@arthurfeeney
Contributor

arthurfeeney commented Sep 6, 2023

I have run into two issues with the Darcy flow config files:

  1. There are two config files, config_darcy.yaml and args/config_Darcy.yaml. The documentation points to config_Darcy.yaml (capital D), but config_darcy.yaml (lowercase d) seems newer and more correct. Should the documentation be updated so that the newer file fully replaces the old one?

  2. config_darcy.yaml works with FNO, but fails with Unet. By default, config_darcy.yaml sets initial_step=1 and t_train=1. I believe this is a mistake: the autoregressive (AR) loops run from initial_step to t_train, so the range is empty and the loop body never executes. The resulting error is confusing because loss is initialized as a Python int. Since the loop is empty, nothing is ever added to loss, so it stays an int:

Unet
Epochs = 500, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
Spatial Dimension 2
Total parameters = 7762465
start training...
Error executing job with overrides: []
Traceback (most recent call last):
  File "/dfs6/pub/afeeney/opensource/PDEBench/pdebench/models/train_models_forward.py", line 199, in main
    run_training_Unet(
  File "/data/homezvol2/afeeney/.conda/envs/pdebench/lib/python3.10/site-packages/pdebench/models/unet/train.py", line 414, in run_training
    train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'
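
The mechanism described above can be reproduced without PDEBench or PyTorch. The following is a minimal sketch, assuming the loop has the form range(initial_step, t_train) and that loss starts as the integer 0; the variable names mirror the config keys, and the loop body is only a stand-in for the real loss computation.

```python
# Values from the default config_darcy.yaml as described in this issue.
initial_step = 1
t_train = 1

# Assumed shape of the autoregressive loop: from initial_step up to t_train.
steps = list(range(initial_step, t_train))

loss = 0  # plain Python int, as reported for the training script
for t in steps:
    # Stand-in for adding a tensor-valued loss term.
    # With initial_step == t_train this body never runs.
    loss += 1.0

try:
    loss.item()  # only tensors provide .item(); an int does not
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(steps)
print(error_message)
```

With t_train=2 the range would contain one step, so in the real script a tensor would be added to loss before .item() is called.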

I was able to get it running by setting t_train=2. I don't fully follow how the Darcy setup works, though, so I'm not sure that is the correct fix.
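
Whatever the correct values turn out to be, a check before training could turn the confusing AttributeError into a clear message. The sketch below is hypothetical: check_rollout_window is not a PDEBench function, and it assumes the loop iterates over range(initial_step, t_train).

```python
def check_rollout_window(initial_step: int, t_train: int) -> int:
    """Return the number of autoregressive steps, or raise if there are none.

    Hypothetical helper, not part of PDEBench.
    """
    n_steps = len(range(initial_step, t_train))
    if n_steps == 0:
        raise ValueError(
            f"initial_step={initial_step} and t_train={t_train} give an empty "
            "autoregressive loop; t_train must be greater than initial_step."
        )
    return n_steps


# The workaround from this issue (t_train=2) yields exactly one step.
print(check_rollout_window(initial_step=1, t_train=2))
```

Calling it with the default values (initial_step=1, t_train=1) raises the ValueError instead of failing later at loss.item().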

@qwerfdsadad

Have you managed to fix this bug? I can't get either FNO or Unet to run, and I hit the same error as you. The dataset I used comes from the data_download folder.

FNO

FNO
Epochs = 30, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
FNODatasetSingle
/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /home/builder/cbouss/pytorch/croot/pytorch_1685629640362/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Spatial Dimension 2
Total parameters = 465557
Error executing job with overrides: ['+args=config_Darcy.yaml', '++args.filename=2D_DarcyFlow_beta10.0_Train.hdf5', '++args.model_name=FNO']
Traceback (most recent call last):
  File "/home/dp/PDEBench/pdebench/models/train_models_forward.py", line 166, in main
    run_training_FNO(
  File "/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/fno/train.py", line 227, in run_training
    train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Unet

Unet
Epochs = 30, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
Spatial Dimension 2
Total parameters = 7765057
start training...
Error executing job with overrides: ['+args=config_Darcy.yaml', '++args.filename=2D_DarcyFlow_beta10.0_Train.hdf5', '++args.model_name=Unet']
Traceback (most recent call last):
  File "/home/dp/PDEBench/pdebench/models/train_models_forward.py", line 200, in main
    run_training_Unet(
  File "/home/dp/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/unet/train.py", line 414, in run_training
    train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
