Is it correct to set up fsdp for a machine (V100) that does not support bf16? #274

Open
xmc-andy opened this issue Sep 14, 2023 · 6 comments

Comments

@xmc-andy

compute_environment: LOCAL_MACHINE
distributed_type: no
downcast_bf16: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 20687
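For context on the `mixed_precision: fp16` line above: bf16 needs a GPU with CUDA compute capability 8.0 (Ampere) or newer, while the V100 is 7.0, so fp16 is the usual fallback. The sketch below shows one way to make that choice in code. The helper names `pick_mixed_precision` and `detect_cc_major` are hypothetical, and the fallback default of 7 is an assumption made for V100-class machines; only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls.

```python
def pick_mixed_precision(cc_major: int) -> str:
    """Map a CUDA compute capability major version to a mixed_precision value."""
    # bf16 requires compute capability 8.x (Ampere) or newer; the V100 is 7.0.
    return "bf16" if cc_major >= 8 else "fp16"


def detect_cc_major(default: int = 7) -> int:
    """Best-effort query of GPU 0's compute capability major version.

    Falls back to `default` (an assumption: V100-class hardware) when
    PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch

        if torch.cuda.is_available():
            major, _minor = torch.cuda.get_device_capability(0)
            return major
    except ImportError:
        pass
    return default


if __name__ == "__main__":
    # Prints the value one would put in the `mixed_precision` field of the config.
    print(pick_mixed_precision(detect_cc_major()))
```

The printed value can be copied into the `mixed_precision` field of the config shown above.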

@Luodian
Owner

Luodian commented Sep 14, 2023

Yes, it seems correct!

@xmc-andy
Author

xmc-andy commented Sep 14, 2023 via email

@Luodian
Owner

Luodian commented Sep 14, 2023

I think you can refer to the code at this link to see whether it helps:

https://github.com/huggingface/accelerate/blob/6b3e559926afc4b9a127eb7762fc523ea0ea656a/src/accelerate/big_modeling.py#L514

I know that you may be able to set device_map="balanced_low_0" to decrease GPU memory usage on rank 0 (rank 0 performs the gather operations, and sometimes other parameters are also moved to rank 0, which can lead to OOM).
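A minimal sketch of what this suggestion could look like in practice, assuming the model is loaded through a Hugging Face-style `from_pretrained` call that accepts `device_map` and `max_memory`. The helper `build_load_kwargs` and the memory figures are hypothetical and only illustrate how GPU 0 can be given a smaller budget than the other cards.

```python
from typing import Any, Dict


def build_load_kwargs(num_gpus: int, per_gpu_gib: int, rank0_gib: int) -> Dict[str, Any]:
    """Build keyword arguments that keep GPU 0 lightly loaded.

    `device_map="balanced_low_0"` asks Accelerate to balance the weights over
    the GPUs while placing as little as possible on GPU 0. `max_memory` adds an
    explicit per-device cap on top of that; the values used here are
    illustrative assumptions, not recommendations.
    """
    if num_gpus < 2:
        raise ValueError("balanced_low_0 only makes sense with at least 2 GPUs")
    max_memory = {0: f"{rank0_gib}GiB"}
    for idx in range(1, num_gpus):
        max_memory[idx] = f"{per_gpu_gib}GiB"
    return {"device_map": "balanced_low_0", "max_memory": max_memory}


if __name__ == "__main__":
    kwargs = build_load_kwargs(num_gpus=3, per_gpu_gib=30, rank0_gib=8)
    print(kwargs)
    # Hypothetical usage; the model class and checkpoint name are placeholders:
    # model = SomeModelClass.from_pretrained("path/to/checkpoint", **kwargs)
```

Whether this avoids the out-of-memory error depends on the model size and on how much of it must stay on a single device, so it should be treated as something to experiment with rather than a guaranteed fix.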

@Luodian
Owner

Luodian commented Sep 14, 2023

I have seen some code doing this before, but I haven't used it myself. You may want to look into the device_map mechanism and how to set it. You are also welcome to share your experience with us, to help more users tackle this problem on V100 GPUs~

@xmc-andy
Author

Thank you for the suggestions, I will try them.

@xmc-andy
Author

I tried setting device_map to 'auto', 'balanced', 'balanced_low_0' and 'sequential' in turn. Unfortunately, each of them still runs out of memory on 3 V100s (with the ViT unfrozen). Of these, I think balanced_low_0 might work if I had enough cards; I will try it further if I get 4 V100s.
