
[BUG] Cannot replace pytorch.checkpoint with deepspeed.runtime.activation_checkpointing.checkpointing in accelerate #5550

Open
vkaul11 opened this issue May 20, 2024 · 0 comments
Labels: bug, training


vkaul11 commented May 20, 2024

Describe the bug
I wanted to use the DeepSpeed activation checkpointing parameters

"activation_checkpointing": {
"partition_activations": true,
"contiguous_memory_optimization": true,
"cpu_checkpointing": true
},
in my accelerate job, but I couldn't see them being used; in fact, memory usage was the same whether the gradient_checkpointing parameter in my model config was true or false. That is why I wanted to try DeepSpeed activation checkpointing to avoid OOM errors. I followed the idea in huggingface/accelerate#2160, which is also mentioned in the DeepSpeed documentation here: https://github.com/huggingface/transformers/blob/92d1d97c05a01160d6e7fcf4198e93bf2cec0dfe/docs/source/en/deepspeed.md#L4
See point 2 of that doc about replacing torch.utils.checkpoint with the DeepSpeed activation checkpoint:

Activation/gradient checkpointing
Activation and gradient checkpointing trades speed for more GPU memory which allows you to overcome scenarios where your GPU is out of memory or to increase your batch size for better performance. To enable this feature:

  1. For a Hugging Face model, set model.gradient_checkpointing_enable() or --gradient_checkpointing in the [Trainer].
  2. For a non-Hugging Face model, use the DeepSpeed Activation Checkpointing API. You could also replace the Transformers modeling code and replace torch.utils.checkpoint with the DeepSpeed API. This approach is more flexible because you can offload the forward activations to the CPU memory instead of recalculating them

and made the change in huggingface/transformers#30915.
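
For reference, the swap looks roughly like this (a minimal sketch, not the exact diff from huggingface/transformers#30915; the layer call and argument list are placeholders):

    # Before: Transformers' default gradient checkpointing path uses torch.utils.checkpoint.
    # from torch.utils.checkpoint import checkpoint

    # After: use DeepSpeed's activation checkpointing API instead.
    from deepspeed.runtime.activation_checkpointing.checkpointing import checkpoint

    # Inside the decoder loop in modeling_llama.py (placeholder argument list):
    # layer_outputs = checkpoint(decoder_layer.__call__, hidden_states, attention_mask, position_ids)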

To Reproduce
Steps to reproduce the behavior:

  1. Apply the change from huggingface/transformers#30915 (replace torch.utils.checkpoint with the DeepSpeed checkpoint function).
  2. Launch the Accelerate + DeepSpeed fine-tuning job with the config below.
  3. See the error during the first training step.

Expected behavior
I just needed the accelerate job to run after making the change in huggingface/transformers#30915.
Instead, I got this error:

Traceback (most recent call last):
  File "/workspace/cookbook-internal/recipes/tune/instruct_lora/finetune.py", line 223, in _app
    trainer = train(
  File "/workspace/cookbook-internal/recipes/tune/common/trainer.py", line 156, in train
    trainer.train(resume_from_checkpoint=False)
  File "/workspace/cookbook-internal/transformers/src/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/workspace/cookbook-internal/transformers/src/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/workspace/cookbook-internal/transformers/src/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/workspace/cookbook-internal/transformers/src/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1855, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1083, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/workspace/cookbook-internal/transformers/src/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(  
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/cookbook-internal/transformers/src/transformers/models/llama/modeling_llama.py", line 957, in forward
    layer_outputs = self._gradient_checkpointing_func(
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 995, in
 checkpoint
    CheckpointFunction.apply(function, all_outputs, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 566, in
 forward
    outputs = run_function(*inputs_cuda)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/cookbook-internal/transformers/src/transformers/models/llama/modeling_llama.py", line 713, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/cookbook-internal/transformers/src/transformers/models/llama/modeling_llama.py", line 414, in forward
    bsz, q_len, _ = hidden_states.size()
ValueError: not enough values to unpack (expected 3, got 2)
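
For clarity, the last frame fails because modeling_llama.py unpacks three dimensions (batch, sequence length, hidden size) from hidden_states, so the checkpointed layer apparently received a 2-D tensor. A standalone snippet reproducing just that ValueError (shapes are illustrative):

    import torch

    hidden_states = torch.randn(2, 128, 4096)   # 3-D (batch, seq_len, hidden): unpacks fine
    bsz, q_len, _ = hidden_states.size()

    hidden_states = torch.randn(256, 4096)      # 2-D tensor: same failure as above
    bsz, q_len, _ = hidden_states.size()        # ValueError: not enough values to unpack (expected 3, got 2)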

ds_report output
This is my accelerate setup (`accelerate env` output):

- `Accelerate` version: 0.30.1
- Platform: Linux-6.5.0-1018-gcp-x86_64-with-glibc2.35
- `accelerate` bash location: /usr/local/bin/accelerate
- Python version: 3.10.12  
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 1338.60 GB   
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - use_cpu: False   
        - debug: True
        - num_processes: 8 
        - machine_rank: 0  
        - num_machines: 1  
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - deepspeed_config: {'deepspeed_config_file': 'zero_stage3_config.json', 'zero3_init_flag': True}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

and this is the DeepSpeed JSON config (zero_stage3_config.json) used:

{
     "zero_optimization": {
       "stage": 3,
       "stage3_gather_16bit_weights_on_model_save": true,
       "offload_optimizer": {
         "device": "none"
       },
       "offload_param": {
         "device": "none"
       }
     },
     "activation_checkpointing": {
       "partition_activations": true,
       "contiguous_memory_optimization": true,
       "cpu_checkpointing": true
     },
     "gradient_accumulation_steps": "auto",
     "train_batch_size": "auto",
     "train_micro_batch_size_per_gpu": "auto" 
 }
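
My understanding from the DeepSpeed docs is that these activation_checkpointing settings are consumed by the checkpointing module through its configure call; a minimal sketch of that wiring (file path and placement are illustrative, and whether Accelerate performs this step automatically is part of my question):

    import deepspeed

    # Point the activation checkpointing module at the same DeepSpeed JSON so it picks up
    # partition_activations / contiguous_memory_optimization / cpu_checkpointing.
    deepspeed.checkpointing.configure(mpu_=None, deepspeed_config="zero_stage3_config.json")

    # The model's gradient checkpointing hook would then call:
    # deepspeed.checkpointing.checkpoint(layer_forward, *layer_inputs)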


System info (please complete the following information):

- Same as the `accelerate env` output shown above.

Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
I am launching with Hugging Face accelerate. It works when I use torch.utils.checkpoint, but after the change in huggingface/transformers#30915 that replaces it with deepspeed.runtime.activation_checkpointing.checkpointing.checkpoint, it does not work.
Docker context
Are you using a specific docker image that you can share?
It will be difficult given that I work for a company

vkaul11 added the bug and training labels on May 20, 2024