
Gradient Checkpointing #4

Open
xrsrke opened this issue Oct 25, 2023 · 1 comment
Comments

@xrsrke
Owner

xrsrke commented Oct 25, 2023

  • Selectively recompute the forward pass of some operations during the backward pass to save memory.
  • Replace transformers' gradient checkpointing with pipegoose's gradient checkpointing.

APIs

from pipegoose.utils.checkpointing import Checkpointing

mlp = model.transformer.blocks[0].mlp
mlp = Checkpointing(mlp, parallel_context)

outputs = mlp(inputs)
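
For context, a minimal sketch of what such a wrapper could look like, assuming it delegates to PyTorch's torch.utils.checkpoint.checkpoint. The Checkpointing name and parallel_context argument come from the API above; the body is illustrative, not pipegoose's actual implementation.

import torch
from torch.utils.checkpoint import checkpoint

class Checkpointing(torch.nn.Module):
    # Illustrative sketch only: parallel_context matches the proposed
    # API but is unused here.
    def __init__(self, module: torch.nn.Module, parallel_context=None):
        super().__init__()
        self.module = module
        self.parallel_context = parallel_context

    def forward(self, *args, **kwargs):
        # Drop intermediate activations during the forward pass and
        # recompute them in the backward pass, trading compute for memory.
        return checkpoint(self.module, *args, use_reentrant=False, **kwargs)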

Reading

@xrsrke xrsrke changed the title from Distributed Checkpoint to Gradient Checkpointing Oct 25, 2023
@xrsrke xrsrke added the help wanted label Oct 25, 2023
xrsrke pushed a commit that referenced this issue Nov 22, 2023
fix model partitioning and add more tests
@xrsrke xrsrke removed the help wanted label Nov 28, 2023
@Etelis

Etelis commented Dec 5, 2023

I will do it.
!assign
