
Gradient Checkpointing #4

Open
xrsrke opened this issue Oct 25, 2023 · 1 comment
Comments

@xrsrke
Owner

xrsrke commented Oct 25, 2023

  • Selectively recompute the forward pass of some operations during the backward pass to save memory.
  • Replace transformers' gradient checkpointing with pipegoose's gradient checkpointing.

APIs

from pipegoose.utils.checkpointing import Checkpointing

mlp = model.transformer.blocks[0].mlp
mlp = Checkpointing(mlp, parallel_context)

outputs = mlp(inputs)
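
For context, a minimal sketch of what such a wrapper could look like, assuming it delegates to PyTorch's torch.utils.checkpoint.checkpoint. The Checkpointing name and parallel_context argument come from the API above; the body is illustrative, not pipegoose's actual implementation.

import torch
from torch.utils.checkpoint import checkpoint

class Checkpointing(torch.nn.Module):
    # Illustrative sketch only: parallel_context matches the proposed
    # API but is unused here.
    def __init__(self, module: torch.nn.Module, parallel_context=None):
        super().__init__()
        self.module = module
        self.parallel_context = parallel_context

    def forward(self, *args, **kwargs):
        # Drop intermediate activations during the forward pass and
        # recompute them in the backward pass, trading compute for memory.
        return checkpoint(self.module, *args, use_reentrant=False, **kwargs)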

Reading

@xrsrke xrsrke changed the title from Distributed Checkpoint to Gradient Checkpointing Oct 25, 2023
@xrsrke xrsrke added the help wanted label Oct 25, 2023
xrsrke pushed a commit that referenced this issue Nov 22, 2023
fix model partitioning and add more tests
@xrsrke xrsrke removed the help wanted label Nov 28, 2023
@Etelis

Etelis commented Dec 5, 2023

I will do it.
!assign
