Hardware and data requirements for model training. #115

Open
siyuzhu-fudan opened this issue May 11, 2024 · 3 comments

Comments

@siyuzhu-fudan
Member

What are the GPU requirements to run the training and approximately how many input videos should be used for training?

@pearbender
Contributor

I would also like to know. I attempted training with an RTX 3060 and batch size of 1 and ran out of memory. I have 12 GB of dedicated GPU memory and 16 GB of shared GPU memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 114.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 24.51 GiB is allocated by PyTorch, and 766.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
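For anyone who wants to try the allocator hint from that error message, here is a minimal sketch. It assumes the variable is set before PyTorch initializes CUDA, and `report_gpu_memory` is a hypothetical helper name, not something from this repository. The setting may not be supported on every platform, and it only helps with fragmentation; it does not add memory.

```python
import os

# Set the allocator option from the error message before torch touches CUDA.
# Doing it before the import is the safest ordering.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


def report_gpu_memory():
    """Return (free_gib, total_gib) for the current CUDA device.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports free and total device memory in bytes.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


if __name__ == "__main__":
    print(report_gpu_memory())
```

Checking free memory right before starting training gives a rough idea of how much headroom is left on the card.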

@pearbender
Contributor

I have stage 1 training running on my RTX 3060 with 64 GB of RAM. That's 12 GB of dedicated GPU memory plus 32 GB of shared GPU memory, for a total of 44 GB of GPU memory. It seems to be just barely enough.

@pearbender
Contributor

pearbender commented May 17, 2024

I've given up on training after reading the paper, which says training was conducted on 8 NVIDIA A100 GPUs. My projected training time for stage 1 is approximately 1800 hours.
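To put the 1800-hour figure in perspective, here is a rough sketch of the arithmetic. The function names are hypothetical, and the per-step time and step count in the example are made-up illustrative inputs, not measurements from this thread or values from the repository's configs.

```python
def hours_to_days(hours: float) -> float:
    """Convert a wall-clock duration in hours to days."""
    return hours / 24.0


def estimate_training_hours(seconds_per_step: float, total_steps: int) -> float:
    """Project total training time from a measured per-step time.

    Both inputs have to come from your own run: time a few steps,
    then read the planned step count from your training config.
    """
    return seconds_per_step * total_steps / 3600.0


if __name__ == "__main__":
    # 1800 hours is the figure reported in the comment above.
    print(hours_to_days(1800))
    # Hypothetical inputs, chosen only to illustrate the formula.
    print(estimate_training_hours(seconds_per_step=216.0, total_steps=30000))
```

At 1800 hours, the run would take 75 days of uninterrupted training on a single card.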
