Issues: NVIDIA/Megatron-LM
[BUG] Modify FLOPs in MFU calculation for causal mask when using FlashAttention (#831, opened May 17, 2024 by Yuxin-CV)
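The issue above points at a real accounting subtlety: with a causal mask, FlashAttention only computes the lower-triangular half of the seq_len x seq_len score matrix, so the attention term in a FLOPs-based MFU estimate should be roughly halved. A minimal sketch of that adjustment, assuming the common multiply-add (factor 2) convention for the score and context matmuls; this is an illustration of the halving, not Megatron-LM's exact MFU formula:

```python
def attention_flops(batch, seq_len, hidden, num_layers, causal=False):
    # Two matmuls per layer dominate attention cost: scores (Q @ K^T)
    # and context (P @ V), each ~2 * batch * seq_len^2 * hidden FLOPs
    # (the factor 2 counts a multiply-add as two FLOPs).
    flops_per_layer = 2 * (2 * batch * seq_len ** 2 * hidden)
    if causal:
        # Only the lower triangle of the score matrix is computed,
        # which roughly halves the attention FLOPs.
        flops_per_layer /= 2
    return num_layers * flops_per_layer

full = attention_flops(1, 4096, 4096, 32)
masked = attention_flops(1, 4096, 4096, 32, causal=True)
```

With this correction the denominator of MFU shrinks for causal models, so a calculation that ignores the mask overstates the achievable FLOPs and understates the reported utilization.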
Question about forward_backward_pipelining_without_interleaving in the Megatron-LM pipeline (#830, opened May 17, 2024 by Hongjie1Chu)
[QUESTION] How to profile bubble time in pipeline parallelism? (#828, opened May 15, 2024 by starstream)
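For a back-of-the-envelope answer to the bubble question above: in the standard analysis of non-interleaved schedules (GPipe/1F1B), each stage idles during the (p - 1) warm-up and cool-down slots, so the bubble time relative to the ideal compute time of m microbatches is (p - 1) / m. A small sketch of that estimate (actual profiling requires timing the schedule; this only gives the theoretical lower bound):

```python
def pipeline_bubble_fraction(num_stages, num_microbatches):
    # Bubble time / ideal compute time for a non-interleaved schedule:
    # (p - 1) idle forward+backward slots versus m productive ones.
    p, m = num_stages, num_microbatches
    return (p - 1) / m

# e.g. 4 pipeline stages, 16 microbatches -> 3/16 = 18.75% overhead
ratio = pipeline_bubble_fraction(4, 16)
```

Increasing the number of microbatches per global batch is the usual lever for shrinking this fraction; interleaved schedules reduce it further at the cost of more communication.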
[BUG] There is a small chance it will get stuck if I run test_serialization.py repeatedly (#825, opened May 14, 2024 by starkhu)
[QUESTION] Why is expert parallelism not supported during fp16 training? (#810, opened May 7, 2024 by yutian-mt)
[QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively? (#785, opened Apr 19, 2024 by ezioliao)
[QUESTION] Found NaN in local grad norm in backward pass before data-parallel communication collective (#780, opened Apr 16, 2024 by ftgreat)
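A useful first step when debugging the NaN report above is checking each rank's local gradients for finiteness before the data-parallel all-reduce, which localizes the fault to a single rank instead of letting the NaN propagate everywhere. A pure-Python sketch of the check (in practice this would be done on tensors, e.g. with a finiteness test, before the collective):

```python
import math

def local_grads_are_finite(grad_groups):
    # Sum of squares over all local gradient values; any NaN or Inf
    # poisons the sum, so one isfinite() call catches them all.
    sq_sum = sum(g * g for grads in grad_groups for g in grads)
    return math.isfinite(sq_sum)
```

If the check fails on exactly one rank, the problem is local (bad data shard, overflow in that rank's activations); if it fails everywhere only after the collective, the NaN was introduced by the reduction itself.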