NVIDIA / Megatron-LM Public

Notifications
Fork 2k
Star 8.8k

Code
Issues 299
Pull requests 126
Actions
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Security
Insights

Issues: NVIDIA/Megatron-LM

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

299 Open 228 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

[BUG] Modify FLOPs in MFU calculation for casual mask when using FlashAttention.

#831 opened May 17, 2024 by Yuxin-CV

Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline

#830 opened May 17, 2024 by Hongjie1Chu

[QUESTION] Why not use tensor parallel APIs of pytorch

#829 opened May 16, 2024 by GuWei007

[QUESTION] how to profile bubble time in pipeline parallelism?

#828 opened May 15, 2024 by starstream

[BUG]

#827 opened May 14, 2024 by chrisgao7

[BUG] The argument --no-position-embedding should be fixed

#826 opened May 14, 2024 by Hoonly

[BUG]:there is a small chance that it will get stuck, If i repeat runing test_serialization.py many times,

#825 opened May 14, 2024 by starkhu

Does Megatron has plan to support llama pre-train？

#824 opened May 13, 2024 by wen020

Projeto liliti stk 3.6.9 inteligência artificial 🤖

#822 opened May 11, 2024 by felipeliliti

Projeto liliti stk 3.6.9 inteligência artificial

#821 opened May 11, 2024 by felipeliliti

Megatron-LM for LLaMa3

#818 opened May 10, 2024 by SDsly

[QUESTION] How does tensor_parallel coop with q/k_layernorm

#816 opened May 10, 2024 by cryoco

[BUG] Typo in drop_policy options in moe_utils.py

#815 opened May 9, 2024 by Malikeh97

[BUG] [MoE] Typo in Token Drop policy's default value

#812 opened May 7, 2024 by passaglia

[QUESTION] Why is expert parallelism not supported during fp16 training?

#810 opened May 7, 2024 by yutian-mt

[core dataset compilation error]

#807 opened May 6, 2024 by shamanez

[QUESTION] Does Megatron-Core supports LLAMA models?

#803 opened May 3, 2024 by noob-ctrl

[QUESTION] bf16 Parameters and fp32 Gradients

#800 opened Apr 30, 2024 by pluiez

[QUESTION] How to pre-build the dataset's index ?

#795 opened Apr 24, 2024 by etiennemlb

[BUG] Example of pretraining BERT does not work

#791 opened Apr 23, 2024 by xju2

When can we have a the MOE checkpoint convert script.

#790 opened Apr 22, 2024 by shamanez

[QUESTION] Validation loss & PPL keep going up

#787 opened Apr 20, 2024 by zhentingqi

[QUESTION] Is it expected to do grad norm on dense-optimizer and moe-optimizer respectively?

#785 opened Apr 19, 2024 by ezioliao

[QUESTION] RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=2, worker_count=1, timeout=0:10:00)

#782 opened Apr 16, 2024 by JanryPei

[QUESTION] found NaN in local grad norm in backward pass before data-parallel communication collective

#780 opened Apr 16, 2024 by ftgreat

Previous 1 2 3 4 5 … 11 12 Next

Previous Next

ProTip! Add no:assignee to see everything that’s not assigned.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly