mlp_act computation problem #16
I am still a little confused about the lm_head OPs computation.

Overall, should the amount of computation for lm_head be the following? (formula not shown)

Overall, the correct code for lm_head should be: (code not shown)

For the decode stage, o_numel = [1, d].
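For reference, the usual accounting treats lm_head as a matmul with a [hidden_size, vocab_size] weight, so prefill costs about 2 · seq_len · hidden_size · vocab_size OPs and each decode step costs 2 · hidden_size · vocab_size. The sketch below is illustrative only; the function name and example dimensions are assumptions, not the repository's actual code.

```python
# Minimal sketch of lm_head OPs counting; names are illustrative,
# not taken from the repository's code.
def lm_head_ops(hidden_size: int, vocab_size: int, seq_len: int, stage: str) -> int:
    # lm_head is a GEMM with weight [hidden_size, vocab_size]; each output
    # element costs hidden_size multiplies and hidden_size adds.
    tokens = seq_len if stage == "prefill" else 1  # decode handles one new token per step
    return 2 * tokens * hidden_size * vocab_size

# Example with Llama-2-7B-like dimensions
print(lm_head_ops(4096, 32000, 1024, "prefill"))  # 2 * 1024 * 4096 * 32000
print(lm_head_ops(4096, 32000, 1024, "decode"))   # 2 * 1 * 4096 * 32000
```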
Thank you for your thorough code review and detailed analysis. You've identified several important calculation adjustments that need to be made. We'll update these calculation methods in the code and ensure that our calculations reflect the correct logic and dimensions as you've outlined.
Dear author, I'm truly grateful that you noticed my question so quickly. Your professionalism and sense of responsibility have deeply impressed me, and I sincerely hope to learn more from you in the future. Next, I plan to modify the code and submit a pull request for you to review. I look forward to the opportunity to cooperate with you and your team and make progress together. Thank you again!
Regarding the amount of computation for mlp_act in the latest code, I have the following questions:

1. The dimension of the input to the Llama activation layer is intermediate_size // tp_size:

   "gate_proj": [hidden_size, intermediate_size // tp_size],
   "up_proj": [hidden_size, intermediate_size // tp_size],

   so shouldn't the activation OPs calculation also use intermediate_size // tp_size?

2. SiLU is computed by the formula SiLU(x) = x ⋅ sigmoid(x), applied elementwise to the gate_proj output.

Overall, should the amount of computation of mlp_act be corrected accordingly?
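If that reading is right, a minimal sketch of the mlp_act OPs count under tensor parallelism could look like the following. The function name and the per-element cost constant are assumptions for illustration, not the repository's actual code or convention.

```python
# Minimal sketch of mlp_act (SiLU) OPs counting under tensor parallelism.
# The names and the per-element cost constant are illustrative assumptions.
SILU_OPS_PER_ELEMENT = 2  # assume roughly one sigmoid evaluation + one multiply per element

def mlp_act_ops(batch_size: int, seq_len: int, intermediate_size: int, tp_size: int) -> int:
    # SiLU(x) = x * sigmoid(x) is applied elementwise to the gate_proj output,
    # whose width on each tensor-parallel rank is intermediate_size // tp_size.
    elements = batch_size * seq_len * (intermediate_size // tp_size)
    return SILU_OPS_PER_ELEMENT * elements

# Example with Llama-2-7B-like dimensions (intermediate_size = 11008) and tp_size = 2
print(mlp_act_ops(batch_size=1, seq_len=1024, intermediate_size=11008, tp_size=2))
```

For the decode stage, seq_len would be 1, mirroring the o_numel = [1, d] point in the lm_head discussion above.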