Hi!
I think something is wrong when calculating load_act of qk_matmul in the prefill stage.
From my understanding, the load_act of qk_matmul should be calculated as load_act = seqlen * head_size * batchsize * num_attention_heads * a_byte. However, in the code at model_analyzer.py#L359, it is written as load_act = seqlen * head_size * batchsize * num_key_value_heads * a_byte.
Could it be that I'm misunderstanding some fundamental concepts, or is there a potential issue with the code?
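For a concrete sense of how much the two formulas differ, here is a minimal sketch (not the repository's code) comparing them under an assumed GQA configuration; the dimensions are made up for illustration:

```python
# Illustrative only: assumed GQA config (e.g. 64 attention heads, 8 KV heads,
# head_size 128, batchsize 1, seqlen 2048, a_byte 2 for fp16 activations).
seqlen, head_size, batchsize, a_byte = 2048, 128, 1, 2
num_attention_heads, num_key_value_heads = 64, 8

load_act_with_q_heads = seqlen * head_size * batchsize * num_attention_heads * a_byte
load_act_with_kv_heads = seqlen * head_size * batchsize * num_key_value_heads * a_byte
print(load_act_with_q_heads, load_act_with_kv_heads)  # differ by a factor of 64 / 8 = 8
```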
Thanks!
I think load_act = seqlen * head_size * batchsize * num_key_value_heads * a_byte is correct. If num_key_value_heads = num_attention_heads, the model uses Multi-Head Attention (MHA); if num_key_value_heads = 1, it uses Multi-Query Attention (MQA); otherwise it uses Grouped-Query Attention (GQA). Using num_key_value_heads covers the general case.
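For reference, a hypothetical helper (not from the repository) showing how num_key_value_heads selects the attention variant:

```python
# Hypothetical helper, for illustration only.
def attention_variant(num_attention_heads: int, num_key_value_heads: int) -> str:
    if num_key_value_heads == num_attention_heads:
        return "MHA"  # every query head has its own K/V head
    if num_key_value_heads == 1:
        return "MQA"  # all query heads share a single K/V head
    return "GQA"      # query heads are grouped over num_key_value_heads K/V heads

print(attention_variant(32, 32))  # MHA
print(attention_variant(32, 1))   # MQA
print(attention_variant(64, 8))   # GQA
```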
In qk_matmul, we need to load the Q matrix and the K matrix. Based on my understanding, the shape of the Q matrix is [batchsize, num_attention_heads, seqlen, head_size], and the shape of the K matrix is [batchsize, num_key_value_heads, seqlen, head_size]. Therefore, load_act = seqlen * head_size * batchsize * num_attention_heads * a_byte and load_kv_cache = seqlen * head_size * batchsize * num_key_value_heads * kv_byte, just as in the formula for the decode stage at model_analyzer.py#L264.
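A small sketch of the formulas I have in mind for the prefill qk_matmul (illustrative only, not the analyzer's actual implementation):

```python
# Sketch of the proposed prefill qk_matmul load calculation (not the repo's code).
def qk_matmul_prefill_loads(batchsize, seqlen, head_size,
                            num_attention_heads, num_key_value_heads,
                            a_byte, kv_byte):
    # Q is [batchsize, num_attention_heads, seqlen, head_size] -> activation load
    load_act = seqlen * head_size * batchsize * num_attention_heads * a_byte
    # K is [batchsize, num_key_value_heads, seqlen, head_size] -> KV-cache load
    load_kv_cache = seqlen * head_size * batchsize * num_key_value_heads * kv_byte
    return load_act, load_kv_cache
```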