Add Qwen2 tp&pp model #161
base: main
Conversation
collie/models/qwen2/model.py (Outdated)

self.num_heads_tp = query_states.shape[2]
self.tp_size = self.num_heads // self.num_heads_tp

tp_size can be obtained via self.config.tp_size.
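
A minimal sketch of that suggestion, assuming the module keeps the CoLLiE config on self.config and the total head count on self.num_heads as in the excerpt above:

# Take tp_size from the config instead of inferring it from tensor shapes.
self.tp_size = self.config.tp_size
self.num_heads_tp = self.num_heads // self.tp_size  # heads handled by this tensor-parallel rank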
collie/models/qwen2/model.py (Outdated)

attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

if attn_weights.size() != (bsz, self.num_heads_tp, q_len, kv_seq_len):

This assert should also be expressed via self.config.tp_size and self.num_heads.
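
For illustration, the shape check could look roughly like this (a sketch; the error message wording is an assumption):

# Derive the expected per-rank head count from the config for the shape check.
expected_heads = self.num_heads // self.config.tp_size
if attn_weights.size() != (bsz, expected_heads, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, expected_heads, q_len, kv_seq_len)}, "
        f"but is {attn_weights.size()}"
    )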
collie/models/qwen2/model.py (Outdated)

rearrange(value_states, "b n (h d) -> b n h d", d=self.head_dim),
)

self.num_heads_tp = query_states.shape[2]

Same as in Qwen2Attention: obtain this via config.tp_size.
collie/models/qwen2/model.py (Outdated)

"unexpected results may be encountered."
)
# self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
self.self_attn = Qwen2FlashAttention2(config, layer_idx)

Write it like this instead; otherwise use_flash in the config cannot control which attention implementation is used here:

if config.attn_implementation == "flash_attention_2" or config.use_flash:
    self.attention = InternLM2FlashAttention2(config=config)
else:
    self.attention = InternLM2Attention(config=config)
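
Adapted to the Qwen2 classes in this PR, the same pattern might look like the sketch below (which attribute the config exposes for the attention implementation is an assumption):

if config._attn_implementation == "flash_attention_2" or config.use_flash:
    self.self_attn = Qwen2FlashAttention2(config, layer_idx)
else:
    self.self_attn = Qwen2Attention(config, layer_idx)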
Qwen2Attention also needs to be tested.
collie/models/qwen2/model.py (Outdated)

attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,

This trailing blank line should be removed.
self.self_attn = Qwen2FlashAttention2(config, layer_idx)
# self.self_attn = Qwen2SdpaAttention(config, layer_idx)

if config._attn_implementation == "flash_attention_2" or config.use_flash:

Line 842 assigns _attn_implementation to "flash_attention_2", so isn't the or here always True?

Testing test_generation.py: the generation results differ between pp_size=2 and tp_size=2. This is probably a kv cache issue.
from collie.models.utils import inputs_to_kv_cache_for_layer, kv_cache_to_inputs_for_layer, kv_cache_to_inputs_for_model, inputs_to_kv_cache_for_model

if is_flash_attn_2_available():

With flash-attn 2.0 or earlier this is False, and an error is raised when config.use_flash=True; the error message could be improved. The error I saw is:

File "/fs-computility/llm/shared/lvkai/workspace/collie/tests/models/qwen2/../../../collie/models/qwen2/model.py", line 488, in forward
    _flash_supports_window_size
NameError: name '_flash_supports_window_size' is not defined

It could be changed to tell the user that flash-attn 2.1 or later is required.
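
A minimal sketch of such a guard, assuming is_flash_attn_2_available is imported from transformers.utils (the excerpt only shows the call, so the import location and message are assumptions):

from transformers.utils import is_flash_attn_2_available  # assumed import location

# Fail early with a version hint instead of hitting a NameError later in forward().
if config.use_flash and not is_flash_attn_2_available():
    raise ImportError(
        "config.use_flash=True requires flash-attn >= 2.1; "
        "please upgrade flash-attn or set use_flash=False."
    )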