
Add Qwen2 tp&pp model #161

Open · wants to merge 13 commits into main

Conversation

Anti-Entrophic (Contributor)

No description provided.

KaiLv69 self-requested a review April 13, 2024 12:42
)

self.num_heads_tp = query_states.shape[2]
self.tp_size = self.num_heads // self.num_heads_tp
KaiLv69 (Collaborator):

tp_size can be obtained from self.config.tp_size.
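
A minimal sketch of the suggested change, assuming the CoLLiE config exposes tp_size as the reviewer states:

# Read the tensor-parallel size from the config instead of deriving it
# from the shape of the sharded query tensor.
self.tp_size = self.config.tp_size
self.num_heads_tp = self.num_heads // self.tp_size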


attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

if attn_weights.size() != (bsz, self.num_heads_tp, q_len, kv_seq_len):
KaiLv69 (Collaborator):

The assert here should also be done via self.config.tp_size and self.num_heads.
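
A sketch of the shape check rewritten to use the config values directly; the error message wording is illustrative:

# Derive the expected per-rank head count from the full head count and the
# configured tensor-parallel size.
num_heads_tp = self.num_heads // self.config.tp_size
if attn_weights.size() != (bsz, num_heads_tp, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, num_heads_tp, q_len, kv_seq_len)}, "
        f"but is {attn_weights.size()}"
    )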

rearrange(value_states, "b n (h d) -> b n h d", d=self.head_dim),
)

self.num_heads_tp = query_states.shape[2]
KaiLv69 (Collaborator):

Same as in Qwen2Attention: obtain this from config.tp_size.
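
The same change sketched for this flash-attention path, again assuming config.tp_size is available on the module:

# Compute the per-rank head count from the configured tensor-parallel size
# rather than from query_states.shape[2].
self.num_heads_tp = self.num_heads // self.config.tp_size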

"unexpected results may be encountered."
)
# self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
self.self_attn = Qwen2FlashAttention2(config, layer_idx)
KaiLv69 (Collaborator) · Apr 13, 2024:

Write it like this instead; otherwise use_flash in the config cannot control which attention implementation is used here:

if config.attn_implementation == "flash_attention_2" or config.use_flash:
    self.attention = InternLM2FlashAttention2(config=config)
else:
    self.attention = InternLM2Attention(config=config)

KaiLv69 (Collaborator):

Qwen2Attention also needs to be tested.

attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,

KaiLv69 (Collaborator):

This blank line should be removed.

self.self_attn = Qwen2FlashAttention2(config, layer_idx)
# self.self_attn = Qwen2SdpaAttention(config, layer_idx)

if config._attn_implementation == "flash_attention_2" or config.use_flash:
KaiLv69 (Collaborator):

Line 842 assigns _attn_implementation to "flash_attention_2", so isn't this or condition always True here?
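
A sketch of one way to keep the choice driven by config.use_flash alone, given that _attn_implementation is force-set earlier; is_flash_attn_2_available is already imported in this file:

# Pick the attention implementation from use_flash plus the runtime
# availability check, instead of the hard-coded _attn_implementation value.
if config.use_flash and is_flash_attn_2_available():
    self.self_attn = Qwen2FlashAttention2(config, layer_idx)
else:
    self.self_attn = Qwen2Attention(config, layer_idx)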

KaiLv69 (Collaborator) commented Apr 24, 2024:

Testing test_generation.py: the generation results with pp_size=2 and tp_size=2 are different. It is probably a kv cache problem.

)
from collie.models.utils import inputs_to_kv_cache_for_layer, kv_cache_to_inputs_for_layer, kv_cache_to_inputs_for_model, inputs_to_kv_cache_for_model

if is_flash_attn_2_available():
KaiLv69 (Collaborator):

With flash-attn 2.0 or earlier this will be False, and an error is raised when config.use_flash=True; the error message could be improved.

The error I see is:
File "/fs-computility/llm/shared/lvkai/workspace/collie/tests/models/qwen2/../../../collie/models/qwen2/model.py", line 488, in forward _flash_supports_window_size NameError: name '_flash_supports_window_size' is not defined
It could be changed to tell the user that flash-attn must be at least version 2.1.
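
A sketch of one way to make this failure clearer; the placement (the module-level flash-attn import block and the start of the flash-attention forward path) and the exact message are suggestions, not the PR's code:

import inspect

if is_flash_attn_2_available():
    from flash_attn import flash_attn_func

    _flash_supports_window_size = "window_size" in inspect.signature(flash_attn_func).parameters
else:
    # Keep the name defined so the forward pass fails with a readable error
    # instead of a NameError when flash-attn is missing or too old.
    _flash_supports_window_size = False

# At the start of the flash-attention forward path:
if not is_flash_attn_2_available():
    raise ImportError(
        "config.use_flash=True requires flash-attn >= 2.1; "
        "please upgrade flash-attn or set config.use_flash=False."
    )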
