MixFormer attention question (please...) #89

NJiHyeon · 2023-08-13T14:01:12Z

Hello.
Thank you for sharing such a great model.
I have a question that came up while studying the MixFormer code.
When splitting the queries, keys, and values in the Attention class, why do we use torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) rather than torch.split(q, [t_h * t_w, s_h * s_w], dim=2)?
What exactly does multiplying the template size by two mean?
Also, would the code still work if it used torch.split(q, [t_h * t_w, s_h * s_w], dim=2) instead?
Thank you!

yutaocui · 2023-08-13T14:06:01Z

Hi, thanks for your interest. We use torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) because two templates are employed during training: one simulates the static template (i.e., the first given one) and the other simulates the online template.
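A minimal sketch of the token bookkeeping behind the factor of two. It assumes the tokens are concatenated along dim=2 in the order [static template, online template, search region] and that q has shape (batch, heads, tokens, head_dim). The concrete sizes and the names total_tokens and single_template_split_ok are hypothetical, and this is an illustration of torch.split, not the MixFormer implementation itself.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch, heads, head_dim = 2, 4, 16
t_h, t_w = 8, 8    # spatial size of ONE template's feature map (assumed)
s_h, s_w = 20, 20  # spatial size of the search-region feature map (assumed)

# Assumed token layout along dim=2: [static template | online template | search region]
total_tokens = t_h * t_w * 2 + s_h * s_w

# Queries with an assumed shape of (batch, heads, tokens, head_dim).
q = torch.randn(batch, heads, total_tokens, head_dim)

# The first chunk holds BOTH templates, hence the factor of 2.
q_mt, q_s = torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2)
print(q_mt.shape)  # torch.Size([2, 4, 128, 16])
print(q_s.shape)   # torch.Size([2, 4, 400, 16])

# The two templates can be separated again if needed.
q_static, q_online = torch.split(q_mt, [t_h * t_w, t_h * t_w], dim=2)

# Split sizes must sum exactly to q.size(2). One template's worth of tokens
# leaves t_h * t_w tokens unaccounted for, so torch.split raises a RuntimeError.
single_template_split_ok = True
try:
    torch.split(q, [t_h * t_w, s_h * s_w], dim=2)
except RuntimeError as err:
    single_template_split_ok = False
    print("split failed:", err)
```

Under these assumptions, the single-template sizes only fit an input that actually contains one template; with two templates concatenated, the sizes no longer sum to the token count and the split fails.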
