Hello.
Thank you for sharing such a great model.
I have a question that came up while studying the MixFormer code.
When splitting the queries, keys, and values in the Attention class, why is it torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) rather than torch.split(q, [t_h * t_w, s_h * s_w], dim=2)?
What exactly does multiplying the template size by two mean?
Also, would the code still work with torch.split(q, [t_h * t_w, s_h * s_w], dim=2)?
Thank you!
Hi, thanks for your interest. We use torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) because two templates are employed during training: one simulates the static template (i.e. the first given one) and the other simulates the online template.
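A minimal sketch of the token layout this implies. It assumes the query tensor is shaped (batch, heads, tokens, head_dim), with the static template, online template, and search tokens concatenated in that order along dim=2. The sizes and variable names below are hypothetical and chosen only for illustration. The sketch also addresses the follow-up question: on such an input, the single-template split does not run at all, because torch.split with a list of sizes requires those sizes to sum exactly to the length of the split dimension.

```python
import torch

# Hypothetical sizes for illustration only; real values depend on the config.
B, num_heads, head_dim = 2, 4, 16
t_h = t_w = 8    # assumed template feature map: 8 x 8 = 64 tokens
s_h = s_w = 16   # assumed search feature map: 16 x 16 = 256 tokens

# Assumed layout along dim=2: [static template | online template | search region]
static_t = torch.randn(B, num_heads, t_h * t_w, head_dim)
online_t = torch.randn(B, num_heads, t_h * t_w, head_dim)
search = torch.randn(B, num_heads, s_h * s_w, head_dim)
q = torch.cat([static_t, online_t, search], dim=2)  # 64 + 64 + 256 = 384 tokens

# t_h * t_w * 2 keeps both templates together in the first chunk
q_mt, q_s = torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2)
print(q_mt.shape)  # torch.Size([2, 4, 128, 16])
print(q_s.shape)   # torch.Size([2, 4, 256, 16])

# With only t_h * t_w the sizes sum to 64 + 256 = 320, not 384,
# so torch.split raises a RuntimeError instead of silently dropping a template.
try:
    torch.split(q, [t_h * t_w, s_h * s_w], dim=2)
    single_template_split_ok = True
except RuntimeError as err:
    single_template_split_ok = False
    print("single-template split failed:", err)
```

The single-template split would only be valid if the input actually carried one template's worth of tokens, in which case the rest of the code would also need to expect that layout.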