Hello, I have noticed that you refer to the ConvMAE-pretrained method as MixViT-ConvMAE, but looking at your implementation, the backbone is much closer to the MixCvT layout, with multiple patch embeddings and blocks.
Am I missing something, or could that be the case?
I am asking because I am trying to adapt PiMAE in the same way you adapted the ConvMAE model. Thank you!
Moreover, I have seen that during training you pass the template and search tokens through the same backbone multiple times. How does the training procedure deal with this? I would like to enrich your model with some notion of hand trajectory (for example, when the tracked object is being handled).
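For readers unfamiliar with the idea, here is a minimal sketch of one common way a single shared backbone can process template and search tokens together. It assumes the tokens are concatenated into one sequence and combined with an asymmetric attention mask, where template tokens attend only to template tokens and search tokens attend to everything. This is an illustrative assumption, not the repository's confirmed behavior, and `asymmetric_mask` is a hypothetical helper.

```python
# Hypothetical sketch: one way a shared backbone can handle template and search
# tokens in a single pass is to concatenate them and apply an asymmetric mask.
# Assumption: template queries see only template keys; search queries see all keys.
# This is an illustration, not the repository's confirmed implementation.

def asymmetric_mask(num_template, num_search):
    """Return an N x N boolean mask (True = query row may attend to key column)."""
    n = num_template + num_search
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < num_template:
                # Template queries: restricted to template keys only.
                mask[q][k] = k < num_template
            else:
                # Search queries: may attend to template and search keys.
                mask[q][k] = True
    return mask


# Usage: 2 template tokens followed by 3 search tokens.
for row in asymmetric_mask(2, 3):
    print("".join("1" if allowed else "0" for allowed in row))
# 11000
# 11000
# 11111
# 11111
# 11111
```

Because both token groups pass through the same weights, gradients from template and search tokens accumulate into the same parameters during training. An extra token group, such as hypothetical hand-trajectory tokens, could be appended to the sequence and given its own rows and columns in the mask.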
In terms of patch embedding style, MixViT-ConvMAE is indeed more like MixCvT, so you are right.
For the second question, I don't understand what you mean. Can you give a more detailed explanation?
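To make the MixCvT-style layout concrete, here is a minimal sketch of how a hierarchical backbone with several patch-embedding stages shrinks the token grid, and how many tokens a shared backbone would see if template and search tokens were concatenated at each stage. The strides `(4, 2, 2)`, the input sizes 128 (template) and 256 (search), and the helper names are assumptions for illustration. Each convolutional patch embedding is assumed to map an H x W map to (H // s) x (W // s).

```python
# Hypothetical sketch of token-grid sizes in a MixCvT-style hierarchical backbone.
# Assumed: strides (4, 2, 2), template 128x128, search 256x256, and each conv
# patch embedding downsamples an HxW map to (H // s) x (W // s).

def stage_grids(input_size, strides):
    """Return the side length of the token grid after each patch-embedding stage."""
    sizes = []
    side = input_size
    for s in strides:
        side //= s  # strided conv patch embedding downsamples by s
        sizes.append(side)
    return sizes


def tokens_per_stage(template_size, search_size, strides, num_templates=1):
    """Token count per stage when template and search tokens are concatenated."""
    t_grids = stage_grids(template_size, strides)
    s_grids = stage_grids(search_size, strides)
    return [num_templates * t * t + s * s for t, s in zip(t_grids, s_grids)]


print(stage_grids(128, (4, 2, 2)))  # [32, 16, 8]
print(stage_grids(256, (4, 2, 2)))  # [64, 32, 16]
print(tokens_per_stage(128, 256, (4, 2, 2), num_templates=2))  # [6144, 1536, 384]
```

A plain ViT backbone, by contrast, uses a single patch embedding, so the token count stays constant across blocks. That difference is what makes a hierarchical layout look closer to MixCvT.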