Thanks for your excellent work!
I have two questions about your implementation.
1) I could not find any implementation of positional embedding in your code. Why does the channel-aware attention not need to specify the order of the input tokens?
2) I noticed that your work focuses on the self-attention scheme. Can this scheme also be used for cross-attention? For example, if I want to compute the pixel-wise (spatial) similarity between two input vision features, can channel-wise attention (covariance) produce a similarity comparable to spatial-wise attention? A rough sketch of what I have in mind is below.
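To make question 2) concrete, here is a minimal sketch of what I mean, written in plain PyTorch with made-up shapes. The function `channel_cross_attention` and its details (head splitting, L2 normalization, no learnable temperature or projections) are my own guess at a cross variant of channel-wise attention, not taken from your code:

```python
import torch
import torch.nn.functional as F

def channel_cross_attention(feat_a, feat_b, num_heads=4):
    """Hypothetical cross variant of channel-wise (covariance) attention.

    feat_a, feat_b: (B, C, H, W) feature maps; queries come from feat_a,
    keys/values from feat_b, and attention is a per-head C x C map.
    """
    B, C, H, W = feat_a.shape
    # Flatten spatial dims: (B, heads, C_per_head, H*W)
    q = feat_a.reshape(B, num_heads, C // num_heads, H * W)
    k = feat_b.reshape(B, num_heads, C // num_heads, H * W)
    v = feat_b.reshape(B, num_heads, C // num_heads, H * W)

    # L2-normalize along the spatial dimension so the attention map behaves
    # like a (scaled) channel covariance between the two features.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)

    # (B, heads, C_per_head, C_per_head) channel-to-channel attention
    attn = torch.softmax(q @ k.transpose(-2, -1), dim=-1)

    out = attn @ v                      # (B, heads, C_per_head, H*W)
    return out.reshape(B, C, H, W)

# Example usage with made-up shapes
a = torch.randn(2, 64, 32, 32)
b = torch.randn(2, 64, 32, 32)
print(channel_cross_attention(a, b).shape)  # torch.Size([2, 64, 32, 32])
```

Is something along these lines a sensible way to get a spatial similarity between the two features, or does the channel-wise (covariance) formulation fundamentally lose the pixel-wise correspondence that spatial attention provides?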