
Viz of attention maps #10

Open · abhigoku10 opened this issue Jan 28, 2022 · 3 comments

abhigoku10 commented Jan 28, 2022

@yix081 @xwjabc Thanks for your work; it has helped me a lot, but I have a few queries:

  1. Can we visualize the attention maps (e.g., with Grad-CAM / CAM) to see how the model is learning or what it has learned? Do you have a codebase for this, or can you suggest how to do it?
  2. CoaT-Lite has only serial blocks while CoaT has serial + parallel blocks, yet the #params of CoaT-Lite is higher than that of CoaT. Is there any specific reason for this?
  3. How can the #params of CoaT-Lite / CoaT be reduced to below 3M? A drop in accuracy is acceptable.

Thanks in advance.
xwjabc (Contributor) commented Jan 28, 2022

Hi @abhigoku10, thank you for your interest in our work!

  1. It is okay to do visualization on CoaT using CAM / Grad-CAM. However, if your aim is to visualize the attention map in CoaT itself, it might be a bit difficult: there is no explicit attention map in our attention mechanism, since we compute the product of K and V first, so you may not be able to extract the attention map directly. Instead, you can mimic standard self-attention and manually compute the product of Q and K to generate the attention map (see the sketch after this list).

  2. This is because we use different channel settings for CoaT and CoaT-Lite. We try to align the parameter counts of CoaT and CoaT-Lite for a roughly head-to-head comparison, but there can still be some gap. You may find that in the Tiny and Mini models, CoaT has slightly fewer parameters, while in the Small models, CoaT-Lite has fewer parameters.

  3. I would suggest reducing the channels in CoaT-Lite Tiny first. You can try a series of ratios t (e.g., t = 0.3, 0.5, 0.7, 0.9), multiply all channels by the ratio t, and train a model for each (perhaps on a subset of ImageNet if there are not enough computational resources). Draw the validation-accuracy curve for these models and analyze the accuracy drop w.r.t. the parameter reduction. You may also try other ways to reduce the parameters (e.g., reducing the number of blocks, or reducing channels only in certain blocks) and compare the resulting curves to find the best practice.
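
A minimal sketch of the Q·K trick from point 1 (not part of the CoaT codebase): it assumes you can capture the per-head q and k tensors from the factorized-attention module, e.g. with a forward hook, and the shape convention and function name below are our own assumptions.

```python
import torch

def attention_map_from_qk(q, k):
    """Mimic standard self-attention to recover an attention map.

    q, k: [B, num_heads, N, head_dim] tensors captured from the
    factorized-attention module (e.g., with a forward hook).
    Returns an attention map of shape [B, num_heads, N, N].
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale  # [B, heads, N, N]
    return attn.softmax(dim=-1)               # rows sum to 1, as in softmax(QK^T / sqrt(d))
```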

abhigoku10 (Author) commented
@xwjabc Thanks for the response.
3. Can you let me know where in the code I have to make the changes? It would be helpful.

xwjabc (Contributor) commented Feb 12, 2022

You may try to modify the value of embed_dims in https://github.com/mlpc-ucsd/CoaT/blob/main/src/models/coat.py#L609
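
For example, a hypothetical helper for scaling the stage widths by a ratio t (the base widths below are illustrative, not necessarily the exact values at coat.py#L609; check the linked line for the real defaults):

```python
def scaled_embed_dims(base_dims, t, multiple_of=8):
    """Scale each stage width by ratio t, rounding to a multiple of 8
    so the width stays divisible by the number of attention heads."""
    return [max(multiple_of, int(round(d * t / multiple_of)) * multiple_of)
            for d in base_dims]

# Illustrative CoaT-Lite Tiny-style widths; replace with the values from coat.py.
base_dims = [64, 128, 256, 320]
print(scaled_embed_dims(base_dims, 0.5))  # -> [32, 64, 128, 160]
```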
