Adding extra layer in the BEiT architecture #157

Answered by leondgarse
EmoX777 asked this question in Q&A

Ya, for a model with positional embeddings like BeitV2BasePatch16, patch_merging_num_tokens=8 cannot be set with a square input_shape. As you can see in the printed info After patch merging: blocks with cls token: 9, attn_height: 3, this means:

  • This results in attention_blocks = 9 - 1 = 8, which is not divisible by attn_height=3, so attn_width = int(8 / 3) = 2.
  • Thus the built positional_embedding will have attn_height * attn_width + 1 = 7 tokens, which does not match the 9 blocks, as worked out below.
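
Spelling the same arithmetic out in plain Python (just restating the numbers above, not library code):

attn_height = 3
blocks_with_cls = 9                              # printed after patch merging
attention_blocks = blocks_with_cls - 1           # 8, cls token excluded
attn_width = attention_blocks // attn_height     # int(8 / 3) = 2
pos_embed_tokens = attn_height * attn_width + 1  # 3 * 2 + 1 = 7, does not match 9 -> mismatch
# With patch_merging_num_tokens=9 instead, 9 is divisible by 3,
# so attn_width = 3 and 3 * 3 + 1 = 10 matches the 10 blocks with cls token.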

It works with other values like:

from keras_cv_attention_models import beit
# 9 merged tokens are divisible by attn_height=3, so the positional embedding stays consistent
mm = beit.BeitV2BasePatch16(patch_merging_block_id=5, patch_merging_num_tokens=9)
# >>>> After patch merging: blocks: 10, attn_height: 3
mm = beit.Bei…
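
To screen candidate values up front, here is a minimal hypothetical helper, assuming the rule above (attention_blocks equals patch_merging_num_tokens after dropping the cls token, and attn_width is derived as int(attention_blocks / attn_height)); it is not part of keras_cv_attention_models:

def fits_square_pos_embed(patch_merging_num_tokens, attn_height=3):
    # Consistent only when the tokens left after dropping cls split evenly into attn_height rows
    return patch_merging_num_tokens % attn_height == 0

print([n for n in range(1, 13) if fits_square_pos_embed(n)])  # [3, 6, 9, 12]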
