Adding extra layer in the BEiT architecture #157

Answered by leondgarse
EmoX777 asked this question in Q&A

Ya, for a model with positional embeddings like BeitV2BasePatch16, patch_merging_num_tokens=8 cannot be set with a square input_shape. As you can see in the printed info After patch merging: blocks with cls token: 9, attn_height: 3, this means:

  • This results in attention_blocks = 9 - 1 = 8, which is not divisible by attn_height=3, so attn_width = int(8 / 3) = 2.
  • Thus the built positional_embedding will have attn_height * attn_width + 1 = 7 tokens, which does not match the 9 blocks, as worked out below.
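
Spelling the same arithmetic out in plain Python (just restating the numbers above, not library code):

attn_height = 3
blocks_with_cls = 9                              # printed after patch merging
attention_blocks = blocks_with_cls - 1           # 8, cls token excluded
attn_width = attention_blocks // attn_height     # int(8 / 3) = 2
pos_embed_tokens = attn_height * attn_width + 1  # 3 * 2 + 1 = 7, does not match 9 -> mismatch
# With patch_merging_num_tokens=9 instead, 9 is divisible by 3,
# so attn_width = 3 and 3 * 3 + 1 = 10 matches the 10 blocks with cls token.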

It works with other values like:

from keras_cv_attention_models import beit
# 9 merged tokens are divisible by attn_height=3, so the positional embedding stays consistent
mm = beit.BeitV2BasePatch16(patch_merging_block_id=5, patch_merging_num_tokens=9)
# >>>> After patch merging: blocks: 10, attn_height: 3
mm = beit.Bei…
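
To screen candidate values up front, here is a minimal hypothetical helper, assuming the rule above (attention_blocks equals patch_merging_num_tokens after dropping the cls token, and attn_width is derived as int(attention_blocks / attn_height)); it is not part of keras_cv_attention_models:

def fits_square_pos_embed(patch_merging_num_tokens, attn_height=3):
    # Consistent only when the tokens left after dropping cls split evenly into attn_height rows
    return patch_merging_num_tokens % attn_height == 0

print([n for n in range(1, 13) if fits_square_pos_embed(n)])  # [3, 6, 9, 12]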
