
Transformer2D initializing #82

Open
johnmullan opened this issue Jul 2, 2023 · 1 comment

Comments


johnmullan commented Jul 2, 2023

More of a question really, but do you know why num_attention_heads and attention_head_dim are swapped when initializing Transformer2D blocks?

https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/79e13d17167f66f424a8acad88e83fc76d6d210d/models/unet_3d_blocks.py#L286C17-L286C35

It is the opposite in unet_2d_blocks.py:
https://github.com/huggingface/diffusers/blob/5439e917cacc885c0ac39dda1b8af12258e6e16d/src/diffusers/models/unet_2d_blocks.py#L872

johnmullan reopened this Jul 2, 2023
@JCBrouwer
Contributor

Diffusers defines it in terms of the number of attention heads:

num_attention_heads,                  # first positional arg: number of heads
out_channels // num_attention_heads,  # second positional arg: channels per head
in_channels=out_channels,

This repo uses the number of channels per head:

in_channels // attn_num_head_channels,  # first positional arg: number of heads
attn_num_head_channels,                 # second positional arg: channels per head
in_channels=in_channels,

Given that in_channels == out_channels, these two are identical.
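As a quick sanity check, here is a minimal Python sketch of the equivalence (the channel count and head numbers below are illustrative, not taken from either config):

channels = 320  # illustrative value; in_channels == out_channels in these blocks

# Diffusers convention: the config stores the number of heads.
num_attention_heads = 8
diffusers_args = (num_attention_heads, channels // num_attention_heads)  # (8, 40)

# This repo's convention: the config stores the channels per head.
attn_num_head_channels = 40
repo_args = (channels // attn_num_head_channels, attn_num_head_channels)  # (8, 40)

# Both conventions produce the same (num_attention_heads, attention_head_dim) pair.
assert diffusers_args == repo_args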
