We provide weights for:
- ConvMixer-1536/20 (k = 9, p = 7)
- ConvMixer-768/32 (k = 7, p = 7)
  - IMPORTANT: This model used ReLU instead of GELU.
  - Currently, you would need to change `nn.GELU()` to `nn.ReLU()` in `convmixer.py` to use these weights; we will fix this later.
- ConvMixer-1024/20 (k = 9, p = 14)
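
The list above can be captured as a small lookup table, so that loading code knows which activation each checkpoint expects. This is only a sketch: `CheckpointSpec`, `parse_name`, `make_spec`, and `CHECKPOINTS` are hypothetical names that do not exist in this repository, and it assumes that the model name encodes width/depth (`ConvMixer-<dim>/<depth>`) and that `k` and `p` are the kernel size and patch size.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CheckpointSpec:
    dim: int          # assumed: hidden width, the number before the slash
    depth: int        # assumed: number of blocks, the number after the slash
    kernel_size: int  # "k" in the list above
    patch_size: int   # "p" in the list above
    activation: str   # "gelu" unless the list says otherwise


def parse_name(name: str) -> Tuple[int, int]:
    """Split a name such as 'ConvMixer-768/32' into (768, 32)."""
    prefix, _, rest = name.partition("-")
    if prefix != "ConvMixer" or "/" not in rest:
        raise ValueError(f"unexpected model name: {name!r}")
    dim, depth = rest.split("/")
    return int(dim), int(depth)


def make_spec(name: str, kernel_size: int, patch_size: int,
              activation: str = "gelu") -> CheckpointSpec:
    """Build a spec from a model name plus its k and p values."""
    dim, depth = parse_name(name)
    return CheckpointSpec(dim, depth, kernel_size, patch_size, activation)


# Hypothetical registry mirroring the list of released weights.
CHECKPOINTS = {
    "ConvMixer-1536/20": make_spec("ConvMixer-1536/20", 9, 7),
    # This checkpoint was trained with ReLU instead of GELU.
    "ConvMixer-768/32": make_spec("ConvMixer-768/32", 7, 7, activation="relu"),
    "ConvMixer-1024/20": make_spec("ConvMixer-1024/20", 9, 14),
}
```

A loader could consult `CHECKPOINTS[name].activation` and build the model with `nn.ReLU()` or `nn.GELU()` accordingly, instead of editing `convmixer.py` by hand.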