
lightseq's Transformer expects an extra layer_norm at both the encoder and decoder level #509

Open

yuting-wang-1000 opened this issue May 26, 2023 · 0 comments
Hi lightseq team, I notice that lightseq's Transformer architecture has an extra layer_norm at both the encoder and decoder level (outside the individual encoder/decoder layers):

```python
self.layer_norm = nn.LayerNorm(embed_dim)
```

In Fairseq, this final layer_norm is only created when normalize_before == True:
https://github.com/facebookresearch/fairseq/blob/b30980349bcb2e870481d783ac8cb3f338361601/fairseq/models/transformer/transformer_encoder.py#L100
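
For reference, here is a minimal sketch of that Fairseq behavior (the class name and constructor arguments are illustrative, not the actual Fairseq signature): only pre-norm models get the final layer_norm, while post-norm models leave it as None.

```python
from torch import nn


class EncoderFinalNormSketch(nn.Module):
    def __init__(self, embed_dim: int, normalize_before: bool):
        super().__init__()
        # Fairseq only builds the final layer_norm for pre-norm models
        # (normalize_before=True); post-norm models set it to None.
        self.layer_norm = nn.LayerNorm(embed_dim) if normalize_before else None

    def forward(self, x):
        # Applied once after all encoder layers, and only if it exists.
        return self.layer_norm(x) if self.layer_norm is not None else x
```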

Due to this architectural difference, I'm unable to export a native Fairseq Transformer with post layer norm to protobuf/hdf5 format using
https://github.com/bytedance/lightseq/blob/master/examples/inference/python/export/fairseq/native_fs_transformer_export.py, because my model was trained with Fairseq with normalize_before == False and therefore doesn't have this extra layer_norm at the encoder/decoder level.
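
To make the mismatch concrete, a small diagnostic along these lines (the checkpoint path is a placeholder and the key names assume the standard Fairseq parameter layout) shows that a post-norm checkpoint simply has no final encoder/decoder layer_norm weights for the export script to read:

```python
import torch

# Load the Fairseq checkpoint and inspect its parameter names.
# "checkpoint_best.pt" is a placeholder path for my trained model.
state = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

for prefix in ("encoder", "decoder"):
    present = f"{prefix}.layer_norm.weight" in state
    print(f"{prefix}.layer_norm present: {present}")  # False for post-norm models
```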

I wonder why lightseq requires this extra layer_norm at the encoder/decoder level. Thanks!
