
feature(wrh): add RoPE for unizero #263

Closed
wants to merge 4 commits

Conversation

ruiheng123 (Contributor)

We add RoPE (Rotary Position Embedding) support for UniZero.

puyuan1996 added the enhancement (New feature or request) label on Aug 13, 2024
@@ -55,6 +55,15 @@ def __init__(self, config: TransformerConfig) -> None:
self.blocks = nn.ModuleList([Block(config) for _ in range(config.num_layers)])
self.ln_f = nn.LayerNorm(config.embed_dim)

self.config.rope_theta = 500000  # RoPE base frequency
self.config.max_seq_len = 2048   # length of the precomputed RoPE frequency table
Collaborator

Please double-check this parameter: should it be set to match the actual training sequence length?

Collaborator

rope_theta affects the frequencies of the positional encoding, so the default should be fine. max_seq_len is the maximum sequence length; it determines the length of the precomputed frequency tensor. If we want to support longer sequences at test time, max_seq_len should be set to cover the longest test sequence we expect. For example, if the longest test sequence is 2048, this value should be 2048. 10 is a bit too small: any test length greater than 10 would raise an error.
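
For context, here is a minimal sketch of the LLaMA-style RoPE frequency precomputation this comment describes; the function name precompute_freqs_cis and the exact shapes are assumptions for illustration, not necessarily what this PR implements:

```python
import torch

def precompute_freqs_cis(dim: int, max_seq_len: int, theta: float = 500000.0) -> torch.Tensor:
    """Precompute the complex rotation factors used by RoPE.

    The result has shape (max_seq_len, dim // 2), so positions at or
    beyond max_seq_len have no precomputed entry and indexing them fails.
    """
    # Per-pair base frequencies theta^(-2i/dim); a larger theta yields
    # slower-rotating (lower-frequency) components.
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    # One rotation angle per (position, frequency) pair.
    t = torch.arange(max_seq_len).float()
    angles = torch.outer(t, freqs)  # (max_seq_len, dim // 2)
    # Store as unit complex numbers cos(angle) + i*sin(angle).
    return torch.polar(torch.ones_like(angles), angles)
```

With max_seq_len = 10, any position index of 10 or more falls outside this table, which is exactly the error described above; setting it to 2048 covers the longest intended test sequence.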

max_env_step = int(5e5)
reanalyze_ratio = 0.
batch_size = 2
num_unroll_steps = 10
Collaborator

You didn't use the debug config for actual training, right?

Contributor Author

No. Only the committed version is the debug config; it is not the one used during training.
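
To make this exchange concrete, here is a hypothetical way to keep the committed debug values and training-scale values in one config; the non-debug number below is a placeholder, not the value actually used to train this PR:

```python
debug = True  # the committed config corresponds to this setting

# Only batch_size is toggled here for illustration; a real config would
# switch every scale-dependent field at once.
batch_size = 2 if debug else 64  # 64 is a placeholder, not the PR's real value
num_unroll_steps = 10
```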

PaParaZz1 (Member)

This PR will be updated in #266.

PaParaZz1 closed this on Sep 20, 2024