Activity
if the hybrid module is an RNN, allow for folding it across the seque…
flexibly handle hybrid module outputs
Force push
1.44.1
readme
add ability to hybridize attention with external module, for aiming t…
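The hybrid-module entries above boil down to running an external module (an RNN here) alongside attention and handling whatever it returns. A minimal sketch of the idea, assuming a plain GRU branch whose sequence output is simply summed with the attention output; `HybridAttentionBlock` and `hybrid_module` are hypothetical names, not the library's API:

```python
# Illustrative sketch only: run attention in parallel with an external
# "hybrid" module (a GRU here) and sum their outputs.

import torch
from torch import nn

class HybridAttentionBlock(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        # the external module hybridized with attention - an RNN in this example
        self.hybrid_module = nn.GRU(dim, dim, batch_first = True)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights = False)
        # the RNN branch runs over the same tokens; its outputs are handled
        # flexibly, here by keeping the sequence output and dropping the hidden state
        rnn_out, _ = self.hybrid_module(x)
        return attn_out + rnn_out

x = torch.randn(2, 16, 512)
block = HybridAttentionBlock(512)
out = block(x)   # (2, 16, 512)
```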
able to return per token logit entropy
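"Per token logit entropy" here means the Shannon entropy of the predicted distribution at every position. A small sketch of the computation, with no claim about the library's actual return signature:

```python
# Minimal sketch: entropy of the softmax distribution at each token position.

import torch
import torch.nn.functional as F

def per_token_logit_entropy(logits):            # logits: (batch, seq, vocab)
    log_probs = F.log_softmax(logits, dim = -1)
    probs = log_probs.exp()
    # entropy H = -sum p * log p, computed per token
    return -(probs * log_probs).sum(dim = -1)   # (batch, seq)

logits = torch.randn(2, 16, 32000)
entropy = per_token_logit_entropy(logits)       # (2, 16)
```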
merge hyper connection streams before final norm, to avoid edge case …
address #305
using tanh in hyperconnection was not clear cut
note
Force push
note
add the proposed hyper-connections (multiple residual streams) propos…
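The hyper-connection entries (this one and the merge-before-final-norm fix above) refer to keeping several residual streams, letting each block read a learned mix of them and write back into them, and collapsing the streams again before the final norm. A much-simplified sketch under those assumptions; `SimpleHyperConnections` and its scalar read/write weights are illustrative stand-ins for the paper's full width and depth connections:

```python
# Simplified illustration of multiple residual streams, merged before the final norm.

import torch
from torch import nn

class SimpleHyperConnections(nn.Module):
    def __init__(self, dim, num_streams = 4):
        super().__init__()
        self.num_streams = num_streams
        # learned weights for reading from / writing to each stream
        self.read_weights  = nn.Parameter(torch.ones(num_streams) / num_streams)
        self.write_weights = nn.Parameter(torch.ones(num_streams) / num_streams)

    def expand(self, x):                  # (b, n, d) -> (s, b, n, d)
        return x.unsqueeze(0).expand(self.num_streams, *x.shape).clone()

    def read(self, streams):              # weighted combination fed to the block
        return torch.einsum('s, s b n d -> b n d', self.read_weights, streams)

    def write(self, streams, block_out):  # block output written back per stream
        return streams + self.write_weights.view(-1, 1, 1, 1) * block_out

    def merge(self, streams):             # streams summed before the final norm
        return streams.sum(dim = 0)

dim = 64
hc = SimpleHyperConnections(dim)
block = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
final_norm = nn.LayerNorm(dim)

x = torch.randn(2, 16, dim)
streams = hc.expand(x)
streams = hc.write(streams, block(hc.read(streams)))
out = final_norm(hc.merge(streams))       # merge happens before the final norm
```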
default learned value residual mix to true
Force push
default learned value residual mix to true
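"Learned value residual mix" refers to value residual learning: each attention layer blends its own values with the values from the first layer, with a learned sigmoid gate choosing the blend. A hedged sketch of that mixing step; `LearnedValueResidualMix` and its shapes are illustrative only:

```python
# Sketch of value residual learning: mix current values with first-layer values
# using a learned per-token, per-head gate.

import torch
from torch import nn

class LearnedValueResidualMix(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        # predicts a mixing coefficient in (0, 1) per token and head
        self.to_mix = nn.Sequential(nn.Linear(dim, heads), nn.Sigmoid())

    def forward(self, x, values, first_layer_values):
        # x: (b, n, d)   values / first_layer_values: (b, h, n, d_head)
        mix = self.to_mix(x)                        # (b, n, h)
        mix = mix.transpose(1, 2).unsqueeze(-1)     # (b, h, n, 1)
        return values * mix + first_layer_values * (1. - mix)

b, h, n, d, dh = 2, 8, 16, 512, 64
mixer = LearnedValueResidualMix(d, h)
x = torch.randn(b, n, d)
v, v_first = torch.randn(b, h, n, dh), torch.randn(b, h, n, dh)
mixed_v = mixer(x, v, v_first)                      # (b, h, n, dh)
```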
address a bug
one more test
address #303, allow for rotary embedding for cross attention if conte…
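The rotary change above comes down to applying rotary embeddings with explicit positions, so queries use their own positions while cross-attended context keys use the positions supplied with the context. A sketch of that, assuming the usual rotate-half formulation (it also illustrates the custom-position support in #301 further down):

```python
# Rotary embeddings driven by explicit positions for queries and context keys.

import torch

def rotary_freqs(positions, dim, theta = 10000.):
    # positions: (n,) -> frequencies (n, dim // 2)
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    return positions.float().unsqueeze(-1) * inv_freq

def rotate_half(x):
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rotary(t, freqs):
    # t: (b, h, n, d), freqs: (n, d // 2) repeated across both halves
    freqs = torch.cat((freqs, freqs), dim = -1)
    return t * freqs.cos() + rotate_half(t) * freqs.sin()

b, h, d = 2, 8, 64
q = torch.randn(b, h, 10, d)              # 10 query tokens
k = torch.randn(b, h, 7, d)               # 7 context tokens with their own positions
q_pos = torch.arange(10)
k_pos = torch.tensor([3, 5, 8, 13, 21, 34, 55])   # arbitrary context positions

q = apply_rotary(q, rotary_freqs(q_pos, d))
k = apply_rotary(k, rotary_freqs(k_pos, d))
# q @ k.transpose(-1, -2) now encodes relative offsets between the two position sets
```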
readme
oops learned value residual mix does not apply to cross attention
Force push
oops learned value residual mix does not apply to cross attention
deviate from paper and use softclamping instead for laser
it performs worse than baseline. maybe there is something else going …
safe log
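The three Laser-related entries above describe attention performed on exponentiated values and mapped back with a log, with a tanh-based softclamp on the values (in place of the paper's stabilization) and a safe log guarding against log of zero. A rough sketch of that combination; function names are illustrative:

```python
# Sketch of Laser-style attention with softclamped values and a safe log.

import torch

def softclamp(t, value = 15.):
    # smoothly bounds t to (-value, value) instead of a hard clamp
    return (t / value).tanh() * value

def safe_log(t, eps = 1e-10):
    return t.clamp(min = eps).log()

def laser_attention(q, k, v, scale = None):
    scale = scale if scale is not None else q.shape[-1] ** -0.5
    v = softclamp(v)                               # deviation from the paper: softclamp the values
    attn = (q @ k.transpose(-1, -2) * scale).softmax(dim = -1)
    out = attn @ v.exp()                           # attend over exp(values)
    return safe_log(out)                           # map back to log space safely

q = k = v = torch.randn(2, 8, 16, 64)
out = laser_attention(q, k, v)                     # (2, 8, 16, 64)
```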
1.42.21
Support custom positions for rotary positional embedding (#301)
Pull request merge
1.42.20
Custom pos alibi flash attn fix (#300)
Pull request merge
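For #300, the relevant idea is that ALiBi's bias has to be built from the supplied custom positions and then handed to the fused attention call as an additive bias. A speculative sketch of that shape of fix, not the repo's actual code:

```python
# ALiBi bias computed from custom positions, passed to scaled_dot_product_attention.

import torch
import torch.nn.functional as F

def alibi_slopes(heads):
    # standard ALiBi geometric slopes, one per head (power-of-two head counts)
    start = 2 ** (-8 / heads)
    return torch.tensor([start ** (i + 1) for i in range(heads)])

def alibi_bias(q_pos, k_pos, heads):
    # q_pos: (n,), k_pos: (m,) -> bias (heads, n, m)
    rel_dist = (q_pos[:, None] - k_pos[None, :]).abs()    # distances from custom positions
    return -alibi_slopes(heads)[:, None, None] * rel_dist

heads, dim_head = 8, 64
q = torch.randn(1, heads, 10, dim_head)
k = v = torch.randn(1, heads, 12, dim_head)

q_pos = torch.arange(10).float()
k_pos = torch.tensor([0., 1., 2., 5., 6., 7., 10., 11., 12., 20., 21., 22.])

bias = alibi_bias(q_pos, k_pos, heads).unsqueeze(0)       # (1, heads, 10, 12)
out = F.scaled_dot_product_attention(q, k, v, attn_mask = bias)
```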
seeing a small but noticeable improvement from proposed Laser attenti…
address #296
make sure forgettable transformer works with memory key values
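The forgetting-transformer check above concerns the interaction between a forget-gate decay bias on the attention logits and prepended memory key/values, which should not be decayed. A sketch under the assumption of log-sigmoid forget gates and a zero bias on the memory positions; names and shapes are illustrative:

```python
# Forgetting-transformer style decay bias, padded so memory kv are not decayed.

import torch
import torch.nn.functional as F

def forget_bias(forget_gate_logits):
    # forget_gate_logits: (b, h, n) -> bias (b, h, n, n), lower triangular
    log_f = F.logsigmoid(forget_gate_logits)         # log of forget gates in (0, 1)
    cum = log_f.cumsum(dim = -1)                     # prefix sums of log f
    bias = cum[..., :, None] - cum[..., None, :]     # sum of log f over positions (j, i]
    return bias.tril()                               # only causal entries matter

b, h, n, d, num_mem = 1, 4, 6, 32, 3
q = torch.randn(b, h, n, d)
k = v = torch.randn(b, h, n + num_mem, d)            # memory key / values prepended

bias = forget_bias(torch.randn(b, h, n))
bias = F.pad(bias, (num_mem, 0), value = 0.)         # no decay on memory positions
causal = torch.ones(n, n, dtype = torch.bool).tril()
mask = F.pad(causal, (num_mem, 0), value = True)     # memories always attendable

scores = (q @ k.transpose(-1, -2)) * d ** -0.5 + bias
scores = scores.masked_fill(~mask, float('-inf'))
out = scores.softmax(dim = -1) @ v                   # (b, h, n, d)
```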