Activity

if the hybrid module is an RNN, allow for folding it across the seque…

lucidrains pushed 1 commit to main • c51ecd3…b28d82a • 13 days ago

flexibly handle hybrid module outputs

lucidrains force pushed to main • f17b64a…c51ecd3 • 13 days ago

1.44.1

lucidrains pushed 2 commits to main • b81646f…f17b64a • 13 days ago

readme

lucidrains pushed 1 commit to main • 39bbb08…b81646f • 13 days ago

add ability to hybridize attention with external module, for aiming t…

lucidrains pushed 1 commit to main • f944dd7…39bbb08 • 15 days ago
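
The hybrid entries here and above add a way to run an external module (say, an RNN) alongside attention. A minimal sketch of the idea, assuming the hybrid output is simply summed with the attention output; the class name and the tuple handling are illustrative, not the library's actual API:

```python
import torch
from torch import nn

class HybridAttentionSketch(nn.Module):
    # hypothetical: wrap an attention block and an external module,
    # run both on the same input, and sum their outputs
    def __init__(self, attn: nn.Module, hybrid: nn.Module):
        super().__init__()
        self.attn = attn
        self.hybrid = hybrid

    def forward(self, x):
        attn_out = self.attn(x)
        hybrid_out = self.hybrid(x)

        # "flexibly handle hybrid module outputs": recurrent modules such as
        # nn.LSTM / nn.GRU return (output, hidden_state) tuples
        if isinstance(hybrid_out, tuple):
            hybrid_out = hybrid_out[0]

        return attn_out + hybrid_out
```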

able to return per token logit entropy

lucidrains pushed 1 commit to main • e1be411…f944dd7 • 22 days ago
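
Per-token logit entropy is a standard quantity; a minimal sketch of how it can be computed stably from the final logits (the function name is illustrative):

```python
import torch.nn.functional as F

def per_token_entropy(logits):
    # logits: (batch, seq, vocab)
    # H_i = -sum_v p_i(v) * log p_i(v), computed from log-probs for stability
    log_probs = F.log_softmax(logits, dim = -1)
    return -(log_probs.exp() * log_probs).sum(dim = -1)  # (batch, seq)
```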

merge hyper connection streams before final norm, to avoid edge case …

lucidrains pushed 1 commit to main • 003275c…e1be411 • 26 days ago

address #305

lucidrains pushed 1 commit to main • cdf51f7…003275c • 27 days ago

using tanh in hyperconnection was not clear cut

lucidrains pushed 1 commit to main • 8880051…cdf51f7 • 28 days ago

note

lucidrains force pushed to main • d27c3da…8880051 • 28 days ago

note

lucidrains pushed 1 commit to main • 8b367e6…d27c3da • 28 days ago

add the proposed hyper-connections (multiple residual streams) propos…

lucidrains pushed 1 commit to main • 0fd37f5…8b367e6 • 28 days ago
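
Hyper-connections replace the single residual stream with several, with learned weights for reading the branch input out of the streams and writing the branch output back. A much reduced sketch of the static variant, with illustrative names; per the later commit above, the streams are merged (summed) back into one before the final norm:

```python
import torch
from torch import nn

class StaticHyperConnection(nn.Module):
    # hypothetical reduced sketch: one branch (attention or feedforward)
    # reading from and writing to `num_streams` residual streams
    def __init__(self, branch: nn.Module, num_streams: int):
        super().__init__()
        self.branch = branch
        self.read = nn.Parameter(torch.ones(num_streams) / num_streams)
        self.write = nn.Parameter(torch.ones(num_streams))

    def forward(self, streams):
        # streams: (num_streams, batch, seq, dim)
        branch_input = (streams * self.read.view(-1, 1, 1, 1)).sum(dim = 0)
        branch_out = self.branch(branch_input)
        return streams + branch_out * self.write.view(-1, 1, 1, 1)

# at the end of the network, merge the streams before the final norm:
# merged = streams.sum(dim = 0)
```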

default learned value residual mix to true

lucidrains force pushed to main • b25e7ac…0fd37f5 • 29 days ago

default learned value residual mix to true

lucidrains pushed 1 commit to main • 5231e5d…b25e7ac • 29 days ago
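
The value residual entries refer to mixing each layer's attention values with the values from the first layer. A minimal sketch, assuming a learned per-token gate; the helper and the source of the gate logits are illustrative:

```python
import torch

def mix_value_residual(values, first_values, mix_logits):
    # values, first_values: (batch, heads, seq, dim_head)
    # mix_logits: per-token gate logits broadcastable to the value shape,
    # e.g. produced by a small linear projection on the layer input
    mix = mix_logits.sigmoid()
    return values * mix + first_values * (1. - mix)
```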

address a bug

lucidrains pushed 1 commit to main • f5d3907…5231e5d • on Dec 10, 2024

one more test

lucidrains pushed 1 commit to main • 70c59b6…f5d3907 • on Dec 10, 2024

address #303, allow for rotary embedding for cross attention if conte…

lucidrains pushed 1 commit to main • 66be236…70c59b6 • on Dec 10, 2024

readme

lucidrains pushed 1 commit to main • 90a3b11…66be236 • on Dec 7, 2024

oops learned value residual mix does not apply to cross attention

lucidrains force pushed to main • 3e92969…90a3b11 • on Dec 4, 2024

oops learned value residual mix does not apply to cross attention

lucidrains pushed 1 commit to main • 544c699…3e92969 • on Dec 4, 2024

deviate from paper and use softclamping instead for laser

lucidrains pushed 1 commit to main • 5b4ddef…544c699 • on Dec 3, 2024
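
Laser attention exponentiates the values (out = log(attn @ exp(v))), which can overflow; the deviation here bounds the values with a tanh softclamp rather than the paper's exact normalization. A minimal sketch, with an assumed clamp value:

```python
import torch

def softclamp(t, value = 15.):
    # smoothly bound t to (-value, value) with tanh
    return (t / value).tanh() * value

def laser_attend(attn, v, clamp_value = 15.):
    # attn: post-softmax attention matrix (batch, heads, n, n)
    # v: values (batch, heads, n, dim_head)
    v = softclamp(v, clamp_value)        # keep exp(v) from overflowing
    return (attn @ v.exp()).log()
```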

it performs worse than baseline. maybe there is something else going …

lucidrains pushed 1 commit to main • 48afc51…5b4ddef • on Dec 2, 2024

safe log

lucidrains pushed 1 commit to main • 35452f4…48afc51 • on Dec 2, 2024
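
"safe log" is the usual clamp-before-log guard; a minimal sketch, with an assumed epsilon:

```python
import torch

def safe_log(t, eps = 1e-20):
    # clamp so log never sees zero, avoiding -inf values and nan gradients
    return t.clamp(min = eps).log()
```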

1.42.21

lucidrains pushed 1 commit to main • c493eb4…35452f4 • on Nov 30, 2024

Support custom positions for rotary positional embedding (#301)

lucidrains pushed 1 commit to main (pull request merge) • 8c993f0…c493eb4 • on Nov 30, 2024
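
Custom positions for rotary embeddings means the rotation angles come from caller-supplied positions rather than arange(seq_len). A minimal sketch of the idea, using the interleaved-pair rotation; function names are illustrative:

```python
import torch

def rotary_freqs(positions, dim, theta = 10000.):
    # positions: (n,) arbitrary integer positions, not necessarily contiguous
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    return positions.float()[:, None] * inv_freq  # (n, dim / 2)

def apply_rotary(t, freqs):
    # t: (..., n, dim) queries or keys; rotate channel pairs by the angles
    cos, sin = freqs.cos(), freqs.sin()
    t1, t2 = t[..., 0::2], t[..., 1::2]
    rotated = torch.stack((t1 * cos - t2 * sin, t1 * sin + t2 * cos), dim = -1)
    return rotated.flatten(-2)
```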

1.42.20

lucidrains pushed 1 commit to main • 4c3e62a…8c993f0 • on Nov 29, 2024

Custom pos alibi flash attn fix (#300)

lucidrains pushed 1 commit to main (pull request merge) • 57efd77…4c3e62a • on Nov 29, 2024

seeing a small but noticeable improvement from proposed Laser attenti…

lucidrains pushed 1 commit to main • b720245…57efd77 • on Nov 29, 2024

address #296

lucidrains pushed 1 commit to main • 5791a85…b720245 • on Nov 26, 2024

make sure forgettable transformer works with memory key values

lucidrains pushed 1 commit to main • 28b693d…5791a85 • on Nov 24, 2024