adding-support-for-mamba2 #1009
base: main
Conversation
…niz-Guelmez/mlx-examples into adding-support-for-mamba2
…t MambaMixer block pass)
…niz-Guelmez/mlx-examples into adding-support-for-mamba2
…niz-Guelmez/mlx-examples into adding-support-for-mamba2
Codestral Mamba and other models rely on the Mamba2 architecture. Hopefully we can get this soon.
How is it going here? Still very slow?
Unfortunately yes. I looked into the transformers implementation and rewrote the slow (but working) Mamba2Mixer class, but I haven't had time to keep working on it; I'll continue over the weekend.
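For context, the expensive part during generation is the per-token SSD state update the mixer has to perform. Below is a minimal sketch of that update in MLX, assuming a single SSM group and illustrative shapes; mamba2_step is a hypothetical helper name, not the code in this PR.

import mlx.core as mx

def mamba2_step(x, dt, A, B, C, D, state):
    # Hypothetical sketch of one Mamba2 recurrent step (single SSM group).
    # x:     (n_heads, head_dim)          one token's input, already split into heads
    # dt:    (n_heads,)                   per-head step size (after softplus)
    # A:     (n_heads,)                   per-head scalar decay (negative)
    # B, C:  (d_state,)                   SSM input / output projections
    # D:     (n_heads,)                   skip connection
    # state: (n_heads, head_dim, d_state) recurrent state carried in the cache
    dA = mx.exp(dt * A)                                         # discretized decay per head
    dBx = dt[:, None, None] * x[:, :, None] * B[None, None, :]  # dt * outer(x, B)
    state = dA[:, None, None] * state + dBx                     # h_t = dA * h_{t-1} + dt * x B^T
    y = (state * C[None, None, :]).sum(axis=-1)                 # read out: y_t = h_t C
    return y + D[:, None] * x, state

# Illustrative shapes only
n_heads, head_dim, d_state = 4, 8, 16
x = mx.random.normal((n_heads, head_dim))
dt = mx.ones((n_heads,)) * 0.1
A = -mx.ones((n_heads,))
B = mx.random.normal((d_state,))
C = mx.random.normal((d_state,))
D = mx.ones((n_heads,))
state = mx.zeros((n_heads, head_dim, d_state))
y, state = mamba2_step(x, dt, A, B, C, D, state)

Keeping this as a few array ops on a cached state, instead of re-running a scan over the whole prefix for every new token, is what the later speed-up in this thread hinges on.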
… is generating gibberish
…t still only one input token and outputs gibberish
…s still a little slow: 0.222 tokens-per-sec
@awni I finally got it to work!

Inference:

python -m mlx_lm.generate --model rokyang/mamba2-130m-hf --prompt "hello" --max-tokens 22 --ignore-chat-template
Fetching 5 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 65948.18it/s]
==========
Prompt: hello
, I am a little girl, I am a little girl, I am a little girl, I am a
==========
Prompt: 1 tokens, 7.499 tokens-per-sec
Generation: 22 tokens, 28.258 tokens-per-sec
Peak memory: 0.454 GB

python -m mlx_lm.generate --model rokyang/mamba2-130m-hf --prompt "hello world" --max-tokens 22 --ignore-chat-template
Fetching 5 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 55043.36it/s]
==========
Prompt: hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
==========
Prompt: 2 tokens, 5.552 tokens-per-sec
Generation: 22 tokens, 24.904 tokens-per-sec
Peak memory: 0.454 GB

Training:

python -m mlx_lm.lora \
--model rokyang/mamba2-130m-hf \
--train \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/data_tyni \
--iters 5 \
--batch-size 1 \
--num-layers 1 \
--val-batches 1 \
--steps-per-report 1 \
--adapter-path /Users/gokdenizgulmez/Desktop/mamba2-pretrain \
--max-seq-length 12
Loading pretrained model
Fetching 5 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 87381.33it/s]
Loading datasets
Training
Trainable parameters: 0.956% (1.233M/128.988M)
Starting training..., iters: 5
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1508 will be truncated to 12. Consider pre-splitting your data to save memory.
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1250 will be truncated to 12. Consider pre-splitting your data to save memory.
Iter 1: Val loss 7.408, Val took 1.578s
Iter 1: Train loss 7.408, Learning Rate 1.000e-05, It/sec 0.405, Tokens/sec 4.450, Trained Tokens 11, Peak mem 2.173 GB
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1692 will be truncated to 12. Consider pre-splitting your data to save memory.
Iter 2: Train loss 7.275, Learning Rate 1.000e-05, It/sec 2.110, Tokens/sec 23.212, Trained Tokens 22, Peak mem 2.189 GB
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1397 will be truncated to 12. Consider pre-splitting your data to save memory.
Iter 3: Train loss 7.093, Learning Rate 1.000e-05, It/sec 2.694, Tokens/sec 29.637, Trained Tokens 33, Peak mem 2.189 GB
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1238 will be truncated to 12. Consider pre-splitting your data to save memory.
Iter 4: Train loss 6.880, Learning Rate 1.000e-05, It/sec 2.803, Tokens/sec 30.829, Trained Tokens 44, Peak mem 2.189 GB
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 1265 will be truncated to 12. Consider pre-splitting your data to save memory.
[WARNING] Some sequences are longer than 12 tokens. The longest sentence 802 will be truncated to 12. Consider pre-splitting your data to save memory.
Iter 5: Val loss 6.641, Val took 0.175s
Iter 5: Train loss 6.641, Learning Rate 1.000e-05, It/sec 2.754, Tokens/sec 30.298, Trained Tokens 55, Peak mem 2.189 GB
Saved final weights to /Users/gokdenizgulmez/Desktop/mamba2-pretrain/adapters.safetensors.
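For reference, the same checkpoint plus the saved adapters can also be exercised from Python, assuming mlx_lm's usual load/generate API (where load takes an adapter_path and generate takes max_tokens); a minimal sketch:

from mlx_lm import load, generate

# Load the base Mamba2 checkpoint together with the LoRA adapters saved above.
model, tokenizer = load(
    "rokyang/mamba2-130m-hf",
    adapter_path="/Users/gokdenizgulmez/Desktop/mamba2-pretrain",
)
print(generate(model, tokenizer, prompt="hello", max_tokens=22))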
Very nice!! What's a good model to test with? The one you are using doesn't look like it generates high-quality responses.
Mamba Codestral or one of the larger base Mamba2 models.
I tried running Codestral and it crashed with a weight size mismatch error:
Looks like the weight shape is not computed correctly for that model? This is what I ran for reference:
Ah ok, yeah, I didn't try Codestral. The model I used is the safetensors conversion of the original state-spaces checkpoint, published as rokyang/mamba2-130m-hf. I'll look into the Codestral shape problem later today.
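For what it's worth, the mismatch is plausibly a group-count issue: in the reference Mamba2 layout the projection sizes depend on n_groups, which is 1 for mamba2-130m but larger for Codestral (8, if I recall its config correctly), so a shape derived without the group factor would only break on Codestral. A rough sketch of the expected shapes; mamba2_proj_shapes is just an illustrative helper, and the kernel size of 4 is the usual default:

def mamba2_proj_shapes(d_model, expand, d_state, n_heads, n_groups, conv_kernel=4):
    # Expected weight shapes for one Mamba2 block in the reference (transformers) layout.
    d_inner = expand * d_model
    conv_dim = d_inner + 2 * n_groups * d_state   # x, B and C all pass through the depthwise conv
    in_proj_out = d_inner + conv_dim + n_heads    # z, (x | B | C), dt
    return {
        "in_proj.weight": (in_proj_out, d_model),
        "conv1d.weight": (conv_dim, 1, conv_kernel),
        "out_proj.weight": (d_model, d_inner),
    }

# mamba2-130m has n_groups = 1, so dropping the group factor goes unnoticed there;
# a model with n_groups > 1 would then trigger a weight-size mismatch on load.
print(mamba2_proj_shapes(d_model=768, expand=2, d_state=128, n_heads=24, n_groups=1))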
…ns from mamba2.py
No description provided.