Different with gemma2 / gemma #236

Open

nigelzzzzzzz opened this issue Sep 18, 2024 · 8 comments

Comments

@nigelzzzzzzz

Description of the bug:

Hi @pkgoogle,
I tried to use convert_gemma2_to_tflite.py to convert the model, and I hit two errors during conversion.

  • This one I can solve by changing loader.py:
 File "/mnt/data/nigel_wang/ai-edge-torch-env/lib/python3.10/site-packages/ai_edge_torch/generative/utilities/loader.py", line 157, in load
  converted_state["tok_embedding.weight"] = state.pop(
KeyError: 'embedder.weight'
  • This one I can't solve, because I'm confused about why gemma2.py uses attn_fused_qkv_proj="model.layers.{}.self_attn.qkv_proj" (the naming difference is sketched after the traceback below):
TENSOR_NAMES = loading_utils.ModelLoader.TensorNames(
    ff_up_proj="model.layers.{}.mlp.up_proj",
    ff_down_proj="model.layers.{}.mlp.down_proj",
    ff_gate_proj="model.layers.{}.mlp.gate_proj",
    attn_fused_qkv_proj="model.layers.{}.self_attn.qkv_proj",
    attn_output_proj="model.layers.{}.self_attn.o_proj",
    pre_attn_norm="model.layers.{}.input_layernorm",
    post_attn_norm="model.layers.{}.post_attention_layernorm",
    pre_ff_norm="model.layers.{}.pre_feedforward_layernorm",
    post_ff_norm="model.layers.{}.post_feedforward_layernorm",
    embedding="embedder",
    final_norm="model.norm",
    lm_head=None,
)
 File "/mnt/data/nigel_wang/ai-edge-torch-env/lib/python3.10/site-packages/ai_edge_torch/generative/utilities/loader.py", line 302, in _map_attention
  converted_state[f"{prefix}.atten_func.qkv_projection.weight"] = state.pop(
KeyError: 'model.layers.0.self_attn.qkv_proj.weight'
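
For reference, both errors point at a naming mismatch: the mapping above expects an "embedder" weight and a single fused qkv_proj tensor per layer, while the checkpoint being loaded uses "model.embed_tokens" and keeps q_proj, k_proj, and v_proj separate. A rough sketch of how the separate projections could be fused into the key the loader expects (an illustration only, not code from the repository; the actual loader may expect a different per-head ordering for grouped-query attention):

import torch

def fuse_hf_qkv(state_dict, num_layers):
    # Illustration: build "model.layers.{i}.self_attn.qkv_proj.weight" by
    # concatenating the separate q/k/v projection weights along the output
    # dimension.
    for i in range(num_layers):
        prefix = f"model.layers.{i}.self_attn"
        q = state_dict.pop(f"{prefix}.q_proj.weight")
        k = state_dict.pop(f"{prefix}.k_proj.weight")
        v = state_dict.pop(f"{prefix}.v_proj.weight")
        state_dict[f"{prefix}.qkv_proj.weight"] = torch.cat([q, k, v], dim=0)
    return state_dict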

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@talumbau
Contributor

talumbau commented Sep 18, 2024

Hi. Apologies for the confusion. In our docs here I notice that we give the correct link to the Kaggle PyTorch Gemma 2 2B (instruction-tuned) checkpoint, which is this link. However, the documentation mistakenly identifies this as just "Gemma" (which one would assume refers to just Gemma version 1). We will update the docs for additional clarity. From your error, I suspect you are using a Hugging Face checkpoint for Gemma 2 2B. It is a high priority to support HF checkpoints for Gemma 2, but it is not available yet. Can you try with the Kaggle PyTorch checkpoint for Gemma 2 2B and see if that resolves the issue?

@nigelzzzzzzz
Author

Hi @talumbau,
Thanks for your response!

The Kaggle download gives two files: model.ckpt and tokenizer.model.

So I just need to change the model.ckpt path in convert_gemma2_to_tflite.py?

Thank you again!

@talumbau
Contributor

Yes, please download the model.ckpt file. The expected place to look for the file is:

os.path.join(pathlib.Path.home(), 'Downloads/llm_data/gemma2-2b')
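
A quick way to sanity-check the layout before running the conversion (a sketch assuming the two file names mentioned above):

import os
import pathlib

# Location where the gemma2 conversion example looks for the Kaggle checkpoint.
checkpoint_dir = os.path.join(pathlib.Path.home(), 'Downloads/llm_data/gemma2-2b')
for name in ('model.ckpt', 'tokenizer.model'):
    path = os.path.join(checkpoint_dir, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f'Expected {name} at {path}')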

@pkgoogle added the status:awaiting user response label on Sep 18, 2024
@a8nova

a8nova commented Sep 22, 2024

To fix the error caused by using a Hugging Face checkpoint, you can apply the diff below:

diff --git a/ai_edge_torch/generative/examples/gemma/gemma2.py b/ai_edge_torch/generative/examples/gemma/gemma2.py
index b47c0d4..38e09b2 100644
--- a/ai_edge_torch/generative/examples/gemma/gemma2.py
+++ b/ai_edge_torch/generative/examples/gemma/gemma2.py
@@ -31,13 +31,16 @@ TENSOR_NAMES = loading_utils.ModelLoader.TensorNames(
     ff_up_proj="model.layers.{}.mlp.up_proj",
     ff_down_proj="model.layers.{}.mlp.down_proj",
     ff_gate_proj="model.layers.{}.mlp.gate_proj",
-    attn_fused_qkv_proj="model.layers.{}.self_attn.qkv_proj",
+    attn_query_proj="model.layers.{}.self_attn.q_proj",
+    attn_key_proj="model.layers.{}.self_attn.k_proj",
+    attn_value_proj="model.layers.{}.self_attn.v_proj",  
+    #attn_fused_qkv_proj="model.layers.{}.self_attn.qkv_proj",
     attn_output_proj="model.layers.{}.self_attn.o_proj",
     pre_attn_norm="model.layers.{}.input_layernorm",
     post_attn_norm="model.layers.{}.post_attention_layernorm",
     pre_ff_norm="model.layers.{}.pre_feedforward_layernorm",
     post_ff_norm="model.layers.{}.post_feedforward_layernorm",
-    embedding="embedder",
+    embedding="model.embed_tokens",
     final_norm="model.norm",
     lm_head=None,
 )

but conversion still runs out of memory on a Colab instance with 80 GB of system memory.

@talumbau
Contributor

Hi @a8nova, thanks very much for the diff here. Can you prepare it as a PR? My understanding is that the quantizer code is landing a fix to reduce memory usage very soon. I think once that fix is in, it will be a much better experience on Colab.

@a8nova

a8nova commented Sep 25, 2024

Hi @talumbau! Sure, I can prepare a PR. How do you suggest we handle the attribute-name differences between the HF and Kaggle checkpoints? Or do you just want me to apply the above diff? I am worried that would break conversion for people using Kaggle checkpoints.
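
One way to handle it without breaking Kaggle users could be to keep both name mappings and choose between them based on which keys the checkpoint actually contains. A rough sketch of that idea (a heuristic for illustration, not code from the repository; it assumes a plain PyTorch checkpoint rather than safetensors):

import torch

def detect_checkpoint_format(checkpoint_path):
    # Illustrative heuristic: the Kaggle PyTorch checkpoint exposes
    # "embedder.weight", while HF checkpoints use "model.embed_tokens.weight".
    state = torch.load(checkpoint_path, map_location="cpu")
    if "model_state_dict" in state:  # some checkpoints nest the weights
        state = state["model_state_dict"]
    return "kaggle" if "embedder.weight" in state else "hf"

The result could then select between the original fused-qkv TENSOR_NAMES and the per-projection mapping from the diff above.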


github-actions bot commented Oct 3, 2024

Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.


This issue was closed because it has been inactive for 14 days. Please post a new issue if you need further assistance. Thanks!
