Question about embedding_transform #38

Open
icerooqiu opened this issue Mar 8, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@icerooqiu

Hi

Thank you for presenting your research. I have a question regarding the `embedding_transform` in `inversion.py`. As per my understanding, this function corresponds to the MLP model described in your publication, responsible for transforming logits into pseudo-embeddings, or what is referred to as the 'zero-step' model. Could you elaborate on how this MLP model was trained to ensure it generates meaningful predictions? The paper and README seem to lack detailed information on this aspect, and any additional insights you could provide would be greatly appreciated. Thank you.

jxmorris12 added the documentation label Mar 12, 2024
@jxmorris12
Owner

Hi @icerooqiu -- we train the whole model end-to-end to generate text conditioned on embeddings. So the MLP layer is updated via gradient descent to try to make the correct text more likely given the input embedding. Does that answer your question?
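
For concreteness, here is a minimal sketch of that end-to-end setup. This is my own illustration, not the repo's actual code: the layer sizes, the `num_pseudo_tokens` reshape, and the `decoder(pseudo_embeddings=..., labels=...)` interface are all assumptions. The point it shows is that the language-modeling loss on the target text backpropagates through both the decoder and the MLP, so the MLP needs no separate pre-training.

```python
import torch
import torch.nn as nn

# Toy dimensions; the real vec2text configuration differs.
embed_dim, hidden_dim, model_dim, num_pseudo_tokens = 768, 2048, 768, 16

# The MLP (embedding_transform): maps one input embedding to a short
# sequence of pseudo-embeddings that the decoder conditions on.
embedding_transform = nn.Sequential(
    nn.Linear(embed_dim, hidden_dim),
    nn.GELU(),
    nn.Linear(hidden_dim, num_pseudo_tokens * model_dim),
)

def training_step(embedding, decoder, labels, optimizer):
    """One end-to-end step: the cross-entropy loss on the target text
    flows back through the decoder *and* embedding_transform, so the MLP
    is trained jointly rather than on its own."""
    B = embedding.shape[0]
    pseudo = embedding_transform(embedding).reshape(B, num_pseudo_tokens, model_dim)
    loss = decoder(pseudo_embeddings=pseudo, labels=labels)  # hypothetical seq2seq interface
    optimizer.zero_grad()
    loss.backward()   # gradients reach embedding_transform here
    optimizer.step()
    return loss.item()
```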

@icerooqiu
Author

> Hi @icerooqiu -- we train the whole model end-to-end to generate text conditioned on embeddings. So the MLP layer is updated via gradient descent to try to make the correct text more likely given the input embedding. Does that answer your question?

Thank you for replying. I have two more questions about the model training:

  1. I don't fully understand this line from `_process_embedder_output`: `logits = outputs.logits[torch.arange(B), attention_mask.sum(1) - 1]`. I tested it and it returns only one logit vector per sequence. So, assuming a single input text with input ids of length N and a vocabulary of size M, the full logits have shape (1, N, M), but this line gives (1, M). My question is: do we use only that one logit vector for the subsequent prediction? What happens to the rest of them? (See the indexing sketch after this list.)
  2. My other question is about the mock embedder: when should we use it? I followed the instructions from the README, `python vec2text/run.py --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --max_seq_length 128 --num_train_epochs 100 --max_eval_samples 1000 --eval_steps 25000 --warmup_steps 100000 --learning_rate 0.0002 --dataset_name one_million_instructions --model_name_or_path t5-base --use_wandb=0 --embedder_model_name gpt2 --experiment inversion_from_logits_emb --bf16=1 --embedder_torch_dtype float16 --lr_scheduler_type constant_with_warmup --use_frozen_embeddings_as_input 1 --mock_embedder 0`, but I got an out-of-memory error during training. When I set `--mock_embedder` to true, the model trains, but the results are terrible. I am not sure whether `--mock_embedder` is the cause; I thought all the embeddings had already been pre-computed and saved in the cache directory.
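
Regarding question 1, here is a small self-contained version of the check I mean (the shapes are made up for illustration; this is not the repo's code). It shows that `attention_mask.sum(1) - 1` is the index of the last non-padded token in each sequence, so the indexing keeps exactly one logit vector of size M per sequence:

```python
import torch

# Made-up shapes for illustration: batch B, sequence length N, vocab size M.
B, N, M = 2, 5, 7
logits = torch.randn(B, N, M)            # stands in for outputs.logits
attention_mask = torch.tensor([
    [1, 1, 1, 0, 0],                     # sequence 0: 3 real tokens, 2 padding
    [1, 1, 1, 1, 1],                     # sequence 1: 5 real tokens
])

# attention_mask.sum(1) - 1 is the position of the last non-padded token;
# pairing it with torch.arange(B) picks one position per sequence.
last_token_logits = logits[torch.arange(B), attention_mask.sum(1) - 1]
print(last_token_logits.shape)           # torch.Size([2, 7]) -> one (M,)-vector per sequence
```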
