[UPRISE] After rereading the paper "UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation", I have some questions. #262

Open
zhouchang123 opened this issue Sep 10, 2024 · 13 comments

Comments

@zhouchang123

image
1. How are the scores obtained with GPT-Neo-2.7B?
2. At which step is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?

@cdxeve
Contributor

cdxeve commented Sep 10, 2024

Q1: How are the scores obtained with GPT-Neo-2.7B?
By computing the task metric score for each concatenation of a prompt with the testing input; see Section 3.2.

Q2: At which step is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?
After getting the scores. Among all the scored prompts for a training example, we label the prompt with the highest score as positive. For negative samples, we randomly sample B training demonstrations from the prompt pool; in addition, we label the B demonstrations with the lowest scores among the sampled prompts as hard negatives. Details are in Section 3.2.
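
For readers following along, the labeling step described in this answer could be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's code: the function `label_prompts`, its arguments, the prompt ids, and the toy scores are all hypothetical, and the scores are assumed to have already been computed by the frozen LLM.

```python
import random

def label_prompts(scores, pool_ids, num_neg, rng=None):
    """Label prompts for one training example (hypothetical helper).

    scores   : dict mapping prompt id -> task metric score (already computed)
    pool_ids : ids of all prompts in the pool, used for random negatives
    num_neg  : B, the number of random negatives and of hard negatives
    """
    rng = rng or random.Random(0)
    # Rank scored prompts from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    positive = ranked[0]
    # Hard negatives: the B lowest-scoring prompts among the scored ones.
    hard_negatives = [p for p in reversed(ranked) if p != positive][:num_neg]
    # Random negatives: B prompts drawn from the pool, excluding the positive.
    candidates = [p for p in pool_ids if p != positive]
    random_negatives = rng.sample(candidates, min(num_neg, len(candidates)))
    return positive, random_negatives, hard_negatives

# Toy example with made-up scores.
scores = {"p1": 0.9, "p2": 0.1, "p3": 0.5, "p4": 0.3}
pool_ids = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
positive, random_negs, hard_negs = label_prompts(scores, pool_ids, num_neg=2)
print(positive)   # p1
print(hard_negs)  # ['p2', 'p4']
```

Note that labels depend only on the scores, so the retriever's encoders are not involved at this stage.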

@zhouchang123
Author

zhouchang123 commented Sep 10, 2024

What about the score from the prompt retriever?
Is it the similarity between the two vectors produced by the encoders?
Thanks very much.

@cdxeve
Contributor

cdxeve commented Sep 10, 2024

You may refer to Section 3.4 to see how we get the score after tuning the prompt retriever.

@zhouchang123
Author

Section 3.4 describes the inference part, doesn't it?
Is it the same in the training pipeline?
image

@cdxeve
Contributor

cdxeve commented Sep 11, 2024

Training is covered in Section 3.3; you may refer to the provided code as well.

@zhouchang123
Author

image
Section 3.3 only introduces sim(x, p). Do you mean sim(x, p) is the score?
image

@cdxeve
Contributor

cdxeve commented Sep 11, 2024

Yes, sim(x, p) is the score.
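
To make the role of sim(x, p) concrete, here is a small sketch of using it as a retrieval score. It assumes the similarity is an inner product between an input embedding and a prompt embedding, which is a common choice for bi-encoder retrievers; the exact definition is the one in Section 3.3. The embeddings, prompt ids, and the helper names `sim` and `retrieve` are made up for illustration.

```python
def sim(x_vec, p_vec):
    """Inner product of two embeddings (assumed form of sim(x, p))."""
    return sum(a * b for a, b in zip(x_vec, p_vec))

def retrieve(x_vec, prompt_vecs, k):
    """Return the ids of the k prompts with the highest sim(x, p)."""
    ranked = sorted(
        prompt_vecs,
        key=lambda pid: sim(x_vec, prompt_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-dimensional embeddings; real encoder outputs would be much larger.
x = [1.0, 0.0]
prompts = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
print(retrieve(x, prompts, k=2))  # ['a', 'c']
```

The same score is used in both places: during training it feeds the contrastive loss, and at inference it ranks the prompts in the pool.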

@zhouchang123
Author

zhouchang123 commented Sep 11, 2024

In the paper, the number of positive prompts is 1 and the number of negative prompts is 20, but the total number of prompts in one training epoch is not stated.
What happens to prompts that are neither positive nor negative?
InfoNCE does not seem to include these prompts.

@cdxeve
Copy link
Contributor

cdxeve commented Sep 11, 2024

Yes, InfoNCE would not consider the prompts that are neither positive nor negative.
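
A minimal sketch of why unlabeled prompts drop out: an InfoNCE-style loss for one training input only takes the score of its positive prompt and the scores of its negatives, so any prompt in neither set never enters the computation. The function name `info_nce` and the toy scores are hypothetical, and details such as a temperature or in-batch negatives are left out; the repository's loss may differ.

```python
import math

def info_nce(pos_score, neg_scores):
    """-log( exp(s+) / (exp(s+) + sum_j exp(s-_j)) ) for one training input."""
    logits = [pos_score] + list(neg_scores)
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denominator - pos_score

# One positive and 20 negatives, as in the setting discussed above.
untrained = info_nce(0.0, [0.0] * 20)  # all scores equal: loss is log(21)
trained = info_nce(5.0, [0.0] * 20)    # positive scored higher: smaller loss
print(untrained > trained)  # True
```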

@zhouchang123
Author

I am confused about the training and inference pipelines.
In the training pipeline, the input includes the task name and the query, and the metric depends on the task.
However, at inference, the input is only the query, without the task name.
So could you add a module that identifies the task name from the query, first filters the prompts by task name, and then retrieves? @cdxeve

@cdxeve
Contributor

cdxeve commented Oct 23, 2024

We do not input the task name during training; the task name in the image is only for ease of understanding. You may refer to the formula in Section 3.2 for details.

@zhouchang123
Author

I looked at the file prompt_pool.json, and each dict is annotated with a task name. So is the task name only used to determine which metric score applies?
The natural approach when retrieving is to search among the prompts of similar tasks rather than among all the prompts.

@cdxeve
Contributor

cdxeve commented Oct 24, 2024

Q1: Is the task name only used to determine which metric score applies?
A1: We keep the task name in the metadata to support many potential uses, but we don't include it as input during training.

Q2: The natural approach when retrieving is to search among the prompts of similar tasks rather than among all the prompts.
A2: You could try this for a quick test, but I think the diversity will be too constrained since the number of tasks is much smaller than the number of demonstrations in the prompt pool.
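
For anyone who wants to run that quick test, restricting the pool by task before retrieval could look roughly like this. The field names `task_name` and `prompt`, the toy entries, and the helper `filter_pool_by_task` are assumptions for illustration; check the actual keys in prompt_pool.json before adapting it.

```python
def filter_pool_by_task(pool, task_name):
    """Keep only entries whose (hypothetical) 'task_name' field matches."""
    return [entry for entry in pool if entry.get("task_name") == task_name]

# Toy pool with made-up entries and made-up field names.
pool = [
    {"task_name": "sentiment", "prompt": "Review: great film. Sentiment: positive"},
    {"task_name": "sentiment", "prompt": "Review: dull plot. Sentiment: negative"},
    {"task_name": "nli", "prompt": "Premise: ... Hypothesis: ... Answer: yes"},
    {"task_name": "qa", "prompt": "Question: ... Answer: ..."},
]

subset = filter_pool_by_task(pool, "sentiment")
print(len(pool), "->", len(subset))  # 4 -> 2
```

Retrieval would then rank only `subset` instead of the whole pool, which narrows the candidates to a single task and illustrates the reduced diversity mentioned above.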
