[LLM_RETRIEVER] Some questions about the llm_retriever paper #276

Open
zhouchang123 opened this issue Oct 22, 2024 · 2 comments

@zhouchang123

zhouchang123 commented Oct 22, 2024

Q1: The paper says the BM25 retriever is the initial model. Does this mean the cross-encoder is used to tune the BM25 retriever?

Q2: In Section 4.2, what is the function s(x,y,xi,yi)? In Section 4.3, what is Lcont? Is it the same as Lreward in Section 4.2?

Q3: How is the retriever trained? I don't understand the order of the training pipeline. Does the paper mean that the retriever is first used to get candidates, then positive and negative candidates are chosen to train the reward model, and after that Lcont and Ldistill are used to tune the retriever?
As I understand it, the reward model is trained first and the retriever second. Is that right?
@intfloat Could you explain this to me?

@zhouchang123 zhouchang123 changed the title Some questions about the papar of llm_retriever . [LLM_RETRIEVER]Some questions about the papar of llm_retriever . Oct 22, 2024
@intfloat
Contributor

Hi @zhouchang123 ,

Q1: No. BM25 is unsupervised and does not need any training.

Q2: s(x,y,xi,yi) is a real-valued score produced by the reward model (a BERT-based encoder). Lcont is the InfoNCE contrastive loss, as used in "A Simple Framework for Contrastive Learning of Visual Representations".
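
For readers unfamiliar with InfoNCE, here is a minimal sketch of the loss for a single query. It is not taken from the llm_retriever code: the function name `info_nce_loss`, the `temperature` parameter, and the convention that one candidate is the positive are all assumptions for illustration. The loss is the negative log-softmax of the positive candidate's score among all candidates.

```python
import math


def info_nce_loss(scores, positive_index=0, temperature=1.0):
    """InfoNCE loss for one query (illustrative, hypothetical helper).

    scores: similarity scores between the query and each candidate,
            where exactly one candidate (positive_index) is the positive.
    """
    logits = [s / temperature for s in scores]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Negative log-probability of the positive under a softmax over candidates.
    return log_sum_exp - logits[positive_index]


# The loss shrinks as the positive is scored higher than the negatives.
uniform = info_nce_loss([0.0, 0.0, 0.0])
confident = info_nce_loss([5.0, 0.0, 0.0])
```

With equal scores the loss equals the log of the number of candidates; it approaches zero as the positive's score dominates.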

Q3: It is an iterative process. We first use the retriever to get candidates, then choose positives and negatives to train the reward model; after that, the reward model is used to tune the retriever again, and so on. At the start of training we do not have any retriever or reward model, so we use the unsupervised BM25 as the initial retriever.
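
The order of steps described above can be sketched roughly as follows. This is only an illustration of the control flow, not the repository's training code: every function name here (`run_iterations`, `bm25_stub`, `pick_pos_neg_stub`, `train_reward_model_stub`, `tune_retriever_stub`) is hypothetical, and the stubs are toy stand-ins. Note that BM25 itself is never updated; in this sketch it only supplies candidates for the first round, and the retriever that gets tuned is a separate trainable model.

```python
def run_iterations(initial_retriever, pick_pos_neg, train_reward_model,
                   tune_retriever, queries, num_iterations):
    """Alternate between training a reward model and tuning a retriever."""
    retriever = initial_retriever  # round 1 uses the untrained BM25 stand-in
    for _ in range(num_iterations):
        # 1. Retrieve candidates with the current retriever.
        candidates = {q: retriever(q) for q in queries}
        # 2. Choose positive and negative candidates for each query.
        labeled = {q: pick_pos_neg(q, c) for q, c in candidates.items()}
        # 3. Train the reward model on those positives and negatives.
        reward_model = train_reward_model(labeled)
        # 4. Use the reward model to tune the retriever for the next round.
        retriever = tune_retriever(reward_model, labeled)
    return retriever


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def bm25_stub(query):
    return ["ex_a", "ex_b", "ex_c"]


def pick_pos_neg_stub(query, cands):
    return cands[0], cands[1:]  # (positive, negatives)


def train_reward_model_stub(labeled):
    return lambda q, c: 1.0 if c == labeled[q][0] else 0.0


def tune_retriever_stub(reward_model, labeled):
    def retriever(q):
        positive, negatives = labeled[q]
        cands = [positive] + list(negatives)
        return sorted(cands, key=lambda c: reward_model(q, c), reverse=True)
    return retriever


tuned = run_iterations(bm25_stub, pick_pos_neg_stub, train_reward_model_stub,
                       tune_retriever_stub, queries=["q1"], num_iterations=2)
```

In this sketch the retriever returned after the loop is the one produced by the last tuning step, while `bm25_stub` is left unchanged throughout.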

@zhouchang123
Author

I know the initial retriever is BM25. But what does it mean that the initial BM25 retriever doesn't need training, while the reward model is used to tune the retriever?
What does the tuning actually do here?
Am I misunderstanding something?
