Q1: The paper says the BM25 retriever is the initial model. Do you mean that the cross-encoder is used to tune the BM25 retriever?
Q2: In Section 4.2, what is the function of s(x,y,xi,yi)? In Section 4.3, what is the function of Lcont? Is it the same as Lreward in Section 4.2?
Q3: How is the retriever trained? I don't understand the order of the training pipeline. Does the paper mean that you first use the retriever to get the candidates, then choose positive and negative candidates to train the reward model, and after that use Lcont and Ldistill to tune the retriever?
In my opinion, it seems that the reward model is trained first and the retriever afterwards. @intfloat Can you explain this to me?
zhouchang123 changed the title from "Some questions about the papar of llm_retriever ." to "[LLM_RETRIEVER]Some questions about the papar of llm_retriever ." on Oct 22, 2024
Q3: It is an iterative process. We first use the retriever to get candidates, then choose positives and negatives to train the reward model. After that, the reward model is used to tune the retriever again, and so on. At the start of training we have neither a retriever nor a reward model, so we use the unsupervised BM25 as the initial retriever.
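To make the order of the loop concrete, here is a minimal toy sketch of the iteration described above. It is not the paper's implementation: every function name is hypothetical, `overlap_scorer` is a simple word-overlap stand-in for BM25, `toy_llm_score` replaces the real LLM feedback used to rank candidates, and the "reward model" and "retriever" are lookup tables rather than the neural models (or the Lcont and Ldistill losses) discussed in this thread. Only the control flow is meant to match.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]                 # (input text, output text)
Scorer = Callable[[str, Example], float]  # a retriever scores (query, candidate)
Rewards = Dict[Tuple[str, Example], float]


def overlap_scorer(query: str, cand: Example) -> float:
    """Untrained lexical scorer standing in for BM25 (the round-0 retriever)."""
    return float(len(set(query.split()) & set(cand[0].split())))


def retrieve(scorer: Scorer, query: str, pool: List[Example], k: int) -> List[Example]:
    """Return the k highest-scoring candidates for a query."""
    return sorted(pool, key=lambda c: scorer(query, c), reverse=True)[:k]


def toy_llm_score(answer: str, cand: Example) -> float:
    """Hypothetical placeholder for LLM feedback on how helpful a candidate is."""
    return 1.0 if cand[1] == answer else 0.0


def fit_reward_model(labelled: List[Tuple[str, Example, Example]]) -> Rewards:
    """Toy reward model: remembers -1 for negatives and +1 for positives."""
    rewards: Rewards = {}
    for query, pos, neg in labelled:
        rewards[(query, neg)] = -1.0
        rewards[(query, pos)] = 1.0  # written last so a positive always wins
    return rewards


def fit_retriever(rewards: Rewards) -> Scorer:
    """Toy tuning step: new retriever = lexical score + stored reward."""
    def scorer(query: str, cand: Example) -> float:
        return overlap_scorer(query, cand) + 2.0 * rewards.get((query, cand), 0.0)
    return scorer


def iterative_training(train: List[Example], pool: List[Example],
                       rounds: int = 2, k: int = 3) -> Scorer:
    scorer: Scorer = overlap_scorer  # round 0: BM25-like, nothing is trained yet
    for _ in range(rounds):
        labelled = []
        for query, answer in train:
            cands = retrieve(scorer, query, pool, k)
            ranked = sorted(cands, key=lambda c: toy_llm_score(answer, c), reverse=True)
            labelled.append((query, ranked[0], ranked[-1]))  # best = positive, worst = negative
        rewards = fit_reward_model(labelled)  # step 1: train the reward model
        scorer = fit_retriever(rewards)       # step 2: reward model tunes the retriever
    return scorer


# Tiny made-up data set, purely for illustration.
pool = [
    ("great movie but dull", "negative"),
    ("great fun", "positive"),
    ("boring film", "negative"),
]
train = [("great movie overall", "positive")]
tuned = iterative_training(train, pool)

query = train[0][0]
print(retrieve(overlap_scorer, query, pool, 1))  # top candidate before tuning
print(retrieve(tuned, query, pool, 1))           # top candidate after tuning
```

In this toy run the untrained lexical scorer prefers the candidate with the most shared words, while the tuned scorer prefers the candidate that the feedback step marked as positive. Note that BM25 itself is never updated; it only supplies the first round of candidates, and the thing being tuned is the retriever that replaces it in later rounds.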
I know the initial retriever is BM25. But what does it mean that the initial BM25 retriever doesn't need training while the reward model is used to tune the retriever?
What does the tuning actually do here?
Am I misunderstanding something?