Hi guys,
we have tried your model with a larger number of documents (than used in the example code) and found that the model does not use the provided documents at all. How can I tell the model to use only the provided documents? A general question: if the model has to use content from different documents, how does it choose which ones to use? (We couldn't find an internal similarity metric in your code that would do that.)
So our small example looks like this:
`
import json

from gritlm import GritLM

instruction = """ You are the Kitchen Owner's Manual. Based ONLY on the documents answer the Question."""
queries = ["How do I attach the kitchen cabinet?"]

# Load the documents to embed
with open("some_json.json", "r") as file:
    documents = json.load(file)

def gritlm_instruction(instruction):
    # Wrap the instruction in GritLM's embedding prompt format
    return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"

model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

# Embed the documents and the query
model.encode(documents, instruction=gritlm_instruction(""))
model.encode(queries, instruction=gritlm_instruction(instruction))

# Generate an answer to the question
prompt_instr = "Question: How do I attach the kitchen cabinet?"
messages = [{"role": "user", "content": prompt_instr}]
encoded = model.tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
encoded = encoded.to(model.device)
gen = model.generate(encoded, max_new_tokens=256, do_sample=False)
`
The answer is super generic and you can tell instantly that the model didn't use any of the provided information from the documents. So how can one fix that?
Hey, sorry, that demo did not do any RAG; it just showed how you can use the model to embed and to generate. When you call generate, it does not use anything you encoded previously, only the text provided in the prompt.
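Since generation only sees the prompt text, one way to get document-grounded answers is to do the retrieval step yourself: rank the documents by cosine similarity between the query embedding and the document embeddings, then paste the top matches into the prompt. Below is a minimal sketch of that idea, not the repository's own method. The helper names `top_k_documents` and `build_rag_prompt` are hypothetical, and the small NumPy arrays are toy stand-ins for what `model.encode(...)` would return.

```python
import numpy as np

def top_k_documents(query_emb, doc_embs, k=2):
    """Return indices of the k documents most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # one cosine score per document
    order = np.argsort(-scores)[:k]      # highest scores first
    return order.tolist()

def build_rag_prompt(instruction, question, retrieved_docs):
    """Put the retrieved documents into the prompt text so generation can see them."""
    context = "\n\n".join(
        f"Document {i + 1}: {doc}" for i, doc in enumerate(retrieved_docs)
    )
    return f"{instruction}\n\n{context}\n\nQuestion: {question}"

# Toy data (assumption): in a real run these embeddings would come from model.encode(...)
documents = [
    "Attach the cabinet to the wall rail with the supplied screws.",
    "Clean the oven with a damp cloth only.",
    "Level the cabinet before tightening the rail screws.",
]
doc_embs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.6, 0.8, 0.0]])
query_emb = np.array([2.0, 0.0, 0.0])

instruction = "Based ONLY on the documents, answer the Question."
question = "How do I attach the kitchen cabinet?"

top_idx = top_k_documents(query_emb, doc_embs, k=2)
prompt = build_rag_prompt(instruction, question, [documents[i] for i in top_idx])
print(prompt)
```

The resulting `prompt` string would then be used as the `content` of the user message passed to `apply_chat_template`, in place of the bare question. The choice of `k` and of cosine similarity as the metric are assumptions of this sketch.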