1. The output for the prompt repeats the same sentence over and over. Why is this happening?
Prompt: "AI is going to"
The same repetition occurs with other prompts as well.
2. When we run Lookahead directly (without an earlier lookahead=false run), throughput is lower [51.3 tokens/sec], but on a second run it rises to around [179 tokens/sec]. Can you explain why? Are you storing the result in a cache, or reusing the same trie across runs?
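Regarding question 1: a likely cause is greedy (argmax) decoding, which is deterministic, so once the generation revisits a state it cycles forever. A minimal toy sketch (not the repo's code; the bigram table and sampling threshold are made up for illustration):

```python
# Toy illustration: greedy decoding over a deterministic next-token table
# loops as soon as it revisits a token, which is why the same sentence
# can repeat verbatim. Sampling lets the chain escape the cycle.
import random

# Hypothetical bigram "model": each token maps to its single argmax successor.
NEXT = {"AI": "is", "is": "going", "going": "to", "to": "change", "change": "AI"}

def greedy_generate(start, steps):
    out = [start]
    for _ in range(steps):
        out.append(NEXT[out[-1]])  # always take the argmax successor
    return out

def sampled_generate(start, steps, seed=0):
    rng = random.Random(seed)
    vocab = list(NEXT)
    out = [start]
    for _ in range(steps):
        # With sampling, the argmax successor is chosen only part of the
        # time, so the sequence can break out of the cycle.
        out.append(NEXT[out[-1]] if rng.random() < 0.7 else rng.choice(vocab))
    return out

greedy = greedy_generate("AI", 10)
print(" ".join(greedy))  # the 5-token cycle repeats verbatim
```

If the example script decodes greedily, enabling sampling or a repetition penalty (if the implementation supports one) would be the usual fix to check first.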
Model: llama-2-7b (fp16)
File run from the repo: llama_example.py