1. The output for the prompt repeats the same sentence over and over. Why is this happening?
Prompt: "AI is going to"
The same repetition occurs with other prompts as well.
2. When we run Lookahead directly (without an earlier lookahead=false run), throughput is lower [51.3 tokens/sec], but on a second run it rises to around [179 tokens/sec]. Can you explain why? Are you storing the result in a cache, or reusing the same trie across runs?
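Regarding question 1: a likely cause is greedy (argmax) decoding, which is deterministic, so once the generation revisits a state it cycles forever. A minimal toy sketch (not the repo's code; the bigram table and sampling threshold are made up for illustration):

```python
# Toy illustration: greedy decoding over a deterministic next-token table
# loops as soon as it revisits a token, which is why the same sentence
# can repeat verbatim. Sampling lets the chain escape the cycle.
import random

# Hypothetical bigram "model": each token maps to its single argmax successor.
NEXT = {"AI": "is", "is": "going", "going": "to", "to": "change", "change": "AI"}

def greedy_generate(start, steps):
    out = [start]
    for _ in range(steps):
        out.append(NEXT[out[-1]])  # always take the argmax successor
    return out

def sampled_generate(start, steps, seed=0):
    rng = random.Random(seed)
    vocab = list(NEXT)
    out = [start]
    for _ in range(steps):
        # With sampling, the argmax successor is chosen only part of the
        # time, so the sequence can break out of the cycle.
        out.append(NEXT[out[-1]] if rng.random() < 0.7 else rng.choice(vocab))
    return out

greedy = greedy_generate("AI", 10)
print(" ".join(greedy))  # the 5-token cycle repeats verbatim
```

If the example script decodes greedily, enabling sampling or a repetition penalty (if the implementation supports one) would be the usual fix to check first.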
Model: llama-2-7b (fp16)
File run from the repo: llama_example.py