Energy Estimation Accuracy with Input Tokens #86
-
Thank you for the great work on this project! I noticed that energy estimation currently focuses on output tokens and request latency. However, input tokens and context length significantly affect the compute load, especially in large language models (LLMs). Is there a reason input tokens are not included in the energy estimation? Are there any plans to incorporate this aspect in future updates? Appreciate your time and insights!
Replies: 1 comment
-
Hey @RameezI, sorry for the late answer, and many thanks for your interest in this project!
Given how LLMs work, we make the assumption that a typical request involves much more decoding than encoding, and therefore that the energy consumption of the encoding phase is negligible.
It is true, however, that we don't handle edge cases such as giving a huge text to a model and asking for a very precise answer that involves only a few tokens (for example, giving a book to a model and asking for the age of a particular character).
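To make the trade-off concrete, here is a minimal sketch (not the project's actual model) of a per-token estimate that splits a request into a prefill (input tokens) and a decode (output tokens) phase. The per-token constants are hypothetical placeholders chosen only to illustrate when the prefill term stops being negligible, under the assumption that autoregressive decoding costs much more per token than prefill.

```python
# Hypothetical sketch: a per-token energy model that separates prefill
# (input/encoding) from decode (output/generation). The constants below
# are illustrative placeholders, not measured values.

def estimate_energy_wh(input_tokens: int, output_tokens: int,
                       prefill_wh_per_token: float = 1e-5,
                       decode_wh_per_token: float = 1e-4) -> float:
    """Return a rough energy estimate in watt-hours.

    Decoding is autoregressive (one forward pass per generated token),
    so its per-token cost is assumed to be much higher than prefill,
    which processes the whole prompt in a single parallel pass.
    """
    prefill_energy = input_tokens * prefill_wh_per_token
    decode_energy = output_tokens * decode_wh_per_token
    return prefill_energy + decode_energy


if __name__ == "__main__":
    # Typical chat request: decode dominates, prefill is negligible.
    print(estimate_energy_wh(input_tokens=500, output_tokens=400))
    # Edge case from this discussion: a whole book as input, a short answer.
    # Here the prefill term is no longer negligible.
    print(estimate_energy_wh(input_tokens=200_000, output_tokens=20))
```

With typical chat-style ratios the decode term dominates, which is why the current estimation focuses on output tokens; the second call shows the long-context, short-answer case where ignoring input tokens would understate the total.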
Could you send us the references that state that the input context length significantly affects the compute load? If it turns out that it has more impact than what we…