
How to get LLM model performance? #412

Open

KYUNGSOO-LEE opened this issue Jun 16, 2024 · 1 comment

Comments


KYUNGSOO-LEE commented Jun 16, 2024

Hi,

I would like to measure the performance of the Gemma model on-device (Android) with MediaPipe.

I read the blog post about running LLMs on-device with MediaPipe:
(https://developers.googleblog.com/en/large-language-models-on-device-with-mediapipe-and-tensorflow-lite/)

How can I get LLM model performance metrics, e.g. time to first token (TTFT) and time per output token (TPOT)?

I installed the LLM Inference example, but I cannot find any logs about performance.


AkulRT commented Jul 24, 2024

I've been looking for the same thing. I'd love to see something from the devs about how to measure prefill token speed and decode token speed ourselves.

@KYUNGSOO-LEE as a crude substitute in the meantime, I am using .sizeInTokens() to get the input prompt size in tokens and dividing that by the inference time. I calculate the inference time with timeSource.markNow() before and after .generateResponse(), roughly as in the sketch below. Maybe this can be a rough metric for you too.
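
For anyone who wants the same stopgap, here is a minimal Kotlin sketch of that idea. It assumes an already-initialized LlmInference instance from the MediaPipe GenAI task; measureThroughput is a hypothetical helper name, not part of the API.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import kotlin.time.TimeSource

// Hypothetical helper (name and structure are mine, not an official API):
// times one blocking generateResponse() call and derives a crude
// tokens-per-second number. Prefill and decode are not separated, so this
// is not a true TTFT/TPOT measurement.
fun measureThroughput(llm: LlmInference, prompt: String): Double {
    val promptTokens = llm.sizeInTokens(prompt) // input prompt size in tokens

    val start = TimeSource.Monotonic.markNow()
    val response = llm.generateResponse(prompt) // blocking inference call
    val elapsed = start.elapsedNow()

    val outputTokens = llm.sizeInTokens(response) // rough output size in tokens
    // Guard against a zero-millisecond reading on very short prompts.
    val seconds = elapsed.inWholeMilliseconds.coerceAtLeast(1) / 1000.0
    val tokensPerSecond = (promptTokens + outputTokens) / seconds

    println("prompt=$promptTokens tok, output=$outputTokens tok, " +
            "time=${"%.2f".format(seconds)} s, ~${"%.1f".format(tokensPerSecond)} tok/s")
    return tokensPerSecond
}
```

Note that this times the whole call, so prefill and decode are blended into one number. If your MediaPipe version exposes the async generation API with a partial-result listener, marking the time of the first partial result would approximate TTFT, and the remaining output tokens over the remaining time would approximate TPOT.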
