
How to get LLM model performance? #412

Open

KYUNGSOO-LEE opened this issue Jun 16, 2024 · 1 comment

Comments


KYUNGSOO-LEE commented Jun 16, 2024

Hi,

I would like to measure the performance of the Gemma model on-device (Android) with MediaPipe.

I read the blog post about running LLMs on-device with MediaPipe:
(https://developers.googleblog.com/en/large-language-models-on-device-with-mediapipe-and-tensorflow-lite/)

How can I get LLM model performance metrics, e.g. time to first token (TTFT) and time per output token (TPOT)?

I installed the LLM Inference example, but I cannot find any logs about performance.


AkulRT commented Jul 24, 2024

I've been looking for the same thing. I'd love to see something from the devs about how to measure prefill token speed and decode token speed ourselves.

@KYUNGSOO-LEE as a crude substitute in the meantime, I am using .sizeInTokens() to get the input prompt size in tokens and dividing that by the inference time. I calculate the inference time with timeSource.markNow() before and after .generateResponse(), roughly as in the sketch below. Maybe this can be a rough metric for you too.
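
For anyone who wants the same stopgap, here is a minimal Kotlin sketch of that idea. It assumes an already-initialized LlmInference instance from the MediaPipe GenAI task; measureThroughput is a hypothetical helper name, not part of the API.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import kotlin.time.TimeSource

// Hypothetical helper (name and structure are mine, not an official API):
// times one blocking generateResponse() call and derives a crude
// tokens-per-second number. Prefill and decode are not separated, so this
// is not a true TTFT/TPOT measurement.
fun measureThroughput(llm: LlmInference, prompt: String): Double {
    val promptTokens = llm.sizeInTokens(prompt) // input prompt size in tokens

    val start = TimeSource.Monotonic.markNow()
    val response = llm.generateResponse(prompt) // blocking inference call
    val elapsed = start.elapsedNow()

    val outputTokens = llm.sizeInTokens(response) // rough output size in tokens
    // Guard against a zero-millisecond reading on very short prompts.
    val seconds = elapsed.inWholeMilliseconds.coerceAtLeast(1) / 1000.0
    val tokensPerSecond = (promptTokens + outputTokens) / seconds

    println("prompt=$promptTokens tok, output=$outputTokens tok, " +
            "time=${"%.2f".format(seconds)} s, ~${"%.1f".format(tokensPerSecond)} tok/s")
    return tokensPerSecond
}
```

Note that this times the whole call, so prefill and decode are blended into one number. If your MediaPipe version exposes the async generation API with a partial-result listener, marking the time of the first partial result would approximate TTFT, and the remaining output tokens over the remaining time would approximate TPOT.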
