Reproducibly, the first time I call /inference after the server has been started, performance is great, with CPU at a constant 100%. Subsequent calls take about twice as long (roughly 48 s instead of 26 s), and CPU usage fluctuates between about 20% and 80%.
Could this be something to do with reusing the context?
<start server with ./build/bin/whisper-server -m models/ggml-small.en.bin -t 16>
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m26.195s
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m48.280s
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m48.256s
<restart server with ./build/bin/whisper-server -m models/ggml-small.en.bin -t 16>
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m26.566s
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m48.180s
time curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav
real 0m48.206s
On each invocation I see exactly the same server output:
In case it is relevant, this is running on an AWS ARM c7g.4xlarge instance with Ubuntu 24.04.
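To make the repro above quicker to rerun, a small bash helper can time several consecutive requests in one go. This is only a sketch and not part of whisper.cpp: the function name `time_runs` is hypothetical, and it assumes bash plus GNU `date` (for the `%N` nanoseconds field, which Ubuntu's coreutils provides).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times in a row and print the
# wall-clock time of each run in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_runs() {
  local n=$1
  shift
  local i start end
  for ((i = 1; i <= n; i++)); do
    start=$(date +%s%N)
    "$@" > /dev/null   # discard the command's stdout (e.g. the transcription)
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
  done
}
```

Against a freshly started server, `time_runs 3 curl -s 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./samples/output.wav` should print one line per request, which makes it easy to see whether only the first run is fast.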