No need to rebuild the models. Note that if you use Core ML + Flash Attention, then only the decoder will utilize the FA kernels. The encoder would run as usual (i.e. without FA).
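In other words, Flash Attention is a run-time switch, not a model-format change. A minimal sketch of what that looks like in practice (the model and audio paths below are assumptions for illustration; `--flash-attn` and its short form `-fa` are the whisper.cpp v1.6.0 flags):

```shell
# Sketch: enabling Flash Attention in whisper.cpp at run time.
# The existing .bin (and .mlmodelc, if built with Core ML) files are
# reused unchanged; only the command-line invocation changes.

MODEL=models/ggml-base.en.bin   # hypothetical path to an existing model
AUDIO=samples/jfk.wav           # hypothetical input file

# Same binary, same models -- just add --flash-attn (or -fa):
CMD="./main -m $MODEL -f $AUDIO --flash-attn"
echo "$CMD"
```

With Core ML in the picture, the encoder still runs through the `.mlmodelc` as before; only the decoder path picks up the FA kernels.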
Do the models (`.bin` for Metal and `.mlmodelc` for Core ML) have to be rebuilt to enable Flash Attention (`--flash-attn`) in whisper.cpp v1.6.0? If so, do both the `.bin` and the `.mlmodelc` have to be rebuilt? I'm assuming the answer can be inferred from a better understanding of the architecture and the relationship between these components, but I'm not well versed in that (my apologies).