Add example for splitting model into two models and running them in two tiles #910
Comments

Original poster:
First of all, sorry for spamming you with issues 😅 I'm just trying to optimize inference performance for a model larger than 512 kB, and I'm exploring all possible options at the moment.

I think one alternative would be to split the large model into two smaller models, run each one on a separate tile, and have the output of the model on tile[0] be the input to the model on tile[1]. Is that technically feasible using channels, or is that a no-go?

Maintainer:
No worries, more information helps us improve the tools. Yes, that is feasible. We have been primarily focused on flash-based workflows, but there are various options to achieve what you are looking for. What is the size of the model? Would it fit within two tiles? We could communicate directly via email, as that might be easier for sharing more info regarding the models. My email is [email protected].

Original poster:
Nice, I'll contact you via email then :)

Maintainer:
This can be done by splitting the model yourself and compiling the parts separately with xmos-ai-tools. You would then have to wire them up in the application source code (see the sketches below).

Original poster:
I agree it's a waste of resources, but it may allow a model to run faster than if you had to read the weights from flash. Realistically, I think we won't use this option, but it'd be good to have an example :)
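To make "splitting the model yourself" concrete, here is a minimal sketch of the offline step, assuming a purely sequential Keras model. The model file name, the split index, and the output file names are all illustrative, and the xformer.convert call is assumed from the xmos-ai-tools Python package; none of this comes from the issue itself.

```python
# Sketch: split one Keras model into two halves, then compile each half
# separately with xmos-ai-tools, as suggested in the thread above.
import tensorflow as tf
from xmos_ai_tools import xformer

model = tf.keras.models.load_model("big_model.h5")  # hypothetical model file

split = 6  # hypothetical layer index where the model is cut

# First half: the original input up to the chosen layer's output.
first = tf.keras.Model(model.input, model.layers[split].output)

# Second half: a fresh input with the intermediate tensor's shape,
# re-applying the remaining layers.
inp = tf.keras.Input(shape=model.layers[split].output.shape[1:])
x = inp
for layer in model.layers[split + 1:]:
    x = layer(x)
second = tf.keras.Model(inp, x)

# Convert each half to TFLite (quantization omitted for brevity),
# then optimize each one for xcore with the xformer.
for name, part in [("part_a", first), ("part_b", second)]:
    tflite = tf.lite.TFLiteConverter.from_keras_model(part).convert()
    with open(f"{name}.tflite", "wb") as f:
        f.write(tflite)
    xformer.convert(f"{name}.tflite", f"{name}_xcore.tflite", params=None)
```

In practice you would quantize both halves (the intermediate tensor must use the same quantization parameters on both sides of the cut) before running them through the xformer.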
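And for the "wire them up in the application source code" part, below is a minimal XC sketch of two tiles connected by a channel, which is what the original question asks about. It is not an official example: model_a_invoke() and model_b_invoke() are hypothetical wrappers you would write yourself around the tflite-micro inference code generated for each compiled half, and A_OUT_BYTES is a placeholder for the size of the intermediate tensor.

```xc
// main.xc -- a minimal two-tile wiring sketch, not an official example.
#include <platform.h>
#include <stdint.h>

#define A_OUT_BYTES 1024  // hypothetical size of the intermediate tensor

// Hypothetical wrappers you implement around each compiled model half.
void model_a_invoke(uint8_t out[A_OUT_BYTES]);   // first half, on tile[0]
void model_b_invoke(uint8_t in[A_OUT_BYTES]);    // second half, on tile[1]

// tile[0]: run the first half and stream its output tensor to tile[1].
void run_model_a(chanend c) {
    uint8_t out[A_OUT_BYTES];
    while (1) {
        model_a_invoke(out);
        for (unsigned i = 0; i < A_OUT_BYTES; i++) {
            c <: out[i];   // send intermediate tensor, byte by byte
        }
    }
}

// tile[1]: receive the intermediate tensor and run the second half on it.
void run_model_b(chanend c) {
    uint8_t in[A_OUT_BYTES];
    while (1) {
        for (unsigned i = 0; i < A_OUT_BYTES; i++) {
            c :> in[i];    // receive intermediate tensor, byte by byte
        }
        model_b_invoke(in);
    }
}

int main() {
    chan c;  // channels cross tiles via the xcore switch
    par {
        on tile[0]: run_model_a(c);
        on tile[1]: run_model_b(c);
    }
    return 0;
}
```

Byte-by-byte channel I/O is the simplest form and answers the feasibility question directly: channels do work across tiles. For real workloads a streaming channel or a transaction would reduce the per-byte handshake overhead when moving a large intermediate tensor.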