Add support for splitting tensor arena into persistent/non-persistent arenas #908
I was able to test the persistent/non-persistent split feature and I think it can be very useful in some cases. I tested it with a model with the following arena sizes:
In this particular case, the entire arena would actually fit in SRAM, so this test is somewhat pointless, but I wanted to see what kind of performance we would be able to achieve for a model where a split would actually make sense. I tested three configurations:
- Entire tensor arena in SRAM
- Entire tensor arena in external RAM
- Non-persistent arena in SRAM, persistent arena in external RAM
As you can see, performance is virtually the same when the persistent arena is moved to external RAM, and we are able to shave 9 kB of SRAM off for this particular model. I wanted to test one of our larger models, but it turns out that its persistent arena is actually not that big, so the savings are unfortunately not enough to fit the non-persistent arena in SRAM.
However, if we manage to shave just 4 kB off the tensor arena, we could place the non-persistent arena in SRAM, and we would see pretty significant wins for this particular model using the split-arena feature.
Thank you @andresovela. One question: for this example, where you mention the arena in external RAM, are you using DDR, or have you simulated that case by running the model with just one thread?
What do you mean by DDR? Is that a feature that needs to be enabled or something? This is all I did:
I am running
When the tensor arena requirements of a given model are larger than the available SRAM, the tensor arena has to be placed in external RAM, leaving performance on the table due to being unable to use SRAM as scratch memory at all.
I created tensorflow/tflite-micro#2627 to ask TFLM to support this use case, but it doesn't seem to be happening any time soon. However, according to a collaborator, it's already possible to split the tensor arena into persistent/non-persistent arenas.
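To make the split idea concrete, here is a minimal sketch (this is not the TFLM API; all names here are hypothetical) of what a split arena amounts to: two independent bump allocators, one per memory region, where only the non-persistent side is recycled between inferences while persistent allocations live for the lifetime of the interpreter.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only, not TFLM code: a logical tensor arena
// backed by two separate buffers, so the scratch (non-persistent) part
// can sit in SRAM while long-lived (persistent) allocations go to
// external RAM.
struct Arena {
  uint8_t* buf;
  size_t size;
  size_t used = 0;

  // Bump-allocate n bytes; returns nullptr on overflow.
  void* Alloc(size_t n) {
    if (used + n > size) return nullptr;
    void* p = buf + used;
    used += n;
    return p;
  }

  // Only meaningful for the non-persistent arena: scratch allocations
  // are discarded between inferences, persistent ones are kept.
  void Reset() { used = 0; }
};
```

An application would then create one `Arena` over an SRAM buffer for per-inference scratch and another over an external-RAM buffer for persistent data, calling `Reset()` only on the SRAM-backed one between invocations.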
It seems that in order to support this use case, we would need to add this functionality to xformer. This would allow applications to place the non-persistent arena in SRAM and the persistent arena in external RAM, or vice versa, which would allow models to perform better on the xcore.ai platform. I can do it myself if I get some guidance.
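For what it's worth, the placement itself is typically a linker concern. As a rough illustration (the section names and sizes below are made up; the real ones would come from the xcore.ai toolchain's linker script), GCC-style toolchains let an application pin each arena buffer to a specific memory region:

```cpp
#include <cstdint>

// Illustrative only: section names ".sram.bss" and ".ext_ram.bss" are
// hypothetical placeholders for regions defined in the linker script.
__attribute__((section(".sram.bss")))
static uint8_t non_persistent_arena[40 * 1024];  // fast on-chip SRAM

__attribute__((section(".ext_ram.bss")))
static uint8_t persistent_arena[9 * 1024];  // slower external RAM
```

With the split supported, the two buffers above could be handed to the runtime separately instead of a single contiguous arena.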