Performance on ARM (RBPi4) #974
Hi, good to hear! The JACK client is really more of a proof of concept, but if you want to upstream your improvements I'd be very open to it, although it could also be a separate project entirely. There are probably many ways to improve performance on ARM; I had some plans to do it, although real life is catching up a bit at the moment. There may be some low-hanging fruit, but none comes to mind right now. The rest would be intricate work around the interpolation and rendering. For example, I had some improvements from reordering and vectorizing the panning process to avoid mixing float and integer operations. ARM appears to be sensitive to this kind of thing, but it's tricky work. To check whether it's CPU-bound or SD-card-bound, if you have a smallish library or a lot of RAM, you can add [...] If you compile for 32 bits with the proper vector instructions, going to 64 bits is probably not going to be a massive change, but @jpcima knows more about ARM than me.
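To make the panning remark above concrete, here is a minimal C++ sketch of the general idea, not sfizz's actual code: the float-to-integer conversions are batched in one pass, and the gain multiplies run in a second pass, so the two kinds of work are not interleaved per sample. The names `kPanTableSize`, `panTable`, and `panBlock` are hypothetical, the quarter-cosine lookup table is an assumed pan law, and whether this layout actually helps on a given ARM core would have to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; this is not sfizz's API.
constexpr std::size_t kPanTableSize = 4096;

// Quarter-cosine table: entry i holds cos(pi/2 * i / (N - 1)).
inline const std::vector<float>& panTable()
{
    static const std::vector<float> table = [] {
        std::vector<float> t(kPanTableSize);
        const double halfPi = 1.5707963267948966;
        for (std::size_t i = 0; i < kPanTableSize; ++i)
            t[i] = static_cast<float>(std::cos(halfPi * i / (kPanTableSize - 1)));
        return t;
    }();
    return table;
}

// pan values are expected in [-1, 1]; left and right are scaled in place.
void panBlock(const std::vector<float>& pan,
              std::vector<float>& left,
              std::vector<float>& right)
{
    assert(pan.size() == left.size() && pan.size() == right.size());
    const std::size_t n = pan.size();
    const std::vector<float>& table = panTable();
    // Real-time code would reuse a preallocated scratch buffer instead.
    std::vector<std::int32_t> index(n);

    // Pass 1: float arithmetic ending in one float-to-int conversion per sample.
    for (std::size_t i = 0; i < n; ++i) {
        const float p = std::min(1.0f, std::max(0.0f, (pan[i] + 1.0f) * 0.5f));
        index[i] = static_cast<std::int32_t>(p * (kPanTableSize - 1) + 0.5f);
    }

    // Pass 2: table lookups and float multiplies only, no conversions.
    for (std::size_t i = 0; i < n; ++i) {
        left[i] *= table[index[i]];
        right[i] *= table[kPanTableSize - 1 - index[i]];
    }
}
```

With a pan of -1 the left gain is 1 and the right gain is close to 0; with a pan of 0 both gains are close to 0.707.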
It has the benefit of doubling the number of registers, which does have potential for speed improvements. As of now, there are SSE parts of the code which have not been converted to SIMDe, so they lack basic vectorization on ARM. The panning code was once identified as a bottleneck by @paulfd. @jofemodo, are you able to do benchmarking on a variety of ARM machines?
Hi @paulfd!
I guessed as much from its simplicity, which suits my goal of expanding it ;-)
Regards,
I would be curious to hear of any further improvements possible for ARM machines. We plan to deploy sfizz on a variety of mobile (phone/tablet) devices, almost all of which use ARM, either v7a or v8-a. From what we can see for our use case, almost all of the sfizz CPU time is spent in the interpolation functions. That may be due to our sample-heavy instruments though; for synthesised instruments this may look quite different. If there is low-hanging fruit in updating the simde library to vectorise certain operations, this is something I would be interested in looking into. Admittedly, I don't have a huge amount of experience in the area though (I have a lot more experience using slightly higher-level SIMD primitives, e.g. in Swift).
There may very well be low-hanging fruit in the interpolation methods, even on SSE. On ARM you also sometimes pay a cost when switching from float to integer and back to float operations unnecessarily. I think that is less true on ARMv8 though. The interpolation functions themselves should be quite easy to benchmark. The challenge would be if there are slowdowns in the looping code, which is quite complex (basically it's the part of the code responsible for finding the sample indices that are going to be interpolated all at once later on).
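As a rough illustration of how an interpolation kernel could be timed in isolation, here is a minimal C++ sketch using `std::chrono`. The kernel `interpolateLinear` is a simple stand-in, not one of sfizz's interpolators, and `benchmarkInterpolation`, `BenchResult`, and the fixed read ratio of 0.5 are assumptions made for the example. A serious measurement would use the real kernels, realistic read positions, and a proper benchmarking harness.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in kernel: linear interpolation at a fractional read position.
// Requires 0 <= pos and pos + 1 < src.size().
float interpolateLinear(const std::vector<float>& src, float pos)
{
    const std::size_t i = static_cast<std::size_t>(pos);
    const float frac = pos - static_cast<float>(i);
    return src[i] + frac * (src[i + 1] - src[i]);
}

struct BenchResult {
    double nsPerSample; // average wall-clock nanoseconds per output sample
    float checksum;     // returned so the compiler cannot discard the work
};

// Hypothetical harness: reads the source at half speed, `repetitions` times.
BenchResult benchmarkInterpolation(std::size_t numSamples, int repetitions)
{
    std::vector<float> src(numSamples + 2);
    for (std::size_t i = 0; i < src.size(); ++i)
        src[i] = static_cast<float>(i % 64) / 64.0f;

    std::vector<float> out(numSamples);
    const float ratio = 0.5f; // assumed playback ratio
    float checksum = 0.0f;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        for (std::size_t i = 0; i < numSamples; ++i)
            out[i] = interpolateLinear(src, static_cast<float>(i) * ratio);
        checksum += out[numSamples / 2];
    }
    const auto stop = std::chrono::steady_clock::now();

    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    const double total =
        static_cast<double>(numSamples) * static_cast<double>(repetitions);
    return BenchResult{ns / total, checksum};
}
```

Calling, for example, `benchmarkInterpolation(4096, 1000)` on both the 32-bit and 64-bit builds would give a first per-sample comparison. It deliberately leaves out the looping code, which the comment above identifies as the harder part to measure.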
Hi sfizz developers!
This is @jofemodo from the Zynthian project => https://zynthian.org/
I'm integrating sfizz into the Zynthian stack using an improved version of the sfizz_jack client, and it works quite nicely.
Congratulations on your excellent work!
Regarding performance, I must say I don't get the performance I would like. Playing hard, I get XRuns when the number of voices in use grows beyond 45-50. Of course, this limit is lower when effects are added to the audio chain. I've been tweaking the settings (preload size, etc.), but I always get XRuns before reaching 64 active voices, so we reduced the maximum number of voices to 40, which is a good compromise that leaves some room for adding effects.
My questions:
Do you think there is room for improving performance on the ARM architecture? Any tips? ;-)
I'm using a good SD card, but do you think the bottleneck could be in the disk-read subsystem? (It really smells more like a CPU usage issue.)
Currently we are compiling for 32 bits. Could we expect a noticeable improvement in performance by migrating to 64 bits?
All the best,