Performance on ARM (RBPi4) #974

jofemodo · 2021-08-20T10:33:20Z

Hi sfizz developers!

This is @jofemodo from the zynthian project (https://zynthian.org/).
I'm integrating sfizz into the zynthian stack using an improved version of the sfizz_jack client, and it works quite nicely.
Congratulations on your excellent work!

Regarding performance, I must say I'm not getting the performance I would like. When playing hard, I get xruns once the number of active voices grows beyond 45-50. Of course, this limit is lower when effects are added to the audio chain. I've been tweaking the settings (preload size, etc.), but I always get xruns before reaching 64 active voices, so we reduced the maximum number of voices to 40, which is a good compromise that leaves some room for adding effects.

My questions:

Do you think there is room for improving performance on the ARM architecture? Any tips? ;-)
I'm using a good SD card, but do you think the bottleneck could be in the disk-read subsystem? (It really smells more like a CPU usage issue.)
Currently we are compiling for 32 bits. Could we expect a noticeable performance improvement by migrating to 64 bits?

All the best,

paulfd · 2021-08-23T09:35:39Z

Hi, good to hear! The JACK client is really more of a proof of concept, but if you want to upstream your improvements I'd be very open to it. It could also be a separate project entirely.

There are probably many ways to improve performance on ARM. I had some plans to work on it, although real life is catching up with me a bit at the moment. There may be some low-hanging fruit, but none comes to mind right now. The rest would be intricate work around the interpolation and rendering. For example, I got some improvements by reordering and vectorizing the panning process so that it avoids mixing float and integer operations. ARM appears to be sensitive to this kind of thing, but it's tricky work.

To check whether you are CPU-bound or SD-card-bound: if you have a smallish library or a lot of RAM, you can add hint_ram_based=1 in a <control> block. This should load all samples into RAM and avoid reading from disk. Also, within your JACK client you can use a linear interpolator instead of the default one, using this API or the equivalent C one.
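To illustrate why switching to a linear interpolator can help when the synth is CPU-bound, here is a minimal sketch comparing the per-sample work of linear interpolation (2 input taps) with a 4-point cubic Hermite interpolator (4 taps). The Hermite version is only a stand-in for "a higher-order default"; sfizz's actual default interpolator may differ. All function names are hypothetical and this is not sfizz code or its API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; these are not part of the sfizz API.

// Linear interpolation: reads 2 input samples per output sample.
inline float interpolateLinear(const float* x, std::size_t i, float frac)
{
    return x[i] + frac * (x[i + 1] - x[i]);
}

// 4-point cubic Hermite (Catmull-Rom): reads 4 input samples per output sample
// and needs x[i - 1] .. x[i + 2] to be valid.
inline float interpolateHermite(const float* x, std::size_t i, float frac)
{
    assert(i >= 1);
    const float xm1 = x[i - 1], x0 = x[i], x1 = x[i + 1], x2 = x[i + 2];
    const float c1 = 0.5f * (x1 - xm1);
    const float c2 = xm1 - 2.5f * x0 + 2.0f * x1 - 0.5f * x2;
    const float c3 = 0.5f * (x2 - xm1) + 1.5f * (x0 - x1);
    return ((c3 * frac + c2) * frac + c1) * frac + x0;
}

// Play `in` back at a constant speed `ratio` (e.g. 0.5 = one octave down)
// using linear interpolation.
std::vector<float> resampleLinear(const std::vector<float>& in, double ratio)
{
    std::vector<float> out;
    if (in.size() < 2 || ratio <= 0.0)
        return out;
    const double lastPos = static_cast<double>(in.size() - 1);
    for (double pos = 0.0; pos < lastPos; pos += ratio) {
        const auto i = static_cast<std::size_t>(pos);
        const auto frac = static_cast<float>(pos - static_cast<double>(i));
        out.push_back(interpolateLinear(in.data(), i, frac));
    }
    return out;
}
```

On a linear ramp both interpolators return the same value; the difference is the number of loads and multiply-adds per output sample, which is what a linear setting saves when many voices are active at once.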

If you already compile for 32 bits with the proper vector instructions, going to 64 bits is probably not going to make a massive difference, but @jpcima knows more about ARM than I do.

jpcima · 2021-08-23T16:19:41Z

> Currently we are compiling for 32 bits. Could we expect a noticeable improvement on performance by migrating to 64 bits?

Yes, it has the benefit of doubling the number of registers, which has potential for a speed improvement.
This needs measurement.

As of now, some parts of the code are written directly with SSE and have not been converted to SIMDe, so on ARM they lack even basic vectorization.

The panning code was once identified as a bottleneck by @paulfd.
Maybe the lerp vectorization trick already present in WindowedSinc can be applied to any tabulated function, including pan; we haven't benchmarked that one on ARM yet.
Alternatively, the pan can be computed with sqrt, which is an instruction available on both Intel and ARM (but it is costly; again, I haven't run the comparison and have no idea how it performs on ARM).
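The two options above can be sketched as follows, assuming a constant-power pan law with the pan position p in [-1, 1]. The table variant stores cos over [0, pi/2] and linearly interpolates (lerps) between entries, which requires a float-to-integer conversion for the index; the sqrt variant uses L = sqrt((1 - p)/2), R = sqrt((1 + p)/2), which is also constant-power but does not follow exactly the same curve as the sine/cosine law between the center and the extremes. The names and the 256-entry table size are hypothetical; this is not sfizz's pan code and has not been benchmarked.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical sketch; not the sfizz implementation.
// Pan position p is in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.

constexpr std::size_t kPanTableSize = 256; // assumed resolution
constexpr double kHalfPi = 1.57079632679489661923;

// Tabulated cos(x * pi/2) for x in [0, 1], read with linear interpolation (lerp).
class PanTable {
public:
    PanTable()
    {
        for (std::size_t k = 0; k <= kPanTableSize; ++k)
            table_[k] = static_cast<float>(
                std::cos(kHalfPi * static_cast<double>(k) / static_cast<double>(kPanTableSize)));
    }

    float lookup(float x) const
    {
        x = std::fmin(std::fmax(x, 0.0f), 1.0f);
        const float pos = x * static_cast<float>(kPanTableSize);
        const auto i = static_cast<std::size_t>(pos); // float -> integer conversion
        if (i >= kPanTableSize)
            return table_[kPanTableSize];
        const float frac = pos - static_cast<float>(i); // integer -> float again
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    std::array<float, kPanTableSize + 1> table_ {}; // +1 guard entry for the lerp
};

// Constant-power pan from the table: left = cos(theta), right = sin(theta),
// with theta = (p + 1) * pi/4, using sin(theta) = cos(pi/2 - theta).
inline void panWithTable(const PanTable& t, float p, float& left, float& right)
{
    left = t.lookup(0.5f * (1.0f + p));
    right = t.lookup(0.5f * (1.0f - p));
}

// Constant-power pan with sqrt only: no table and no float/integer round trip.
inline void panWithSqrt(float p, float& left, float& right)
{
    left = std::sqrt(0.5f * (1.0f - p));
    right = std::sqrt(0.5f * (1.0f + p));
}
```

Both variants keep left² + right² ≈ 1; which one is faster on a Raspberry Pi 4 or any other ARM core is exactly what would need measuring.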

@jofemodo are you able to do benchmarking on a variety of ARM machines?

jofemodo · 2021-08-24T14:29:18Z

> @jofemodo are you able to do benchmarking on a variety of ARM machines?

Hi @jpcima,
I only develop and test on the RBPi, so that's the only platform I could benchmark on.

Regards,

jofemodo · 2021-08-24T14:34:06Z

Hi @paulfd !

> The JACK client is really more of a proof of concept but if you want to upstream your improvement I'd be very open. Although it could also be a separate project entirely.

I gathered as much from its simplicity, which suits my goal of extending it ;-)
I sent a PR with the small improvements I added:

Some new CLI options
An internal command line that allows loading instruments and setting engine options. It could be extended with commands for loading Scala files, etc.

Regards,

ephemer · 2021-09-08T15:03:44Z

I would be curious to hear about any further possible improvements for ARM machines. We plan to deploy sfizz on a variety of mobile (phone/tablet) devices, almost all of which use ARM, either ARMv7-A or ARMv8-A.

From what we can see in our use case, almost all of sfizz's CPU time is spent in the interpolation functions. That may be due to our sample-heavy instruments, though; for synthesized instruments this may look quite different.

If there is low-hanging fruit in updating the SIMDe-based code to vectorize certain operations, this is something I would be interested in looking into. Admittedly, I don't have a huge amount of experience in this area (I have a lot more experience using slightly higher-level SIMD primitives, e.g. in Swift).

paulfd · 2021-10-30T20:41:51Z

There may very well be low-hanging fruit in the interpolation methods, even on SSE. On ARM there is also sometimes a cost when switching unnecessarily from float to integer and back to float operations, although I think this is less true on ARMv8. The interpolation functions themselves should be quite easy to benchmark. The challenge would be if there are slowdowns in the looping code, which is quite complex (basically, it's the part of the code responsible for finding the sample indices that are later interpolated all at once).
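A minimal sketch of the two-phase structure described above, assuming a simple forward loop and linear interpolation: a first pass advances the playback position, wraps it at the loop end, and stores integer indices and float fractions in separate buffers; a second pass interpolates the whole block from those buffers, so its inner loop contains only loads and float arithmetic and can be benchmarked on its own. All names are hypothetical, and the actual sfizz looping code is, as noted above, considerably more complex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; not the sfizz implementation.

struct Loop {
    std::size_t start; // first sample of the loop
    std::size_t end;   // one past the last sample of the loop
};

// Per-block scratch buffers filled by the first pass.
struct BlockPositions {
    std::vector<std::size_t> index;     // integer part of the position
    std::vector<std::size_t> nextIndex; // index + 1, wrapped to loop.start at the loop end
    std::vector<float> frac;            // fractional part of the position
};

// Pass 1: walk the playback position for `numFrames` outputs, wrapping at the loop end.
// Assumes position < loop.end on entry. Returns the position to resume from next block.
double computePositions(double position, double increment, const Loop& loop,
                        std::size_t numFrames, BlockPositions& out)
{
    assert(loop.start < loop.end && increment > 0.0);
    const double loopEnd = static_cast<double>(loop.end);
    const double loopLength = static_cast<double>(loop.end - loop.start);
    out.index.resize(numFrames);
    out.nextIndex.resize(numFrames);
    out.frac.resize(numFrames);
    for (std::size_t n = 0; n < numFrames; ++n) {
        const auto i = static_cast<std::size_t>(position);
        out.index[n] = i;
        out.nextIndex[n] = (i + 1 == loop.end) ? loop.start : i + 1;
        out.frac[n] = static_cast<float>(position - static_cast<double>(i));
        position += increment;
        while (position >= loopEnd)
            position -= loopLength;
    }
    return position;
}

// Pass 2: interpolate the whole block; only loads and float math in the loop.
// Assumes loop.end <= sample.size(), so every stored index is valid.
void interpolateBlock(const std::vector<float>& sample, const BlockPositions& pos,
                      std::vector<float>& out)
{
    out.resize(pos.index.size());
    for (std::size_t n = 0; n < out.size(); ++n) {
        const float a = sample[pos.index[n]];
        const float b = sample[pos.nextIndex[n]];
        out[n] = a + pos.frac[n] * (b - a);
    }
}
```

Timing `interpolateBlock` separately from `computePositions` (for example with `std::chrono::steady_clock`) is one way to tell whether the cost lies in the interpolation itself or in the position and loop bookkeeping.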

paulfd added the "improvement" (Improve on existing functionality) label Oct 14, 2021

paulfd self-assigned this Oct 14, 2021

paulfd added this to the 1.2.0 milestone Nov 9, 2021
