Improve latency of matrix-`exp` #998

thchr · 2022-02-21T16:30:09Z

As discussed in #985 (comment), `exp` is very slow to compile for larger StaticArrays. This PR tries to reduce the latency a bit by pulling code out into smaller functions, aiming to reduce the amount of inlined code.

There's little difference in the latency at small matrix sizes, and a slight reduction in performance for 3×3 matrices. For all other sizes, however, this PR produces a clear performance improvement. The latency improvements at larger sizes are also pronounced. Still, 3×3 matrices are definitely more important than bigger matrices, so it's not a no-brainer I guess, but I wanted to put the PR up for discussion anyway.

A summary of the timings before/after this PR follows (the performance timings depend on which branch is taken; also, this is on v1.7.2):

| Size | `@time` (1st) | `@btime` (branch 1) | `@btime` (branch 2) | Latency win | Performance win |
|---|---|---|---|---|---|
| 3×3 | 1.2 s/1.3 s | 62 ns/77 ns | 68 ns/73 ns | ÷ | ÷ |
| 4×4 | 2.9 s/3.2 s | 208 ns/201 ns | 194 ns/161 ns | ÷ | ✓ |
| 10×10 | 46 s/21 s | 1.77 μs/1.62 μs | 8.43 μs/5.80 μs | ✓ | ✓ |
| 15×15 | 47 min/4.5 min | ?/11.9 μs | ?/37-50 μs | ✓ | ? |
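For readers unfamiliar with the distinction between the two kinds of timing, here is a minimal sketch using only Base. The numbers in the table were taken with `@time` and BenchmarkTools' `@btime` on `exp` of StaticArrays matrices; `sumsq` below is a hypothetical stand-in, and a single `@elapsed` sample is much noisier than `@btime`.

```julia
# Hypothetical stand-in for the function under test; the thread measured
# `exp` on StaticArrays matrices instead.
sumsq(x) = sum(abs2, x)

v = rand(1000)

# First call: includes compiling sumsq for Vector{Float64} (the "latency").
t_first = @elapsed sumsq(v)

# Later call: run time only (a single, noisy sample).
t_second = @elapsed sumsq(v)

println("first call: ", t_first, " s; second call: ", t_second, " s")
```

The first figure corresponds to the `@time` (1st) column, the second to what `@btime` estimates far more carefully.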

mateuszbaran · 2022-02-21T19:58:52Z

I wonder how much of the compilation time reduction is just due to not duplicating `U = A*U` in each `if` branch? Maybe we don't have to decrease performance for the 3x3 case. Also, we could use the old variant for size 3x3 and the new one for larger matrices.
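A rough sketch of the "old variant for 3x3, new one for larger" idea, assuming the choice is made by dispatching on the size. All names here are hypothetical, and both variants are placeholders that simply call `LinearAlgebra.exp`. With StaticArrays the size is part of the type, so the selection would be resolved at compile time; with a plain `Matrix`, as below, `Val(size(A, 1))` is a run-time dispatch.

```julia
using LinearAlgebra

# Placeholders standing in for the two implementations discussed above
# (hypothetical names; not functions in StaticArrays).
exp_old_variant(A) = exp(A)   # imagine: the original, fully inlined code
exp_new_variant(A) = exp(A)   # imagine: code split into smaller functions

# Select the variant from the size, lifted into the type domain via Val.
pick_exp(A, ::Val{3}) = exp_old_variant(A)   # 3x3 keeps the old path
pick_exp(A, ::Val)    = exp_new_variant(A)   # everything else uses the new path

size_aware_exp(A::AbstractMatrix) = pick_exp(A, Val(size(A, 1)))
```

The more specific `Val{3}` method takes precedence for 3x3 input, so no explicit `if` on the size is needed.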

thchr · 2022-02-21T21:29:41Z

I tested various permutations of this quite extensively yesterday (pretty sure I also tested the A*U shuffle).
Puzzlingly, I found that almost all the compilation time comes from the presence of the if and for statements that rescale A in the second branch: without those, compilation time is drastically reduced. I don't understand why they make such a difference.

Alternatively, we could also use the new in-function @inline in 1.8 to force inlining in the 3x3 case (to avoid code duplication). Just having the code duplicated would indeed also be a solution.
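One possible reading of the `@inline` suggestion, sketched under the assumption that it refers to the call-site annotation added in Julia 1.8: a helper is kept out of line by default, and only the small-size caller asks for it to be inlined. The function names are hypothetical.

```julia
# Hypothetical helper, marked @noinline so callers normally emit a call to it.
@noinline scale_add(x, y) = 2x + y

# Larger sizes: respect the @noinline annotation (less code inlined).
f_large(x, y) = scale_add(x, y)

# Small sizes: a call-site @inline (requires Julia 1.8 or later) takes
# precedence over the annotation on the definition.
f_small(x, y) = @inline scale_add(x, y)
```

Both callers share the single definition of `scale_add`, which is the point: no code needs to be duplicated to get the two inlining behaviours.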

c42f · 2022-02-22T00:39:11Z

> almost all the compilation time comes from the presence of the two loops that rescale A in the second branch: without those, compilation time is drastically reduced

Interesting, though confusing! Did you try moving just that part out into a separate function?
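To illustrate what "moving just that part out" might look like, here is a toy sketch. It assumes a simple scaling-and-squaring scheme with a truncated Taylor series, which is not the algorithm StaticArrays uses, and the names `square_repeatedly` and `toy_exp` are hypothetical. The loop lives in its own `@noinline` function, so it is compiled as a separate unit instead of being inlined into the caller.

```julia
using LinearAlgebra

# Hypothetical helper: the repeated-squaring loop, kept out of the caller
# via @noinline.
@noinline function square_repeatedly(X, s::Integer)
    for _ in 1:s
        X = X * X
    end
    return X
end

# Toy matrix exponential: scale A down, sum a truncated Taylor series,
# then undo the scaling by repeated squaring. Illustration only.
function toy_exp(A::AbstractMatrix)
    nA = opnorm(A, 1)
    # Choose s so that the 1-norm of A / 2^s is at most 1/2.
    s = nA > 0.5 ? ceil(Int, log2(nA / 0.5)) : 0
    B = A / 2^s
    X = one(B) + B          # terms of order 0 and 1
    term = B
    for k in 2:18           # terms B^k / k! for k = 2, ..., 18
        term = term * B / k
        X += term
    end
    return square_repeatedly(X, s)
end
```

For example, `toy_exp([0.0 1.0; -1.0 0.0])` should agree with `exp` from LinearAlgebra to within floating-point tolerance.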

improve latency of matrix-exp

76f9157

thchr force-pushed the exp-compile branch from d50f9da to 76f9157 Compare February 28, 2022 20:31

thchr mentioned this pull request Feb 28, 2022

Test error on Julia nightly (1.9) #999

Closed
