Optimization: compute_eval_table_sparse + combine_evals #1

sragss · 2024-01-17T22:17:32Z

https://github.com/a16z/Spartan2/blob/uniform_r1cs_shape/src/spartan/snark.rs#L238-L285

The compute_eval_table_sparse lambda and the following combination of the evaluations comprise ~12.5% of the end-to-end Spartan2 flow.

I see three primary performance bugs.

Parallelism: Currently compute_eval_table sparse is only parallel across the 3 A / B / C matrices. Most machines have more than 3 threads, thus we should increase the parallelism available.
Evaluation combination ordering. The combined_evals a + r * b + r * r * c can be inlined such that the values a / b / c are used directly after computation rather than storing, throwing away, and reloading from RAM.
r^2: During the evaluation phase should not be computed for each run of the loop. It should be computed once upfront.

The text was updated successfully, but these errors were encountered:

sragss · 2024-01-18T00:26:45Z

Additionally I don't think that we need to write out the zero-ed A_evals, B_evals and C_evals before computing the weighted sum.

Zero allocation appears to cost 30-50% of the total looptime.

sragss · 2024-01-19T01:35:59Z

Here is the density map of the A, B, C matrices for the jolt_single_step circuit:

A: ==================== 14139658
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 7336615 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3868397 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffff01, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 533572 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000001000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efff0001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffe, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000009
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593ef000001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x0000000000000000000000000000000000000000000000000001000000000000, Count: 133393 sq: 0x0000000000000000000000000000000000000001000000000000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffffff, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000010000000000000000


B: ==================== 8670545
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 7203222 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000010000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffd, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010


C: ==================== 10671440
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 5469113 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3068039 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000400, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000100000
Value: 0x0000000000000000000000000000000000000000000000000000000000008000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000040000000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000004000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000010000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000080, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000004000
Value: 0x0000000000000000000000000000000000000000000000000000000000000040, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000001000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000000800, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000400000
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000020, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000400
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000002000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000004000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000000000010, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000100
Value: 0x0000000000000000000000000000000000000000000000000000000000000200, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000040000
Value: 0x0000000000000000000000000000000000000000000000000000000000000008, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000040

sragss · 2024-01-19T17:42:35Z

Guessing this would be faster if we could order by pk.S.{A,B,C} by (col, row) rather than (row, col). This would allow splitting the iterator such that each chunk largely access only its own columns. Only minimal contention at the edges.

sragss · 2024-01-19T17:52:18Z

Srinath does not think the code depends on pk.S.{A,B,C} (row, col) ordering, suggesting we should attempt a change to (col, row) then check if memory contention on parallel compute_eval_table_sparse improves.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization: compute_eval_table_sparse + combine_evals #1

Optimization: compute_eval_table_sparse + combine_evals #1

sragss commented Jan 17, 2024

sragss commented Jan 18, 2024

sragss commented Jan 19, 2024

sragss commented Jan 19, 2024

sragss commented Jan 19, 2024

Optimization: compute_eval_table_sparse + combine_evals #1

Optimization: compute_eval_table_sparse + combine_evals #1

Comments

sragss commented Jan 17, 2024

sragss commented Jan 18, 2024

sragss commented Jan 19, 2024

sragss commented Jan 19, 2024

sragss commented Jan 19, 2024