Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: compute_eval_table_sparse + combine_evals #1

Open
sragss opened this issue Jan 17, 2024 · 4 comments
Open

Optimization: compute_eval_table_sparse + combine_evals #1

sragss opened this issue Jan 17, 2024 · 4 comments

Comments

@sragss
Copy link
Collaborator

sragss commented Jan 17, 2024

https://github.com/a16z/Spartan2/blob/uniform_r1cs_shape/src/spartan/snark.rs#L238-L285

The compute_eval_table_sparse lambda and the following combination of the evaluations comprise ~12.5% of the end-to-end Spartan2 flow.

I see three primary performance bugs.

  • Parallelism: Currently compute_eval_table sparse is only parallel across the 3 A / B / C matrices. Most machines have more than 3 threads, thus we should increase the parallelism available.
  • Evaluation combination ordering. The combined_evals a + r * b + r * r * c can be inlined such that the values a / b / c are used directly after computation rather than storing, throwing away, and reloading from RAM.
  • r^2: During the evaluation phase should not be computed for each run of the loop. It should be computed once upfront.
@sragss
Copy link
Collaborator Author

sragss commented Jan 18, 2024

Additionally I don't think that we need to write out the zero-ed A_evals, B_evals and C_evals before computing the weighted sum.

Zero allocation appears to cost 30-50% of the total looptime.

@sragss
Copy link
Collaborator Author

sragss commented Jan 19, 2024

Here is the density map of the A, B, C matrices for the jolt_single_step circuit:

A: ==================== 14139658
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 7336615 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3868397 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffff01, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 533572 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000001000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efff0001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffe, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000009
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593ef000001, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000001000000000000
Value: 0x0000000000000000000000000000000000000000000000000001000000000000, Count: 133393 sq: 0x0000000000000000000000000000000000000001000000000000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593efffffff, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000010000000000000000


B: ==================== 8670545
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 7203222 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 666965 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000100000000, Count: 400179 sq: 0x0000000000000000000000000000000000000000000000010000000000000000
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593effffffd, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010


C: ==================== 10671440
Value: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000, Count: 5469113 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000001, Count: 3068039 sq: 0x0000000000000000000000000000000000000000000000000000000000000001
Value: 0x0000000000000000000000000000000000000000000000000000000000000400, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000100000
Value: 0x0000000000000000000000000000000000000000000000000000000000008000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000040000000
Value: 0x0000000000000000000000000000000000000000000000000000000000010000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000100000000
Value: 0x0000000000000000000000000000000000000000000000000000000000004000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000010000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000080, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000004000
Value: 0x0000000000000000000000000000000000000000000000000000000000000040, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000001000
Value: 0x0000000000000000000000000000000000000000000000000000000000000100, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000010000
Value: 0x0000000000000000000000000000000000000000000000000000000000000800, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000400000
Value: 0x0000000000000000000000000000000000000000000000000000000000001000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000001000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000020, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000400
Value: 0x0000000000000000000000000000000000000000000000000000000000000004, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000010
Value: 0x0000000000000000000000000000000000000000000000000000000000002000, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000004000000
Value: 0x0000000000000000000000000000000000000000000000000000000000000002, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000004
Value: 0x0000000000000000000000000000000000000000000000000000000000000010, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000100
Value: 0x0000000000000000000000000000000000000000000000000000000000000200, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000040000
Value: 0x0000000000000000000000000000000000000000000000000000000000000008, Count: 133393 sq: 0x0000000000000000000000000000000000000000000000000000000000000040

@sragss
Copy link
Collaborator Author

sragss commented Jan 19, 2024

Guessing this would be faster if we could order by pk.S.{A,B,C} by (col, row) rather than (row, col). This would allow splitting the iterator such that each chunk largely access only its own columns. Only minimal contention at the edges.

@sragss
Copy link
Collaborator Author

sragss commented Jan 19, 2024

Srinath does not think the code depends on pk.S.{A,B,C} (row, col) ordering, suggesting we should attempt a change to (col, row) then check if memory contention on parallel compute_eval_table_sparse improves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant