Add `validate_record` API to upgraded contexts #1218
Conversation
This API serves multiple purposes:

## We could never make the existing API work and kept discovering bugs associated with it

Previously, the validation process was separate, detached from the actual protocol, and validated everything at once. We saw multiple occurrences where reveal was called before sharings were validated.

## The previous approach did not integrate smoothly with vectorization

Vectorization makes data processed in chunks. Going between different chunk sizes (one for data, one for validation) has proven to be challenging, and the code written for it was hard to read.

## Validate record API

The core of this proposal is to put a `validate_record` API on the `UpgradedContext` that takes a `record_id` and blocks execution until this record (and others in the same batch) has been validated. FWIW, this is exactly how the ZKP validator works now. This API brings MAC and ZKP validation closer together.

In addition to introducing this API, this change also updates all uses of MAC validators and contexts to use it; a sketch of the resulting call pattern follows the lists below.

The pros include:

* `validate_record` must now be called on a per-record basis, making protocols easier to review. One can see that the validate call is there and that it sits right before the reveal.
* Reveal can have special power and abort if the record being revealed hasn't been validated yet. Because `UpgradedContext` can now keep track of what has been validated, we can add this functionality later.
* No chunk conversion is required on the protocol side. Protocols can simply be written without doing magic conversions.
* Validation can now be done in batches, transparently to the code calling `validate_record`, and it integrates smoothly with `seq_join` (no need for a special `validated_seq_join`).

Downsides:

* Tracking total records is difficult. We still don't have a good solution for some of our protocols where we need a different total records count per step, created inside `UpgradedContext`.
* `record_id` appears more and more in the API. Traits like `ShareKnownValue` must be updated to support it.
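To make the order of operations concrete, here is a minimal sketch of a protocol using the new API. The `multiply` and `reveal` helpers, the associated field type, and the exact signatures are placeholders for illustration, not the actual code in this PR:

```rust
// Sketch only: `multiply` and `reveal` are hypothetical helpers standing in
// for the real protocol plumbing; the point is the call order.
async fn multiply_and_reveal<C: UpgradedContext>(
    ctx: C,
    record_id: RecordId,
    a: &Replicated<C::Field>,
    b: &Replicated<C::Field>,
) -> Result<C::Field, Error> {
    // Run the multiplication for this record.
    let product = multiply(ctx.clone(), record_id, a, b).await?;

    // Block until the batch containing `record_id` has been MAC-validated.
    // Batching happens transparently inside the upgraded context.
    ctx.validate_record(record_id).await?;

    // Only after validation is it safe to open the value.
    reveal(ctx, record_id, &product).await
}
```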
I captured a couple things that we may want to reconcile with the DZKP APIs in #1219, feel free to add to the list.
```rust
fn r_share(&self, record_id: RecordId) -> Replicated<F::ExtendedField> {
    // it's unfortunate, but carrying references across mutex boundaries is not possible
```
I don't understand this comment. Is it saying that it's unfortunate the `with_batch` call is necessary here?

In the old code there was a comment about `r_share` being intentionally private; I think it might be worth maintaining something like it in the new version.
Yeah, before we could return a reference; now we must clone each time we call `r_share`. That's unfortunate, but there is no other way.
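For readers less familiar with the borrow-checker issue, here is a standalone, simplified illustration of why the clone is unavoidable (hypothetical types; `u64` stands in for `Replicated<F::ExtendedField>`):

```rust
use std::sync::Mutex;

struct Batch {
    r_shares: Vec<u64>, // stand-in for the real share type
}

struct Validator {
    batch: Mutex<Batch>,
}

impl Validator {
    fn r_share(&self, record_id: usize) -> u64 {
        // Returning `&u64` would not compile: the reference would borrow
        // from the MutexGuard, which is dropped when this function returns.
        // Copying (or, for non-Copy share types, cloning) the value is the
        // only safe option.
        self.batch.lock().unwrap().r_shares[record_id]
    }
}
```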
```rust
pub(super) batch: B,
pub(super) notify: Arc<Notify>,
records_per_batch: usize,
records: AtomicUsize,
```
I think that this atomic ended up being unnecessary, or at least not useful, given that the batches are behind a mutex anyway. I'm fine if we merge this as is and clean it up later, though. (Besides cleanup of a possibly unnecessary atomic, there's also the question of resolving code duplication with the DZKP validator version of this code.)
Yeah, to keep others informed: I think this interface ended up not working properly, for MAC validation at least. There is no mechanism for records failing validation to abort the pending futures; in this case they get resolved and execution resumes for them. If those futures are behind `seq_join`, then they get cancelled, but the damage may already be done.

So this requires a major overhaul, I agree. Let's clean this up in one PR if that's possible.
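For context, here is a simplified, assumed reconstruction of how the fields above (`notify`, `records_per_batch`, `records`) might cooperate, using tokio's `Notify`. It also shows the failure mode described above: `notify_waiters` wakes every pending `validate_record` call regardless of whether validation succeeded.

```rust
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

use tokio::sync::Notify;

struct BatchState {
    records_per_batch: usize,
    records: AtomicUsize,
    notify: Arc<Notify>,
}

impl BatchState {
    /// Each record calls this; the future resolves once the whole batch
    /// has been processed.
    async fn validate_record(&self) {
        // Register as a waiter *before* incrementing the counter, so a
        // concurrent `notify_waiters` call cannot be missed.
        let mut notified = pin!(self.notify.notified());
        notified.as_mut().enable();
        let seen = self.records.fetch_add(1, Ordering::AcqRel) + 1;
        if seen == self.records_per_batch {
            // Last record of the batch: validation would run here, then
            // every waiter is woken. Note the problem discussed above:
            // waiters resume execution even if validation failed.
            self.notify.notify_waiters();
        } else {
            notified.await;
        }
    }
}
```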
```diff
 // send our `u_i+1` value to the helper on the right
 let (u_share, w_share) = self.propagate_u_and_w().await?;

 // This should probably be done in parallel with the futures above
 let narrow_ctx = self
     .validate_ctx
     .narrow(&ValidateStep::RevealR)
-    .set_total_records(TotalRecords::ONE);
+    .set_total_records(TotalRecords::Indeterminate);
```
What would it take to specify total records here? (From a performance perspective it probably doesn't matter, but it would be nice to eliminate uses of `TotalRecords::Indeterminate`.)
I can't see how we can make this work with the current limitation (one active batch). The problem here is that we require data to be communicated right away: even if we could set the total records correctly using `total_records / batch_size`, we can't make progress until the first batch is validated.

We could probably narrow to a unique step per batch (sketched below). That will likely stress our compact gate system, as we would require ~50k unique steps to validate 50M records.

If we have multi-batch support, then I guess we can set up `total_records` for `RevealR` and validate all batches at the same time, but that will likely mean too many pending futures at the time of validation.

The most reasonable approach to eliminating `Indeterminate` records would be to support a custom batch size per step in the compact gate macro. That way, we can set the total records correctly and use a batch size of 1 for the `RevealR` and `CheckZero` steps.
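A hypothetical sketch of the "unique step per batch" idea from above; the parameterized `ValidateStep::RevealR(usize)` variant does not exist today and is shown only to illustrate the shape of the change:

```rust
// Hypothetical: narrow to a per-batch step so each batch can use a concrete
// record count instead of TotalRecords::Indeterminate.
let narrow_ctx = self
    .validate_ctx
    .narrow(&ValidateStep::RevealR(batch_index)) // one unique step per batch
    .set_total_records(TotalRecords::ONE);
```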
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1218      +/-   ##
==========================================
+ Coverage   92.69%   92.77%   +0.08%
==========================================
  Files         196      198       +2
  Lines       30503    30781     +278
==========================================
+ Hits        28274    28558     +284
+ Misses       2229     2223       -6
```

☔ View full report in Codecov by Sentry.