What do we want the Core language to be? #83

effectfully opened this issue Aug 29, 2020 · 4 comments

@effectfully
Owner

Currently we have a "typed" language (it's a GADT at the moment; we could potentially change that, but I don't see why we would bother).

I believe the current language already accurately reflects what a source language for compilation to the R1CS/Sonic constraint system has to be. I also believe that the design space is really narrow and there aren't many opportunities for design choices.
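
For illustration only, here is a rough Haskell sketch of the kind of typed core this describes (the constructors and types below are my own guesses, not the actual Zonk AST): field-valued and boolean expressions, plus a program consisting of interleaved bindings and assertions.

{-# LANGUAGE GADTs #-}

-- A hypothetical typed core as a GADT; `Rational` stands in for the actual prime field.
data Expr a where
  Var :: String -> Expr a
  Lit :: Rational -> Expr Rational
  Add :: Expr Rational -> Expr Rational -> Expr Rational
  Mul :: Expr Rational -> Expr Rational -> Expr Rational
  Eq  :: Expr Rational -> Expr Rational -> Expr Bool
  Not :: Expr Bool -> Expr Bool
  Or  :: Expr Bool -> Expr Bool -> Expr Bool
  If  :: Expr Bool -> Expr a -> Expr a -> Expr a

-- A program is a sequence of interleaved bindings and assertions.
data Stmt = Let String (Expr Rational) | Assert (Expr Bool)

type Program = [Stmt]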

However, @jakzale mentioned one possibility: having a single assert storing a boolean expression in CNF.
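
(For reference, CNF, i.e. conjunctive normal form, is a conjunction of clauses, each clause being a disjunction of literals, e.g. (a ∨ ¬b) ∧ (¬a ∨ c); so under that proposal the whole program would boil down to a single assert of one such formula.)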

Pros:

  • a very sound foundation. CNF is something that has been studied from all possible angles: we know how to produce expressions in that form, how to optimize them, etc.
  • hence more optimization opportunities

Cons:

  • the prover wants to fail as early as possible when it attempts to produce a witness in an erroneous way. This suggests interleaving lets and asserts and checking each assert as early as possible. It's not clear how critical it would be if we deferred all calls to assert until every single let is computed, as we don't care about the performance of the prover that much (we mostly care about the performance of the verifier), but it does feel wrong to introduce such an inefficiency just for the sake of having a more minimalistic system. Interleaved lets and asserts also better reflect the nature of the computation that the prover is supposed to perform.
  • optimization opportunities mostly come in the form of common subexpression elimination (CSE). Programmers are already good at this kind of thing. Moreover, the programmer knows better which subexpressions they want to share. Consider, for example, group addition on the Jubjub curve in Edwards form; in R1CS it looks like this:
(A) = (x1 + y1) × (x2 + y2)
(B) = (x1) × (y2)
(C) = (x2) × (y1)
(D) = (dj · B) × (C)
(B + C) = (1 + D) × (x3)
(A − B − C) = (1 − D) × (y3)

But in Zonk I implemented it like this (where / is supposed to be compiled without ensuring that the divisor is non-zero, because it can't be zero):

let x1y2 = x1 * y2;
let x2y1 = x2 * y1;
let x1x2 = x1 * x2;
let y1y2 = y1 * y2;
let prod = x1x2 * y1y2;
let x1 = (x1y2 + x2y1) / (1 + dj * prod);
let y1 = (y1y2 + x1x2) / (1 - dj * prod);

R1CS and Sonic are extremely similar, almost identical, yet I had to completely rewrite the whole thing in order to make compilation efficient. It looks like we'd need to implement a specific form of CSE for each possible zero-knowledge backend, which is a lot of highly non-trivial research and development just to solve a task that nobody asked us to solve, as the programmer writing the actual code in Zonk likely already knows which expressions should become bindings.

Note also that sometimes we don't want to perform CSE over certain subexpressions. For example, the following program:

let x = a + b + c;
let y = a + b + d;
let out = x * y;

can be more efficient than binding a + b to a variable, as that would increase the number of constraints. Admittedly, given that we're supposed to produce compressed Sonic vectors, it's not really clear whether an additional constraint here would actually make things slower; but then why commit to a single way of handling things if it's not clear which option is better? The programmer can decide for themselves, particularly since efficiency concerns are all backend-specific.
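
For concreteness, here is roughly how the two options compare in plain R1CS (a sketch assuming the usual encoding where linear combinations are free inside a constraint, but a dedicated wire needs its own constraint to define it):

(out) = (a + b + c) × (a + b + d)

versus, with t bound to a + b:

(t) = (a + b) × (1)
(out) = (t + c) × (t + d)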

Moreover, if we did this single-assert thing, the output of the first few stages of compilation would basically be obfuscated and completely indecipherable for any non-hello-world program. The programmer writes a program and introduces certain names and logic, and we just inline everything and then come up with our own bindings and their bodies, not related in any way to the original program. That would make any kind of visual inspection of intermediate programs completely infeasible, i.e. we'd be less likely to catch an error if it appeared somewhere in the logic of the compiler.

And our language is fairly low-level. The programmer should be given full control over which expressions stay bound to variables and which bindings get inlined. I believe that writing an algorithm that consistently outperforms a dedicated human being requires plenty of research, and even if we had one, it would have to be optional. Efficiency (in terms of the number of constraints in the final result) is critically important, as it determines how much time is spent on verification, and we do want to allow the programmer to fine-tune the output for their specific needs in a predictable way.

Finally, we simply have a ton of far more vital stuff to do and don't have enough bandwidth for research that is both non-essential and non-trivial.

So I believe we shouldn't even attempt investigating this single assert idea any time soon.

@emmorais

I agree that since it is a low-level language it makes sense to give full control to the programmer. Simplicity is important here. If the programmer can implement the optimization themselves, then a simpler compiler seems crucial, at least at this stage, in particular because of the cons mentioned above. The compiler will provide privacy, but the programmer should be the one responsible for performance.

@effectfully
Owner Author

effectfully commented Sep 1, 2020

@jakzale asked about compilation of the following expression:

if P then let x : A = N in M else let y : B = N' in M'

which made me think about lets in local scopes. Here are my conclusions:

  1. we want let-expressions no matter what, because pure functions without let-expressions are nearly useless
  2. the semantics of a let-expression should be as follows: if we inline the let, we should get a program with the same semantics as the original program containing the let (modulo effects, since we're in a strict language)
  3. the old compiler did allow lets inside expressions, and moreover it compiled them correctly, as I'll describe shortly
  4. we can carefully craft a non-inlinable let inside a local scope with the old compiler and see what happens. This program:
if ?b then (let x = a * b; x + 1) else 0

compiles to the following IR:

.m_5 = a_2 * b_3  -- `x` got replaced by a generated `.m_5`
.l_6 = - 1 - .m_5  -- negated `1 + .m_5`
.m_7 = ?b_0 * .l_6
out_4 = - .m_7  -- when `?b` is `1` (i.e. true), the result is the negation of `.l_6` (which is itself negated), i.e. what `x + 1` compiled to; when `?b` is `0` (i.e. false), the result is `0`
?b_0 * ?b_0 =?= ?b_0  -- ensuring `?b` is either `1` or `0` (i.e. boolean)
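
As a quick sanity check of the arithmetic above, substituting the two possible values of ?b:

if ?b = 1: out_4 = - .m_7 = - .l_6 = 1 + .m_5 = x + 1
if ?b = 0: .m_7 = 0, so out_4 = 0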

The important part here is that the let was "pulled out" of the if as if it were always going to be evaluated, as opposed to being evaluated only when ?b is true. How is that sensible? Because we always evaluate both branches of an if! That's the biggest peculiarity of the setting (and the main reason why Zonk can't be a general-purpose language) -- both branches of an if get evaluated by the prover, always.

What this means is that we are allowed to have a let in a "local scope" as there's no notion of a local scope w.r.t. computation in this setting. Everything written always gets evaluated (some things get reduced at compile time, others at "runtime" by the prover).

Moreover, allowing lets in local scopes does not make the task of compiling programs any harder. It's a conservative extension. We could simply inline them, and then the compiler would stumble upon the inlined expressions and compile every multiplication as a separate assignment at the top level. I.e. "local" lets become global through inlining anyway, so it's perfectly fine not to inline them, as not inlining has literally the same semantics: local lets become global.

So it's perfectly fine to nest lets in any way (e.g. a let inside an RHS of a let-binding is also fine).
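
To make the "local lets become global" point concrete, here is a small Haskell sketch (a made-up toy AST, not the actual compiler) of a pass that floats every let out of its enclosing expression, assuming binder names are already unique:

-- Untyped toy AST.
data E
  = Var String
  | Num Integer
  | Add E E
  | Mul E E
  | If E E E
  | Let String E E            -- let x = rhs in body
  deriving Show

-- Return the floated top-level bindings together with the let-free residue.
float :: E -> ([(String, E)], E)
float (Let x rhs body) =
  let (bs1, rhs')  = float rhs
      (bs2, body') = float body
  in  (bs1 ++ [(x, rhs')] ++ bs2, body')
float (If c t e) =
  let (bs1, c') = float c
      (bs2, t') = float t
      (bs3, e') = float e
  in  (bs1 ++ bs2 ++ bs3, If c' t' e')
float (Add a b) = binop Add a b
float (Mul a b) = binop Mul a b
float e         = ([], e)

binop :: (E -> E -> E) -> E -> E -> ([(String, E)], E)
binop op a b =
  let (bs1, a') = float a
      (bs2, b') = float b
  in  (bs1 ++ bs2, op a' b')

On the if ?b then (let x = a * b; x + 1) else 0 example this yields the binding x = a * b at the top level and the let-free if ?b then x + 1 else 0, which is exactly the shape of the old compiler's IR above.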

But of course the fact that we can compile such programs does not mean that we have to allow the user to compile them. In

if P then let x : A = N in M else let y : B = N' in M'

is x in scope in the else branch? The compiler says "yes" (unless it decides to reorder the multiplication that an if compiles to), but that's really an implementation detail and the user shouldn't be bothered with it. It makes perfect sense to make the type checker reject such weirdly scoped programs even if they can be compiled in a sensible way. In particular, we can make Zonk Core lexically scoped and simply refuse to compile non-lexically-scoped programs (even though we could compile some of them). Making the type checker reject some programs that could be compiled sensibly is not uncommon, so we can do that.

So there's no notion of local scope for a let. Unfortunately, the same is not true for an assert. assert inside a branch of an if is a scary thing that can't be compiled efficiently in the general case:

The thing is, Q in assert Q and Q in assert not P or Q compile very differently. In particular, consider Q = x == y. When we compile assert Q, we only need to enforce that x is equal to y. When we compile assert not P or Q, we need to check whether x is equal to y and assert the result to be True. And checking is way more expensive than requiring.
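
As an illustration (using a standard is-zero gadget; the actual backend encoding may differ), compare the raw R1CS constraints for the two cases:

(0) = (x − y) × (1)

is all that assert Q needs, whereas computing Q as an expression means witnessing a boolean b (and an inverse hint inv) such that

(1 − b) = (x − y) × (inv)
(0) = (x − y) × (b)

and then spending further constraints on not P or b and on asserting the final result.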

As far as I'm aware, there's no or defined for general constraints, only for boolean expressions. This means that when we encounter an or during compilation of an assertion, we have to fall back to compiling the assertion as an expression and enforcing that its result is True, which gives us less efficient compilation.

This is discussed (not in a very detailed manner) in this issue

BTW, some Qs can stop being compilable at all if treated as expressions. I don't think we have such cases now, but we certainly may in the future (for example, if we support range checks with negative values but don't support comparisons with negative values).

So it still makes sense to distinguish between the top level and local scopes, as assertions can only be compiled efficiently at the top level. Of course, we also have the problem of compiling division in local scopes, since that operation may fail, i.e. it's effectful in the same way as assert. But even if we end up compiling division in local scopes inefficiently (there are lots of alternative approaches), that doesn't mean we should allow the user to implicitly put arbitrary assertions in local scopes and silently compile all of them inefficiently. I do believe we should support some explicit way of putting assertions in local scopes, though, as in:

If we have an Assert monad, we can actually make / run in it. And if the user doesn't want that, we can provide a function expensive : Assert a -> a that allows embedding any kind of assertion into a pure term at the expense of having more constraints in the output. That would constitute fairly good UX, as it'd allow the user to explicitly choose between efficiency and generality.
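
A hypothetical Haskell sketch of what such an API could look like (the names, the representation, and divA below are assumptions, not the real interface; a real expensive would re-compile the recorded assertions as boolean checks, here it merely extracts the value):

type Field = Rational                      -- stand-in for the actual prime field

newtype Constraint = Constraint String     -- e.g. "divisor /= 0"
  deriving Show

-- A computation that produces a value and records the assertions it relies on.
newtype Assert a = Assert { runAssert :: ([Constraint], a) }

instance Functor Assert where
  fmap f (Assert (cs, x)) = Assert (cs, f x)

instance Applicative Assert where
  pure x = Assert ([], x)
  Assert (cs1, f) <*> Assert (cs2, x) = Assert (cs1 ++ cs2, f x)

instance Monad Assert where
  Assert (cs, x) >>= k = let (cs', y) = runAssert (k x) in Assert (cs ++ cs', y)

-- Division runs in Assert, so the non-zero-divisor requirement is recorded.
divA :: Field -> Field -> Assert Field
divA x y = Assert ([Constraint "divisor /= 0"], x / y)

-- Escape hatch: embed an assertion-carrying computation into a pure term
-- at the cost of extra constraints in the output.
expensive :: Assert a -> a
expensive = snd . runAssert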

Summarizing: we should allow

  • lets everywhere
  • to put asserts easily only at the top level
  • to have some cautionary way of putting asserts in local scopes

@effectfully
Owner Author

to have some cautionary way of putting asserts in local scopes

Note, however, that it's not necessary, as we don't lose any generality by not supporting this. Any general strategy that we could implement for compiling asserts in local scopes can be outperformed by a dedicated user implementing something tailored to their particular use case. And our users, at least in the medium term, do know how to do such things, and much better than we do.

And when it comes to division we can provide a /' operator that doesn't ensure its right operand is non-zero. Then the user can implement whatever logic they find sensible.

We do lose some convenience by not supporting asserts in local scopes, though. Sometimes you don't care about efficiency and only want to concoct something quickly. But given that we're going to implement already specified algorithms, I don't think it's too important in our case to have this convenience.

And obviously, compiling assertions in local scopes efficiently requires handling a myriad of corner cases, which means plenty of non-trivial research that nobody asked us to perform.

So I believe we either shouldn't bother with asserts in local scopes at all for the time being, or should just use the most general (inefficient) strategy of compiling them once we realize that we do find them useful for something.

@effectfully
Owner Author

Quoting a Slack discussion (just to have it documented here):

I've been thinking about asserts in local scopes and it occurred to me that there may exist statements that can't be compiled as expressions at all. For the current core language, for any assert B we can create assert C || B and compile that inefficiently, but in general that may not be the case.
For example, we may have additional primitives like a shuffle gadget, and I'm not looking forward to thinking about whether building a sorting network manually can help in the general case.

Another example is range checks with negative numbers. We can potentially compile

assert v > -7

via

assert v + 7 > 0

but that same logic does not work with comparisons, because requiring v + 7 to be > 0 and failing if it's not is way different from checking whether it is and still proceeding with proving if it's not. I was explicitly told by our crypto people not to bother with negative values in comparisons after struggling to implement them for some weeks.

That only applies to the Core language. Whether supporting asserts in local scopes in a high-level language is sensible or not (provided compilation removes them, so they don't leak into Core) is a different topic.
