Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loops #54

Open
effectfully opened this issue Mar 25, 2020 · 16 comments · Fixed by #62 or #77
Open

Loops #54

effectfully opened this issue Mar 25, 2020 · 16 comments · Fixed by #62 or #77
Assignees

Comments

@effectfully
Copy link
Owner

Currently we unroll loops during parsing/typechecking. It's probably best to unroll them on the compiler side, since the compiler collects plenty of static info.

This means that we should try adding loops as primitives (i.e. constructors) to both the raw and the typed language. Loop bounds should be arbitrary expressions. The compiler will throw an error if it can't unroll a loop due to loop bounds not being static.

The previous issue talking about loops is #12.

@effectfully
Copy link
Owner Author

Loop bounds should be arbitrary expressions.

There's a major problem with this setup, though. Currently our strategy is to let the user write numbers and think of them as numbers, but lift those numbers into a finite field as well as all operations over them. This works fine as long as we don't need to "downcast" elements of the field into integers, which is what we do to compute comparisons, for example. This can be dangerous, for example, i * -1 <= 4 is false for i = 1.

In the same way it doesn't make much sense to downcast elements of the field to get integer bounds. No user expects 5 / 2 to evaluate to -26217937587563095239723870254092982918845276250263818911301829349969290592254.

Perhaps we want a separate type of integers that, say, all disappear during compilation. That could work for loop bounds at least (I doubt it's enough to get working comparisons, but I haven't spent any time investigating this issue).

@jakzale
Copy link
Collaborator

jakzale commented Mar 25, 2020

Maybe instead we should go for primitive recursion? Although I suspect that primitive recursion will be more difficult to use than loops.

What are loops actually used for in such language? Are they part of a spec?

@effectfully
Copy link
Owner Author

Maybe instead we should go for primitive recursion?

I don't have a feeling that it makes anything simpler, but who knows.

What are loops actually used for in such language? Are they part of a spec?

Kind of. See A.3.3.8 of the Sapling spec. We implemented that as ZKE.Field.Examples.ScalarMult.varBaseScalarMult. As you can see we need a loop there and we do use the loop variable.

Given that the language targets cryptographers we pretty much have to have loops, because this is what all those crypto algorithms use. Maybe just as a form of syntactic sugar on top of some kind of primitive recursion, though.

@effectfully
Copy link
Owner Author

This means that we should try adding loops as primitives (i.e. constructors) to both the raw and the typed language.

Seems to be done.

Loop bounds should be arbitrary expressions.

Tracked by #69.

i * -1 <= 4 is false for i = 1.

Should be fixed by #68 (the compiler is not updated yet, so can't check).

In the same way it doesn't make much sense to downcast elements of the field to get integer bounds. No user expects 5 / 2 to evaluate to -26217937587563095239723870254092982918845276250263818911301829349969290592254.

Perhaps we want a separate type of integers that, say, all disappear during compilation. That could work for loop bounds at least (I doubt it's enough to get working comparisons, but I haven't spent any time investigating this issue).

Not going to try that any time soon.

@jakzale jakzale linked a pull request Apr 21, 2020 that will close this issue
@effectfully
Copy link
Owner Author

effectfully commented May 16, 2020

I was thinking about loops and whether it's possible to have lexical scoping for them at all and I can't see a convenient way of making loops lexically scoped. Variables look to the user as if they were mutable, but internally every assignment creates a new variable, so they're immutable. So just declaring a variable outside of the loop and assigning it a value in the loop is not enough as it will create a new variable at each iteration. We could think of assigning the variable a value only once -- at the final iteration, but

  1. you can't put a let into an if
  2. even if you could, both branches of an if are evaluated, so that would still evaluate the assignment at each iteration

Although both of these problems can be fixed in this particular case, because statically computable things are supposed to have the usual semantics (if-expressions are lazy, for example). I haven't written a test for that statement yet, though (there's an issue in the compiler repo).

Another option would be to have a separate block in a for loop for things that happen after the loop finishes. Which is a bit weird, but could work.

Anyway, just being curious. Non-lexically scoped loops make sense in our case and that's what they use in the Sapling spec.

@effectfully
Copy link
Owner Author

One remaining problem with loops is what the value of i is after

for i = 1 to 10 do <...> end;

finishes. I like 11, @jakzale likes 10, but users will have arbitrary expectations, so such ambiguity is a problem regardless of what we choose in the end.

It perhaps would be best to remove i from the scope once the loop finishes.

BTW, what semantics, say,

for i = 1 to 10 do
    i = i + 1;
end;

is supposed to have?

@jakzale
Copy link
Collaborator

jakzale commented May 16, 2020

IMHO in the current implementation the cleanest semantics for for loops is as follows

  • make the upper bound exclusive, as in Python ranges, and
  • bind the variable after the loop to max start end.

@effectfully we do not really have variable assignments, just variable bindings, so the program will be written as

for i = 1 to 10 do
    let i = i + 1;
end;

Given the above, the semantics of the above loop are as follows:

  • the loop gets evaluated 9 times for [1,10), and
  • after the loop i is bound to 10.

Still, we should discuss about the following example

for i = 1 to 2 do
    let i = i + 2;
end;

as I am still unclear what i should be bound to after the loop.

I guess we could look into how SSA/LLVM handles for loops.

Finally, you mentioned that the compiler is using CPS, but as the language contains only bindings, it resembles SSA which is awfully similar to ANF which was touted as an alternative to CPS.

@effectfully
Copy link
Owner Author

IMHO in the current implementation the cleanest semantics for for loops is as follows

It's fine, but it doesn't matter if we go with this semantics or with some another one. What matters is to have loops that have an obvious semantics. Currently this is not the case. The easiest option would be just to dump for loops and use while loops, which have a very clear semantics, but then go explain to the user why you can't have an arbitrary computation in the loop header... Maybe we can reflect in the type of a variable whether the variable stores the result of a compile-time or runtime computation.

Or maybe we can use C-like for loops, which also have a clear semantics and forbid assigning anything to the loop variable in the body of the loop.

I'll think more on that, clearly there's a huge UX problem here.

we do not really have variable assignments, just variable bindings, so the program will be written as

Yeah, I just managed to forget the syntax.

Given the above, the semantics of the above loop are as follows:

  • the loop gets evaluated 9 times for [1,10), and
  • after the loop i is bound to 10.

That doesn't make any sense to me as I expect the loop counter to be incremented by virtue of it being a loop counter. I.e.

for i = 1 to 10 do end;

is already supposed to increment i automatically. So an additional i = i + 1 in the body should have some effect.

Finally, you mentioned that the compiler is using CPS, but as the language contains only bindings, it resembles SSA which is awfully similar to ANF which was touted as an alternative to CPS.

The compiler was using CPS for some technical stuff, but it was not the "compiling with CPS" technique, just some Cont sprinkled around to make certain things nicer much like we do in this repo. As of now, all Cont mentions are gone.

@jakzale
Copy link
Collaborator

jakzale commented May 18, 2020

Here is an interesting program, I suspect that is a bug in the renamer:

*** Failed! (after 646 tests and 23 shrinks):
        ERROR
          evaluation of the renamed program did not match the evaluation of the original program

        INITIAL STATE
          Env {unEnv = fromList [(26,Some 0)]}

        ORIGINAL PROGRAM
          for z_24 = 0 to -1 do
          let x_26 = x_26;
          end;
          let y_29 = if (0 == x_26) then 0 else 0;

        ORIGINAL PROGRAM FINAL STATE
          "[Some 0]"

        RENAMED PROGRAM
          for z_27 = 0 to -1 do
          let x_28 = x_26;
          end;
          let y_29 = if (0 == x_28) then 0 else 0;

        RENAMED PROGRAM FINAL STATE
          "VariableNotInScopeEvalError"

We probably should add an assignment operator, because at the moment renamer cannot tell apart a fresh binding with an assignment. We also need to unbind variables after for loop.

tack test --ta --quickcheck-replay=397135

@effectfully
Copy link
Owner Author

Err, this was closed automatically when I merge the PR.

Here is an interesting program, I suspect that is a bug in the renamer:

Brilliant find. Yes, it's a bug. The renamer assumes that the body of a loop will be evaluated at least once, which it's not the case here, so the renamer thinks that x_26 after the loop refers to the variable bound inside the loop, but it's in fact not. A natural consequence of not having lexical scopes...

So if we remove all the uniques and look at the program the way the user writes it:

for z = 0 to -1 do
  let x = x;
end;
let y = if (0 == x) then 0 else 0;

then it's clear that the last mention of x refers to a free variable, but that is only because the body of the loop does not get evaluated. If it was for z = 0 to 1, the last x would mean a different thing. We could implement such a logic in the renamer and fix the bug, but it's rather gross... But anyway, the bug should be fixed, so checking the bounds of a loop in the renamer is a sensible thing to do right now. I think I'll do that myself.

Maybe unrolling loops on the language side before calling the renamer was a good idea after all.

And in the long term, I wonder how necessary it is to rely on global uniqueness in the compiler and if it's possible to get away without requiring it. I'll think more on that.

@effectfully effectfully reopened this May 18, 2020
@effectfully
Copy link
Owner Author

We could implement such a logic in the renamer and fix the bug, but it's rather gross... But anyway, the bug should be fixed, so checking the bounds of a loop in the renamer is a sensible thing to do right now.

And clearly it has to be a temporary solution, 'cause handling things like

for i = 1 to 5 do
    for j = i to 5 do
        <...>
    end;
end;

this way in the renaming is just silly. So in the long run it's either

  1. unroll loops on the language side
  2. massage the compiler to accept programs with variables without uniques

It might actually make sense to do (1). The compiler has a lot of info and can inline things well, but maybe it's an anti-feature in this case, because it does not lead to reliable and predictable UX. Maybe inlining fewer things, but in a predictable way is a better strategy.

And then this solves the problem of the loop variable being in scope after the loop, as unrolling a loop amounts to inlining the value of the loop variable in the body of the loop, i.e. we do not create lets for the loop variable, and hence it's simply not in scope after the loop.

And the compiler does not need to generate any fresh variable names. It's quite appealing to move back to unrolling loops on the language side...

@jakzale
Copy link
Collaborator

jakzale commented May 28, 2020

I have a suspicion that we conflate the notion of binding and assignment, because they are used interchangeably with others.

For instance, consider the following program that sums all bits in a n bit vector v

let acc = 0;
for i = 0 to n do
    let acc = if v[i] then acc else acc + 1;
end;

Now it is unclear what the value of acc should be at the end, because according to lexical scoping rules acc should have value 0 at the end.

Whereas, I propose to introduce an assignment operator <-, using which we could write the above program as

let acc = 0;
for i = 0 to n do
    acc <- if v[i] then acc else acc + 1;
end;

which I belive that it is much readable.

Finally, I would want to write the following program

let mutable acc = 0

for i = 0 to n do
    if v[i] then
        acc <- acc + 1

@jakzale
Copy link
Collaborator

jakzale commented Jun 3, 2020

For now we decided to unroll loop in the front-end of the compiler.

jakzale pushed a commit that referenced this issue Jun 5, 2020
Fix #54 by removing for loops from Field.Typed language and unrolling loops in the front-end of the compiler.
@effectfully
Copy link
Owner Author

Apologies for not responding earlier, I somehow missed this particular comment.

Finally, I would want to write the following program

let mutable acc = 0

for i = 0 to n do
    if v[i] then
        acc <- acc + 1

We don't allow statements inside an if and I don't know how to compile such a thing generally.

Now it is unclear what the value of acc should be at the end, because according to lexical scoping rules acc should have value 0 at the end.

let acc = 0;
for i = 0 to n do
    let acc = if v[i] then acc else acc + 1;
end;

It should be the value of the latest acc. This what the Sapling spec says and the Sapling spec is our Bible. We could do things against the Sapling spec if there are some strong reasons, though.

But using the exact syntax that the Sapling spec uses and giving it a different meaning is a no go as it would be extremely confusing for our users. I do like the syntax that you proposed, but it just doesn't fit in the context we develop this whole thing in.

Now about the general idea.

Let's say that let means what it means currently (i.e. your let + <- at the same time) and we add a local keyword or any other way to indicate that a binding is a local one. Then if I understand your suggestion, the following snippet:

let acc = 0;
for i = 0 to 1 do
    let local acc = if v[i] then acc else acc + 1;
end;
assert acc == 0;
let x = 0;

must not unfold into

let acc = 0;
let acc = if v[0] then acc else acc + 1;
let acc = if v[1] then acc else acc + 1;
assert acc == 0;
let x = 0;

because then the final acc would reference a local one.

But we do unfold loops and since we have to distinguish between the local and global accs, we should unfold the above program into

let acc_0 = 0;
let acc_1 = if v[0] then acc_0 else acc_0 + 1;
let acc_2 = if v[1] then acc_1 else acc_1 + 1;
assert acc_0 == 0;
let x_3 = 0;

How will we get there? We can't rename any expression or generally resolve scopes before unfolding loops, because

for i = 0 to n do
    let acc = if v[i] then acc else acc + 1;
end
assert acc = 0;

(the let binds a global variable inside a for loop) is a valid program in the Sapling spec and what it means depends on the value of n, i.e. we can't rename the program before unrolling the loop. And we can't unroll the loop before doing renaming as shown above. So it seems we have to do both at the same time, which is silly.

We wouldn't have this problem if it was impossible to declare global variables in a loop body, but this is what the Sapling spec has. Recall that a loop is a just a macro that gets unfolded at some point, it's not a separate construct with its own scope.

Overall,

  1. I do think that what you propose is better than what they have in the Sapling spec. We can decouple Zonk from the language from that spec, but we need to make sure that the syntax differs then, so that there's no confusion
  2. there's a big question of whether having the ability to introduce global bindings in a loop body is important or not. We'll be able to come up with a conclusion on that once we implement a fair amount of code from the spec
  3. if it was only loops, I wouldn't be bothered and just built Zonk directly on top of the language from the Sapling. But we'll also have procedures at some point and all the same question seem to apply to them

My conclusions are

  1. we should stick to the Sapling spec until we understand exactly what needs to be expressible in the language
  2. hence we shouldn't implement assignments right now
  3. and we shouldn't have lexical scoping for loops. It's good not to have the loop variable bound after the loop finishes (although if the Sapling spec says anything about it, we should do what it says instead), but variables introduced in the body should be global
  4. we should reconsider all of that after implementing lots of stuff from the Sapling spec

@effectfully effectfully reopened this Jun 11, 2020
@effectfully effectfully mentioned this issue Jun 11, 2020
7 tasks
@effectfully
Copy link
Owner Author

Quoting my comment:

we now also resolve scopes in the type checker in addition to resolving them in the renamer. And we can't apply the renamer beforehand, because of loops.

Would the best way be to

  1. unroll loops
  2. rename (after moving the renamer to operate on untyped terms rather than typed ones)
  3. type check

But maybe it's totally fine to track scopes in the type checker and in the renamer. After all, we track them in the evaluator, so why not also do that in the type checker. And relying on variables being globally unique in the type checker would be weird and annoying. And if we don't rely on that, then we're effectively tracking scopes.

So yeah, the simplest thing is unroll loops and type check things at the same time and rename afterwards. This is already what is happening, so I'm just convincing myself that this is the right approach here.

@effectfully
Copy link
Owner Author

effectfully commented Nov 18, 2020

For a long time we had this problem that loops were non-lexically scoped, because that's what they had in the Sapling spec (dunno if they still have non-lexical scoped loops there, but that's now irrelevant) and I did not want to contradict the only source of truth we had. After the recent discussion we decided to move to lexically-scoped loops.

I'm going to use a slightly different syntax than the one we've been using so far. Let var x; be a variable declaration and x := e; be an assignment, with var x := e; being a shorthand for

var x;
x := e;

Now having lexically-scoped loops means that the following code is ill-typed (by means of being ill-scoped):

for i = 1 to 1 do
  var x := 0;
end;
assert x == 0;

because the x variable is introduced in the body of the loop and is thus out of scope after the loop in the assert statement.

The right way to write that program is by declaring x before the loop:

var x;
for i = 1 to 1 do
  x := 0;
end;
assert x == 0;

This all is great, but making loops lexically-scoped poses some minor challenges. Consider the following program that finds the sum of squares of natural numbers from 1 to 2 (the short range is for simplicity):

external res;
var s := 0;
var x := 0;        -- [1]
for i = 1 to 2 do
  x := x + 1;      -- [2]
  var x := x * x;  -- [3]
  s := s + x;      -- [4]
end;
assert s == res;

Here x declared before the loop gets shadowed by x declared in the body of the loop. On the first iteration of the loop x in [2] refers to [1] and x in [4] refers to [3]. And on the second iteration of the loop the same still holds. [3] only shadows x until the end of the loop, when the next iteration starts, x refers back to [1]

If we naively compile this IR program to Core, we'll get

external res;
var s = 0;
var x = 0;
var x = x + 1;
var x = x * x;  -- [1]
var s = s + x;  -- [2]
var x = x + 1;
var x = x * x;
var s = s + x;
assert s == res;

Here the local x shadows the global one in [1] for the rest of the program, not just until [2] like it's supposed to.

Correct compilation should give us something like this:

external res;
let s = 0;
let x = 0;
let x = x + 1;
let x' = x * x;
let s = s + x';
let x = x + 1;
let x' = x * x;
let s = s + x';
assert s == res;

Here the local x gets renamed to x' to avoid shadowing problems.

There are a number of ways we could approach this problem:

  1. forbid shadowing. As much as I'm not a fan of shadowing, I think completely forbidding it is too much
  2. forbid local variable declarations. I think it'd be a weird restriction and I was always annoyed by it back when I was programming in Delphi 7
  3. bite the bullet and just implement the renaming

I'm strongly in favor of 3. How to actually implement the renaming is a pretty technical question and not a hard one (just the usual Uniques stuff). However I feel like we could benefit from introducing a concept of lexically-scoped "block" and instead of compiling loops and renaming variables at the same time, unroll the program above to something like

external res;
var s := 0;
var x := 0;
block
  x := x + 1;
  var x := x * x;
  s := s + x;
end;
block
  x := x + 1;
  var x := x * x;
  s := s + x;
end;
assert s == res;

and only then inline the blocks and rename variable as appropriate. This is easier to get right and easier to debug than if we do everything at the same time. Plus, if we're going to have lexically-scoped procedures (which are kinda like generalized gadgets), then we will also be able to inline them directly as blocks and worry about scoping separately.

So this ask for a tree-like structure of statements in the IR, which we flatten into a list-like structure during compilation to Core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants