Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Parser/sizeof_expr_ok fails #44

Open
lerno opened this issue Oct 29, 2018 · 15 comments
Open

Test Parser/sizeof_expr_ok fails #44

lerno opened this issue Oct 29, 2018 · 15 comments
Labels

Comments

@lerno
Copy link
Collaborator

lerno commented Oct 29, 2018

The test has this:

i32 a = sizeof(i32*)
i32 c = sizeof(a*);

This fails (correctly) since a is not a type (sizeof(i32*) succeeds as expected). However, one might have expected this to be a typo, where we wanted &a instead of a*. Sadly sizeof(&a) does not seem to work.

Also, there's a very valid case of this code:

i32* x = nil;
i32 y = sizeof(x*);

This does not work either. In the &a case, '&' is probably incorrectly parsed by C2Parser::ParseSizeof, but the * case is more complex. Obviously we can't do a simple "replace x by its type" when resolving types, because that would allow the analyser to happily parse x* a = 23 as equal to i32 a = 32!

Unfortunately I don't know the code sufficiently well to offer a fix that I'm comfortable with.

@bvdberg
Copy link
Member

bvdberg commented Oct 30, 2018

parsing sizeof() in C2 has proven to be quite nasty for the following reason:

There are basically 2 ways to use sizeof: sizeof(type) or sizeof(variable). In the second case, the
type is induced from the given variable. But during parsing into the AST, the compiler doesn't know whether a symbol is a type or a variable (unless we use the fact that types start with a capital char). So it's unclear how to store this into the AST. There is currently no TypeOrNameExpr thing...

I think both uses should be valid, so the compiler will have to deal with this situation.

@bvdberg
Copy link
Member

bvdberg commented Oct 30, 2018

added to wiki/Roadmap

@lerno
Copy link
Collaborator Author

lerno commented Oct 30, 2018

Another issue reading the code is that it's unclear (when I read the code) what information is encoded in the AST during different analyzer passes.

Reading the code it looks like just having sizeof triggers type resolution at some places, which is a bit hairy. Sizeof is basically a very early compile-time macro resolution, and this problem will be greater with macros.

As a first step, could you document what each analyzer does and how they are layered?

@bvdberg
Copy link
Member

bvdberg commented Oct 30, 2018

Basically on the parsing pass (when creating the AST), all information is stored that is needed for analysis. This might seem obvious, so an example:

a = 10;
This is recognized as an BinaryExpr (with operator '='), but what a is, is unknown. So this is stored as an IdentifierExpr with the name 'a'. During analysis, the name is checked and if it is found to be good, the IdentifierExpr is updated with a pointer to a Decl.

Something similar is done for all expressions. Since the type is not known at the moment of AST creation, it is set during analysis. The setting of the type and the resolving of IdentifierExpr's are the
main to changes to the AST during analysis

@lerno
Copy link
Collaborator Author

lerno commented Oct 30, 2018

Can we some document outlining all passes on the AST and what's resolved / not resolved after each pass? Maybe introduce an explicit state so that we can assert early. For example, if we are in a state that should have resolved all literals, then we add plenty of more asserts throughout the code. Because currently some unfinished code actually leaves parts of the AST unresolved, which causes bugs further down the line. When untangling the issue it's not clear where the problem originated.

Perhaps the wiki could just add a very simple explanation like

x. XXXXXAnalysis "does this and that"
x+1. LiteralAnalysis, resolves all literals
x+2. XXXXXXXAnalysis "does something else"

@lerno
Copy link
Collaborator Author

lerno commented Nov 3, 2018

I have some code where the state of the file analysis class is very clearly set with assertions to show what code can be called when. It's in a branch that depends on the "no_clang" changes though.

@lerno
Copy link
Collaborator Author

lerno commented Nov 3, 2018

For the sizeof problem, we should actually kick that over a standard "typeOf" expression, that will resolve the ambiguity:

sizeof(typeof(a*))
sizeof(Handle*)

Here again it might be better to use a prefix for compile time resolution so it's clear:

@sizeof(@typeof(a*))
@sizeof(i32*)

@lerno
Copy link
Collaborator Author

lerno commented Nov 3, 2018

Another way would be to use sizeof(type i32*) again that resolves the ambiguity in the grammar.

@bvdberg
Copy link
Member

bvdberg commented Nov 8, 2018

Yes that would solve that issue, but is not really nice to look at. We could disallow the use of a variable inside sizeof, so you could only use sizeof(Type). I think I would prefer that to the type keyword...

@bvdberg bvdberg added the bug label Nov 10, 2018
@lerno
Copy link
Collaborator Author

lerno commented Nov 18, 2018

I opened #69 to decide whether this should be trivially solved by the naming rules. It depends on whether you want to backpedal on naming rules later on, but even then it should be fairly straightforward to put in an analyser.

@lerno
Copy link
Collaborator Author

lerno commented Nov 18, 2018

Also, I think that sizeof might want to return the size of any expression.

So:

i32 x = sizeof(a = 12);
i32 y = sizeof(a.call().foo());

Should all be fine and inlined directly. I did some exploratory coding and it seemed to resolve itself in a fairly straightforward way.

Something that has been nagging me is whether types should be first class entities. That is, should you be able to pass around a type and get information about it during runtime. If not, what about compile time?

@lerno
Copy link
Collaborator Author

lerno commented Nov 27, 2018

Apparently the above works fine in C, so I'll implement it that way.

@lerno
Copy link
Collaborator Author

lerno commented Nov 30, 2018

Ok, found another ambiguity, which is very bad:

sizeof(Foo.a)

Here it is not clear if we mean the size of the field a in foo, or if we refer to the function Foo.a.

All of these "type or expression" are going to be rather hard to parse in this manner since they are deeply ambiguous.

(And people will curse us badly when they try to parse it for an IDE or some other reason).

There are a couple of solutions that spring to mind:

  1. Defer parsing this at all until types are known. This is a bit messy through, as parsing needs to be done two ways. For C it would have been possible to speculatively parse it as both an expression and a type. Unfortunately C2 adds this EXTRA level of ambiguity from functions. So even if you know that Foo is a type, you cannot fully parse it!
  2. Create a special keyword for parsing type sizes, like typesize. (I love this idea because it simplifies parsing and analysis)
  3. Do a simple "overview" parse (that only sees global structs, file definitions and globals) and then do a second pass that does parsing and analysis in a single sweep. This would change things quite a bit, but with the advantage of simplifying the code quite a bit. (The current passes are really difficult to follow)

@lerno
Copy link
Collaborator Author

lerno commented Nov 30, 2018

From what I understand Clang has a concept of "unevaluated expressions". Evaluating sizeof basically pushes the tokens there into a blob to be parsed later. Sort of (1). This seems to add quite a bit of complexity and added functionality:

  1. You need to store that temporary data.
  2. You then need to parse it in the analysis step

Because C2 has func and type, this ambiguity is resolved for functions and declarations. In the spirit of those changes it would make sense that C2 tries to retain an unambiguous grammar even for sizeof. My proposal is therefore a solution along the lines of (2). I honestly hate the name I suggested ("typesize"). Alternatives would be use typeof for expressions. So

sizeof(Foo);
sizeof(typeof(a));

There is one thing that's not quite resolved though. Consider

type Foo struct {
  i32 a;
  i32 b;
}

// is it:
sizeof(Foo.a)
// or
sizeof(typeof(Foo.a))

@lerno
Copy link
Collaborator Author

lerno commented Nov 30, 2018

Digging a bit further I now see that in C sizeof(Foo.a) is not supported. So maybe we can solve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants