Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

export format discussion #1

Open
ammkrn opened this issue Sep 29, 2023 · 13 comments
Open

export format discussion #1

ammkrn opened this issue Sep 29, 2023 · 13 comments

Comments

@ammkrn
Copy link

ammkrn commented Sep 29, 2023

I wrote/drew up a description of the main thing we talked about in a very bootleg PDF: lifetimes.pdf

A few additional thoughts:

  1. The declaration items currently give their universe parameters as name indices, which the type checker needs to promote to levels. It seems like it would be easier to do what's already done with #EC and give a list of level indices instead of name indices, and the type checkers could get rid of a (very) little bit of code in the parser.

  2. Something to think about down the line if there are multiple type checkers, it may be nice to have an extension to the export format that conveys the pretty printer options the user wants to use and a list of the declarations they actually want to see printed back to them. Users would just convey that once to the exporter and be able to expect the same behavior from each checker, and it might prevent different checkers from having slightly different ways of taking pretty printer options.

  3. It may be worth it to have some kind of lean version identifier somewhere. While the underlying theory isn't really expected to change, there might be small implementation details in checkers or the format that warrants versioning (we're on v2 of the export format after all).

@Kha
Copy link
Member

Kha commented Sep 29, 2023

I wrote/drew up a description of the main thing we talked about in a very bootleg PDF: lifetimes.pdf

LGTM

  1. The declaration items currently give their universe parameters as name indices, which the type checker needs to promote to levels. It seems like it would be easier to do what's already done with #EC and give a list of level indices instead of name indices, and the type checkers could get rid of a (very) little bit of code in the parser.

At first glance this does not sound consistent to me: we use level objects when using universes but names when declaring them (as it wouldn't make sense to have Level.zero as a universe parameter). This is how it works internally everywhere in Lean as well.

2. Something to think about down the line if there are multiple type checkers, it may be nice to have an extension to the export format that conveys the pretty printer options the user wants to use and a list of the declarations they actually want to see printed back to them. Users would just convey that once to the exporter and be able to expect the same behavior from each checker, and it might prevent different checkers from having slightly different ways of taking pretty printer options.

Makes sense, though a standardized command line format should achieve the same, no?

3. It may be worth it to have some kind of lean version identifier somewhere. While the underlying theory isn't really expected to change, there might be small implementation details in checkers or the format that warrants versioning (we're on v2 of the export format after all).

Perhaps an export format identifier instead, then?

@ammkrn
Copy link
Author

ammkrn commented Sep 30, 2023

At first glance this does not sound consistent to me: we use level objects when using universes but names when declaring them (as it wouldn't make sense to have Level.zero as a universe parameter). This is how it works internally everywhere in Lean as well.

I forgot this is one of the areas where the type checkers differ; trepplein and nanoda eagerly promote declaration uparams to levels (so the declaration headers are (Name, Level*, Type)), but the C++ and Haskell versions keep them as names. I'll probably switch to using names to keep things consistent.

Makes sense, though a standardized command line format should achieve the same, no?

Sure, I'm fine with doing that instead.

ammkrn added a commit to ammkrn/lean4export that referenced this issue Oct 18, 2023
@ammkrn
Copy link
Author

ammkrn commented Oct 18, 2023

Just a request for initial feedback, let me know what you think about the changes in 4e9bf08.

  • The recursor rules, constructors, recursors, and reducibility hints (for height) were added to the output.
  • THM is a separate thing now, since it doesn't have a hint.
  • QUOT has been changed to output each declaration separately with the accompanying name, type, and uparams.
  • IND now only has the constructor names in-line since the constructor types are output separately and can be found by name.

@Kha
Copy link
Member

Kha commented Oct 19, 2023

Makes sense to me. /cc @semorrison @digama0?

@digama0
Copy link
Collaborator

digama0 commented Oct 19, 2023

SGTM as well. I agree that the checker should be doing more checking and less generating of expressions.

@ammkrn
Copy link
Author

ammkrn commented Oct 20, 2023

@Kha Is there a quick way to tell whether an Expr contains an mdata node? The current implementation of getIdx and dumpExpr outputs duplicate and unreachable items because App f a and App f (mdata a) have the same export representation, but different hash digests/aren't BEq. The direct solution of recursively removing the mdata works fine, but it seems to increase the execution time by ~2x even with a cache.

There are relatively few mdata nodes in the entire environment, so if there's a way to quickly determine whether or not an Expr has any mdata nodes and only rebuild the ones that do, I think that would solve the problem pretty nicely.

@digama0
Copy link
Collaborator

digama0 commented Oct 20, 2023

I think the best way to fix this is to just add a mdata node to the export format. For the purpose of hash/eq, it's fine if the node doesn't even say anything, it's just mdata ? e without any information about the content of the mdata. On the other hand, this information is useful for various things, so I think it would be okay to just serialize it too.

@ammkrn
Copy link
Author

ammkrn commented Oct 23, 2023

With a slightly more clever implementation I got the mdata-less version of dumpExpr to roughly the same performance on Init.lean, and ~30% slower on mathlib4. I'm proposing putting it behind a flag, it's an extra 50 sloc that's mostly let bindings. I like the option of leaving the mdata nodes in, but after working an mdata node into the wip rust checker, it can feel out of place in something that's supposed to be minimal.

@ammkrn
Copy link
Author

ammkrn commented Jan 1, 2024

@Kha

I'm getting ready to release the updated version of nanoda_lib and the binary, and I wrote a small book about the kernel/type checking. I pushed a commit for this repo here as a candidate for the exporter changes that were discussed. Let me know if you have comments/concerns. It includes the mdata pruning and addition of a version string for the export format.

I'll send you a draft of the book in the next day or two to get your thoughts on the initial release.

@ammkrn
Copy link
Author

ammkrn commented Jan 1, 2024

Also to Sebastian and @digama0, I put the introductory section of the book type checker guide in a gist. If you have a minute, I would appreciate any feedback as experts.

It includes a slightly edited quote from Mario on zulip, if you're opposed to its inclusion or want to change the wording, please let me know.

@digama0
Copy link
Collaborator

digama0 commented Jan 1, 2024

LGTM

@Kha
Copy link
Member

Kha commented Jan 1, 2024

Looks nice! You might want to reference https://doi.org/10.1016/j.entcs.2012.06.008 in the Adversarial section

@ammkrn
Copy link
Author

ammkrn commented Jan 2, 2024

Looks nice! You might want to reference https://doi.org/10.1016/j.entcs.2012.06.008 in the Adversarial section

That's a wonderful reference, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants