-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
A type safe, generic interface for AstNode 'user' data #4646
base: master
Are you sure you want to change the base?
Conversation
This is a briefer demo of how the templates work: https://godbolt.org/z/vfE8e7W3E |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the motivation, especially the type safety, but I'm not sure....
How do you handle cases where one visitor e.g. marks the user1, and another consumes it? They typically do share a base class but not sure that's universally true.
if (blocking) m_user1(nodep) = true;
This seems confusing to new readers that it's changing something inside nodep, but I don't have an alternative. I suppose we need to read this as if the user is a container class.
You pass a reference to the
There is a section about the user data fields in the internals docs. Add " Is there a syntax that you would be happy with? For the 'larger than pointer' case you need to return references. Doing so is also common in the STL, (std::vector::back(), std::map::operator[], etc). The idiomatic way to assign a tuple element is It could be
FWIW I read this as functional code: |
What do you think about decoupling the separate usages? While they share user1, with the new scheme it seems you've decoupled them in a way that will still check for mis-overlaps.
|
The single handle version would do the overlapping usage check at compile time via a static_assert (it doesn't today, but not hard). Having them separate you could only check at run-time via global variables like we do for the generation number. Having them separate means you still need a separate VNUser1InUse, on top of all else. You also risk sprinkling some of them around various visitors, which makes it hard to figure out what might conflict, while with the single definition it's all right there in one place. Which all negate some of the reasons why I proposed this so at that point I'm not sure it's much of an improvement (type safety yes, consistency less so). Fundamentally, because the lifetime and storage of all user data fields are tied together, I think it's best to have them defied as a unit. If your displeasure is due to the use of the overloaded
it would be
so you would write You could also enforce that the data is always a |
If we use a macro to define these rather than a template there's lots of alternatives. We could use defines to check (e.g. create a new macro based on the user number and node type and child types (based on macros made by astgen). Or we could have astgen check directly. Or we could do it at runtime using statics at startup created by the macros. There's probably some static_assert trick too but I haven't mastered that tool yet ;). Part of what I like about this is being able to name the users like normal variables based on what they do e.g. (m_user1Processed), which I think is better self-documentation wise than "m_user1(nodep)" where you need to read comments to figure out how they are used (as we have to do now, and still do in your first proposal). |
Having them separate makes it hard to pass them through to sub visitors, which we would need to do with this scheme (and for that we would also need to be able to have a You could of course add/generate a one liner It's ok if you don't think this is sufficient improvement on its own, we can leave it out, I proposed this because this would help moving towards removing one more userp, as they are a memory sink and we don't really need so many with better management. |
I do like the idea of improvement here. Let me play a bit, after I get V3Dead improved ;) |
I'd be curious how this experiment would perform to move user out of nodes entirely; this would allow arbitrary number of user's unique to each visitor:
If nodes are created, we'll need to pay the penalty to move the vector, however steady state should soon be reached. This adds an indirection to access each user(), but that's offset by nodes being smaller, and also by potentially less cache write pressure (as all the changing user()'s are packed together, rather than in otherwise probably not changing large nodes. (If user() is just tracking a bool we can even use a bit-packed array and fit 64*8 node's flags per cache line.) |
Hmm, maybe I'm missing something, but actually I think this does not hold:
If I'm understanding your proposal correctly, to access the user data, you would do: The other problem I think is sparsity. If I want to associate 32 bytes, but only with every 3rd variable (e.g.: only those that are assigned via NBA), I still need to alloc 32 bytes for each node (or at least an 8-byte pointer if I go one more indirect), as the unique IDs are, well, unique, and I am using a Vector. For sparsity you would need a map, but maps are the source of all performance problems. We also have the problem of the peak unique ID potentially dominating, e.g.: when in V3Inline, you clone a whole module, you might double the number of IDs. Even if you delete the original, if anything from the clone survives, it's high ID will determine the memory consumption of all your user vectors. I do think there is a way to have just one user pointer with my proposed scheme, with some extensions. However, because we do have some independent algorithms that rely on having their own field (V3Hash/V3Const)), this might not be feasibly, though we can make even that work by making those algorithms templates and passing in something like the |
Your points are correct, but need to be compared in context against what we have now. Compared to that, sparsity/locality is better (if the node isn't otherwise being changed) then current scheme. Most usages are small, for large, e.g. your 32 byte example we probably want to stay with dynamic structures we have now. I'm not worried about peak ID, because the peak node count already determines max memory, using less later when less nodes are around is unlikely to matter much. If it does, we could always have e.g. V3Broken "compress" the ID space to only used nodes. Anyhow, depends highly on ratio of nodes being edited for other reasons, and how many users are written as percentage of tree, etc, beyond my intuition so would need an experiment. |
I'm starting to loose track of what we are trying to achieve. My original goal with this was to improve type safety so we can refactor the user* code safely with the assistance of static type checking from the compiler. The new interface also IMO happen to improve on some other properties as listed in the top comment while being idiomatic C++ ( |
Ok, I guess table for now. I was bringing this up because if we have to refactor the various visitors I'd like to do it all at once, versus having another better performance scheme to move to later. |
I'm wondering if instead of a vector, using a high-radix trie indexed by the unique ID would be workable. This allows for sparsity without reusing/compressing IDs, which means each node could have a truly unique ID, which can also be shown in dumps instead of pointers (#4715). We could even make the node type a prefix of the ID to aid locality/reduce memory further. |
I hope you like templates 馃榿.
This is a proposal for a new, type safe, generic, and efficient interface for accessing the
AstNode
user fields. I would like to propose for this to be the universal mechanism for accessing the user data in the nodes.Improvements over the current APIs:
AstNode::user*
methods, and theAstNodeUser*Allocator
classes.VNUser*Handle
, and the associated data is strongly typed, eliminating the need for casts in the user code.Otherwise the efficiency is the same or better as the current APIs. Anything less or equal to a pointer is stored directly (as if using AstNode::user*), and anything larger than a pointer is automatically allocated in a
std::deque
by value and is held indirectly.FWIW I believe this is all just C++11.
We can change names of things.
I have only applied to user1 in V3Delayed as a demonstration to get feedback (and discovered some out of date comments and unnecessary type ascriptions in the process), but I'm fairly sure all usage of the AstNode::user* methods could be replace with this new mechanism, if you are happy with it. I probably won't have time to do them all in one go though.
TODO: