-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserving trivia #11
Comments
This parser is used in my web asset bundler and probably Jint will adopt it sooner or later too. Currently I consider these the main consumers. These don't need the extra info, that is, they don't really want to pay the performance penalty coming with it. So yep, the situation hasn't been changed. :) However, it would be pretty cool to support the use cases of refactoring tools too as that would be a pretty unique feature (most of the JS parsers just implements plain ESTree, which doesn't include the necessary info). So I'm not against it - if we can come up with a solution that doesn't hurt the main use cases. Without taking the perf. penalty, I can think of one viable approach: conditional compilation. We'd need two separate builds, i.e. two flavors of the parser + AST. Something like this: [VisitableNode(ChildProperties = new[] { nameof(Id), nameof(Params), nameof(Body)})]
public sealed partial class FunctionDeclaration : Declaration, IFunction
{
private readonly NodeList<Node> _params;
#if EXTENDED_AST
public FunctionDeclaration(
Token? asyncKeyword,
Token functionKeyword,
Token? generatorStarToken,
Identifier? id,
Token paramsOpenParenToken,
in NodeList<Node> parameters,
Token paramsCloseParenToken,
FunctionBody body)
: base(NodeType.FunctionDeclaration)
{
AsyncKeyword = asyncKeyword;
FunctionKeyword = functionKeyword;
GeneratorStarToken = generatorStarToken;
Id = id;
ParamsOpenParenToken = paramsOpenParenToken;
_params = parameters;
ParamsCloseParenToken = paramsCloseParenToken;
Body = body;
}
public Token? AsyncKeyword { get; }
public Token FunctionKeyword { get; }
public Token? GeneratorStarToken { get; }
public Token ParamsOpenParenToken { get; }
public Token ParamsCloseParenToken { get; }
public bool Generator => GeneratorStarToken is not null;
public bool Async => AsyncKeyword is not null;
#else
public FunctionDeclaration(
Identifier? id,
in NodeList<Node> parameters,
FunctionBody body,
bool generator,
bool async)
: base(NodeType.FunctionDeclaration)
{
Id = id;
_params = parameters;
Body = body;
Generator = generator;
Async = async;
}
public bool Generator { [MethodImpl(MethodImplOptions.AggressiveInlining)] get; }
public bool Async { [MethodImpl(MethodImplOptions.AggressiveInlining)] get; }
#endif
/// <remarks>
/// Diverging from the ESTree specification, <see langword="null"/> is used to indicate an anonymous default exported function (instead of introducing <see langword="AnonymousDefaultExportedFunctionDeclaration"/>).
/// </remarks>
public Identifier? Id { [MethodImpl(MethodImplOptions.AggressiveInlining)] get; }
/// <remarks>
/// { <see cref="Identifier"/> | <see cref="ArrayPattern"/> | <see cref="ObjectPattern"/> | <see cref="AssignmentPattern"/> | <see cref="RestElement"/> }
/// </remarks>
public ref readonly NodeList<Node> Params { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => ref _params; }
public FunctionBody Body { [MethodImpl(MethodImplOptions.AggressiveInlining)] get; }
StatementOrExpression IFunction.Body => Body;
bool IFunction.Expression => false;
private FunctionDeclaration Rewrite(Identifier? id, in NodeList<Node> @params, FunctionBody body)
{
return new FunctionDeclaration(id, @params, body, Generator, Async);
}
} Obviously, conditional parts need to be added to the parser as well (plus the source generator need to be adjusted to conditional compilation), which can become pretty ugly, but in theory this is all doable. I, however, have no incentive to implement this at the moment. But always welcome contributions. ;) |
I see. I'm quite sure it's technically possible tho without much performance penalty as long as it can be opted-in. E.g. the way Roslyn built their data, which is very fast. But I see the argument of no incentives to do it. I can see if I can suggest something later if I start to use it. |
It's definitely possible, the key question is how big that performance penalty will be. Because of consumers like Jint it must be close to zero. Which means not only execution time but also the memory consumption of the parser and the memory footprint of the resulting AST. I can see the following possible ways:
To sum it up, I'd suggest checking the viability of Solution 3 first. It would require some post-parsing computation but it would have a big advantage: there would be no need to alter the AST and the code of the parser. What I'm not sure about is that it's doable for all the possible syntax constructs. Determining that would need some research. If this approach is not viable or you don't like it, I'd still discourage chasing Solution 2. Chances are that it would lead to a dead end. In that case the only viable way I can see is Solution 1. Unless, of course, you can come up with another one that meets the criteria laid down above. Taking a look at Roslyn would be definitely useful, it might give us ideas I can't see now. Anyway, if you decide to take a stab at this nice problem, just let me know if you need any guidance. |
Also just adding a vote here. For me, I really wish that comments were a part of the tree. I have a use-case where I'm taking minized code and prettifying it. But in the process of going AST-to-string, the prettified code loses the comments. I did look into it back in 2022, but was not convinced that this was gonna be easy to solve: sebastienros/esprima-dotnet#39 At some point, I'll probably be desperate enough to look into this more in-depth. I think when considering it, what sticks out for me is:
|
Hey @CharlieEriksen, For sure, this will be a tough nut to crack, no matter which path we go down. This is my take on your points:
|
I had a bit of an interesting thought: How does It's interesting to notice that prettier actually has the option of using acorn as its parser. And for my specific use-case, I'm quite OK with the output format being opinionated. So once I have more time to look into how prettier does things, I have a feeling that it might be a good starting point to investigate from. |
About this, to avoid introducing extra processing, one could have a separate data-store for storing trivia information. Something like (not exactly, but to give an idea): public interface ITriviaStore
{
void SetTriviaBefore(AstNode node, ReadOnlySpan<char> trivia);
void SetTriviaAfter(AstNode node, ReadOnlySpan<char> trivia);
} This way, no extra data is added to the nodes, but you could have a interface that is 'null' when you don't need this data, and the parser behaves exactly like before. And you could have a predefined implementation that stores this data in a map if you need it. A code writer could also take an optional 'ITriviaStore' and use it to fetch the trivia in order to preserve it. It's a bit more expensive when you need trivia data than storing them directly in the nodes, but the extra cost is next to nothing when you don't need it. |
I agree that we'll need some kind of abstraction for this feature to encapsulate and hide the actual data structure that stores the required data. My comment, however, wasn't about the abstraction but about storage. Somehow we need to associate the AST nodes with the extra data. I can't see any other options than these:
The only option of these that doesn't require a heap allocation per AST node is dictionary because dictionary value can be a value type as well. Nowadays you can access dictionary values without copying. So, from a performance perspective About the abstraction: As discussed above, to compute whitespace trivia, we'll certainly need the token and comment data besides the AST nodes. The parser provides these on demand, via the So, I can think of some extension methods in the Extras package like this: public static class ParserExtensions
{
public static FullSyntaxTree<Script> ParseScriptFull(this IParser parser, string input, string? sourceFile = null, bool strict = false);
public static FullSyntaxTree<Module> ParseModuleFull(this IParser parser, string input, string? sourceFile = null);
/* etc. */
}
public abstract class FullSyntaxTree
{
public Node Root => GetRoot();
protected abstract Node GetRoot();
// We'd store these as arrays internally so we can return readonly references to the items.
public IReadOnlyList<Token> Tokens { get; }
public IReadOnlyList<Comment> Comments { get; }
}
public sealed class FullSyntaxTree<TRoot> : FullSyntaxTree where TRoot : Node
{
public new TRoot Root { get; }
protected override Node GetRoot() => Root;
} Now, this class can implement something like Alternatively, I can also imagine extension methods for AST nodes as well. For example: public static class AstExtensions
{
public static TriviaList LeadingTrivia(this Node node, FullSyntaxTree syntaxTree);
public static TriviaList TrailingTrivia(this Node node, FullSyntaxTree syntaxTree);
public static ref readonly Token AsyncKeyword(this FunctionDeclaration node, FullSyntaxTree syntaxTree);
} This would allow us to write code like this: var asyncKeyword = functionDeclaration.AsyncKeyword(syntaxTree);
var trivia = functionDeclaration.LeadingTrivia(syntaxTree); What's your opinion on this? (Just to be clear, I'm talking about option 3: "another hypothetical run-time solution" mentioned in this comment of mine, which I'm still not sure is possible to do in practice.) |
Now I'm getting creative: public abstract class FullSyntaxTree
{
/* ... */
public FunctionDeclarationTokens TokensOf(FunctionDeclaration node) => new FunctionDeclarationTokens(node);
public TriviaList LeadingTriviaOf(in Token token);
public TriviaList TrailingTriviaOf(in Token token);
}
public readonly struct FunctionDeclarationTokens
{
private readonly FullSyntaxTree _syntaxTree;
public FunctionDeclarationTokens(FullSyntaxTree syntaxTree) => _syntaxTree = syntaxTree;
public ref readonly Token AsyncKeyword { get; }
public ref readonly Token FunctionKeyword { get; }
public ref readonly Token GeneratorStarToken { get; }
public ref readonly Token ParamsOpenParenToken { get; }
public ref readonly Token ParamsCloseParenToken { get; }
public ref readonly Token ParamSeparatorCommaToken(int index);
/* etc. */
} which allows ref readonly var asyncKeyword = ref syntaxTree.TokensOf(functionDeclaration).AsyncKeyword;
var trivia = syntaxTree.LeadingTriviaOf(asyncKeyword); In the meantime I realized that it doesn't make much sense to query trivia info for AST nodes. It does for specific tokens (just think about cases like Another remark: by returing ref readonly we can't really communicate whether the token is optional (plus the consumer would need to do ugly We should also keep an eye on the upcoming extension everything C# feature. It might come in handy to reduce the boilerplate. But I don't think it's the API design that's going to be the hard part here. |
Original issue: sebastienros/esprima-dotnet#391
Now that we're in a fork, I just wanted to ask if this is still the status quo. ;)
Reason I'm asking is that it would be superuseful for refactoring tools where you'd want to rewrite an expression, but keeping the original trivia around it.
The text was updated successfully, but these errors were encountered: