Some questions about the structure of rules #85

SerafimArts · 2018-07-06T17:14:43Z

What is the difference between $nodeId and $defaultId in grammar rules?

(new Concatenation($id, $children, $nodeId))->setDefaultId($defaultId);
//                                 ^^^^^^^ - There ------- ^^^^^^^^^^

Why do I need $nodeId in tokens?

new Token($id, $name, $nodeId, $unificationId, $kept);
//                    ^^^^^^^ - excess?

Why determine that the rule is transition, if the identifier can uniquely point to it?

Those. we can have completely numeric indexes in the rules array, and whether or not the rule in the AST will be determined by the presence of the name in the "Entry" element of the trace. That some order should speed up the initialization and fetching (optimizing arrays for numeric indexes php7+)

The text was updated successfully, but these errors were encountered:

Hywan · 2018-07-09T07:48:52Z

Hi :-),

Thanks for the relevant questions.

1. What is the difference between `$nodeId` and `$defaultId` in grammar rules?

When you write a rule, you can define a node ID like this:

rule:
    call() ::token:: #nodeId

But most of the time, the node ID is the name of the rule:

foo:
    call() ::token:: #foo

And so it can be reduced to:

#foo:
    call() ::token::

In this case, #foo is the default ID. Why default? Because it can be overwritten:

#foo:
    ::tokenA:: | ::tokenB:: #bar

When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.

In this case, #foo is the default ID, and #bar is the node ID for the second branch of the condition in the rule foo.

2. Why do I need `$nodeId` in tokens?

That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!

3. Why determine that the rule is transition, if the identifier can uniquely point to it?

I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:

rule:
    <a> ( <b> | <c> )

is transformed into (pseudo code):

rule: concat(0, 1)
0: token(a)
1: choice(2, 3)
2: token(b)
3: token(c)

There is only 4 operators:

Choice(...children),
Concatenation(...children)
Repetition(min, max, ...child),
Token.

Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?

SerafimArts · 2018-07-09T12:21:17Z

1.

When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.

Hmm, I did not think about this situation, thank you!

2.

That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!

Should I create a separate issue for this task?

3.

I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:

I meant that in the end it is not very important whether the rule is a group or a separate rule:

example:
    ::token:: (::token:: rule())*
//                ^ transitional repetition

// OR

example:
    ::token:: relation()*
//                ^ non-transitional repetition

relation:
  ::token:: rule()

Therefore, information about the fact that the rule isTransitional is also not needed.

Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?

Rule names do not matter what they are, strings or numbers, then all these places can be made numeric, which should quite speed up this case and reduce memory consumption. Right.

However, there is one problem, trace items (Entry and Ekzit) are directed precisely by the names of the rules: $this->transtional = \is_int($name);.

To this I do not think that the implementation:

Integer indexes + Integer names + add boolean $isTransitional

Will be faster and less in memory consumption, rather than:

String and Int indexes + String and Int names

Hywan · 2018-08-15T07:22:14Z

Should I create a separate issue for this task?

Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.

Therefore, information about the fact that the rule isTransitional is also not needed.

It is needed for tools that manipulate the grammar, so when analyzing language. This is not useful at runtime though.

Rule names do not matter what they are, strings or numbers, then all these places can be made numeric, which should quite speed up this case and reduce memory consumption. Right.

It feels true. The original name can be kept in an attribute (and I think it is already the case). So +1 for this!

Are you willing to try a PR for that too?

SerafimArts · 2018-08-15T12:23:01Z

Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.

#86

It is needed for tools that manipulate the grammar, so when analyzing language. This is not useful at runtime though.

Judging by the construction algorithm - it is still required to build a correct AST. I figured it out, so I'm taking off the question.

Are you willing to try a PR for that too?

I can not implement this part. Attempts to get rid of identifiers (replacing them with numeric indexes) resulted in my fork to multiple bugs. While I think about it =\

flip111 · 2019-01-25T14:47:18Z

@SerafimArts do you have any remaining open questions, or has everything been answered now? Please close the ticket.

Hywan self-assigned this Jul 9, 2018

Hywan added question in progress labels Jul 9, 2018

flip111 mentioned this issue Jan 25, 2019

Remove $nodeId from Token rule definition. #86

Open

SerafimArts closed this as completed Jan 25, 2019

ghost removed the in progress label Jan 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some questions about the structure of rules #85

Some questions about the structure of rules #85

SerafimArts commented Jul 6, 2018 •

edited

Loading

Hywan commented Jul 9, 2018

SerafimArts commented Jul 9, 2018 •

edited

Loading

Hywan commented Aug 15, 2018

SerafimArts commented Aug 15, 2018

flip111 commented Jan 25, 2019

Some questions about the structure of rules #85

Some questions about the structure of rules #85

Comments

SerafimArts commented Jul 6, 2018 • edited Loading

Hywan commented Jul 9, 2018

1. What is the difference between $nodeId and $defaultId in grammar rules?

2. Why do I need $nodeId in tokens?

3. Why determine that the rule is transition, if the identifier can uniquely point to it?

SerafimArts commented Jul 9, 2018 • edited Loading

1.

2.

3.

Hywan commented Aug 15, 2018

SerafimArts commented Aug 15, 2018

flip111 commented Jan 25, 2019

SerafimArts commented Jul 6, 2018 •

edited

Loading

1. What is the difference between `$nodeId` and `$defaultId` in grammar rules?

2. Why do I need `$nodeId` in tokens?

SerafimArts commented Jul 9, 2018 •

edited

Loading