experimental HiFi tree diff algorithm for use with quick-fixes and refactoring commands in the IDE #2031

Draft: wants to merge 29 commits into base: main

Conversation

@jurgenvinju (Member) commented Sep 12, 2024

This transforms a pair of parse trees <original, rewritten> into a list[TextEdit]. The resulting
TextEdits are ready for use in VS Code extensions via the LSP features in util::LanguageServer and
util::IDEServices.
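
To make the idea of a text edit concrete, here is a minimal Python sketch of how a list of edits could be applied to the original source text. The `TextEdit` fields (`offset`, `length`, `replacement`) and the `apply_edits` helper are hypothetical names chosen for illustration; they are not the Rascal TextEdits API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEdit:
    # Hypothetical shape: a replacement of a character range in the original text.
    offset: int       # start position in the original text
    length: int       # number of original characters to replace
    replacement: str  # text to put in their place


def apply_edits(source: str, edits: list[TextEdit]) -> str:
    """Apply non-overlapping edits that are all expressed against the original text."""
    result = source
    # Work from the back of the file to the front so earlier offsets stay valid.
    for e in sorted(edits, key=lambda e: e.offset, reverse=True):
        result = result[:e.offset] + e.replacement + result[e.offset + e.length:]
    return result


source = "x = 1;  // keep me\ny = 2;\n"
edits = [TextEdit(offset=4, length=1, replacement="42")]
patched = apply_edits(source, edits)
```

Because only the single character `1` is replaced, the comment and the second line are left untouched, which is the property that small diffs are meant to preserve.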

The single-pass parse tree recursion maps sub-tree and sub-list differences to textual differences in a special way.
It leans on the semantics of special parse tree non-terminals (literals, lexicals, separators) to ignore certain
superfluous changes made in the rewritten tree. As a result, the edited source text retains more of
its original layout, including indentation and comments, than it would if the
rewritten parse tree were yielded to a string that replaces the entire file.
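
The following Python sketch illustrates the general idea under simplifying assumptions: a tree is either a leaf with text or an application node with children, and differences between two layout nodes are ignored. The `Node` class, the `kind` labels, and the `diff` function are hypothetical and much simpler than the algorithm in this PR, which also treats literals and separators specially.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "appl", "lexical" or "layout" (hypothetical labels)
    text: str = ""                 # source text, used for leaves only
    children: list = field(default_factory=list)


def text_of(n: Node) -> str:
    """Yield a tree back to the text it covers."""
    if not n.children:
        return n.text
    return "".join(text_of(c) for c in n.children)


def diff(orig: Node, new: Node, offset: int = 0) -> list:
    """Return (offset, length, replacement) edits against the original text."""
    if orig.kind == "layout" and new.kind == "layout":
        return []  # keep the original whitespace and comments
    same_shape = (orig.kind == new.kind == "appl"
                  and len(orig.children) == len(new.children))
    if not same_shape:
        # Do not go deeper: replace the whole sub-tree if its text changed.
        old, repl = text_of(orig), text_of(new)
        return [] if old == repl else [(offset, len(old), repl)]
    edits = []
    for oc, nc in zip(orig.children, new.children):
        edits += diff(oc, nc, offset)
        offset += len(text_of(oc))  # advance through the original text
    return edits


# Original text: "a  /*c*/ +  b"; the rewritten tree has default single-space layout.
original = Node("appl", children=[
    Node("lexical", "a"), Node("layout", "  /*c*/ "), Node("lexical", "+"),
    Node("layout", "  "), Node("lexical", "b")])
rewritten = Node("appl", children=[
    Node("lexical", "a"), Node("layout", " "), Node("lexical", "+"),
    Node("layout", " "), Node("lexical", "c")])
result = diff(original, rewritten)
```

In this example only the final identifier yields an edit; the comment and the irregular spacing of the original are never touched.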

This algorithm works especially well with separated lists. Suppose you have a target pattern
<Element e>, <{Element ","}* newElems>, <Element f> and newElems happens to be empty; then:

  1. Rascal will remove the superfluous separators and layout around the empty list.
  2. HifiTreeDiff will try to keep the indentation and source code comments after the e and before the f, even
    though these have been replaced by the concrete target pattern.
  3. If the list is non-empty, the layout before and after the new sublist will also be taken from the
    original list where possible.
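
The first point can be illustrated with a small Python sketch that works on strings instead of parse trees. The functions `naive_splice` and `splice_separated` are hypothetical; they only show why an empty sublist must not leave a dangling separator behind, and they say nothing about how layout and comments are retained.

```python
def naive_splice(head: str, new_elems: list[str], tail: str) -> str:
    """Fill the template literally; an empty sublist leaves a stray separator."""
    return f"{head}, {', '.join(new_elems)}, {tail}"


def splice_separated(head: str, new_elems: list[str], tail: str, sep: str = ", ") -> str:
    """Render the elements with exactly one separator between neighbours."""
    return sep.join([head, *new_elems, tail])


broken = naive_splice("e", [], "f")
clean = splice_separated("e", [], "f")
longer = splice_separated("e", ["x", "y"], "f")
```

Here `broken` contains two separators in a row, while `clean` and `longer` are well-formed lists.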

The smaller diffs are not only good for high fidelity in general; they are also essential for providing interactive preview and undo features in the IDE. This PR enables language engineers to use parse tree rewriting rather than collecting the text edits themselves.

TODOs:

  • write tests, and fix the initial issues triggered by the tests
  • test the example in the documentation
  • shorten the documentation
  • document the TextEdits module
  • short-circuit lexical identifiers (do not go deeper)
  • write indentation inheritance algorithm


codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 49%. Comparing base (e0bfa4d) to head (ec718d1).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/org/rascalmpl/types/NonTerminalType.java 50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              main   #2031   +/-   ##
=======================================
  Coverage       49%     49%           
- Complexity    6237    6259   +22     
=======================================
  Files          663     663           
  Lines        59211   59216    +5     
  Branches      8629    8631    +2     
=======================================
+ Hits         29508   29584   +76     
+ Misses       27473   27387   -86     
- Partials      2230    2245   +15     


@jurgenvinju
Member Author

@tvdstorm @DavyLandman I don't have the energy to finish this now, so I'm parking it here until I do. I wanted you to know it exists, because low-fidelity rewrites are a common issue we have to solve and because refactoring and quick-fixes in VS Code are now at our fingertips.

@DavyLandman
Member

Cool stuff 👍

This finishes the complete algorithm for lists for the first time. The algorithm works in these steps:
* Trim equal elements from the head and the tail of both lists
* Detect common edits to lists with fast list patterns; this is an optional optimization
* Find the latest common sublist and split both lists in three parts: two different prefixes, two equal middle parts and two different postfixes. Recurse on the prefixes and the postfixes and concatenate their edit lists.
* Finally we end up with two empty lists or two lists without common elements; we collect the differences of each element position pairwise. Lists that became shorter get one additional edit to cut off the list, while lists that became longer get one additional edit to add the new elements. The new elements inherit indentation from the pre-existing elements.
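
The steps above can be sketched in Python on plain lists. This is a rough illustration under stated assumptions, not the code in this PR: the edit tuples and function names are hypothetical, the optional fast-pattern step is skipped, the common sublist is taken to be the longest contiguous run of equal elements, and indentation inheritance is left out.

```python
def longest_common_run(a: list, b: list) -> tuple:
    """Return (start in a, start in b, length) of the longest equal contiguous run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def list_diff(old: list, new: list, base: int = 0) -> list:
    """Edits against indices of the original list:
    ("replace", index, elem), ("delete", start, end), ("insert", index, elems)."""
    # Trim equal elements from the head and the tail of both lists.
    while old and new and old[0] == new[0]:
        old, new, base = old[1:], new[1:], base + 1
    while old and new and old[-1] == new[-1]:
        old, new = old[:-1], new[:-1]
    # Split around a common sublist and recurse on the prefixes and postfixes.
    i, j, k = longest_common_run(old, new)
    if k > 0:
        return (list_diff(old[:i], new[:j], base)
                + list_diff(old[i + k:], new[j + k:], base + i + k))
    # No common elements are left: compare position by position.
    n = min(len(old), len(new))
    edits = [("replace", base + p, new[p]) for p in range(n)]
    if len(old) > n:
        edits.append(("delete", base + n, base + len(old)))  # list became shorter
    elif len(new) > n:
        edits.append(("insert", base + n, new[n:]))          # list became longer
    return edits


shrunk = list_diff(["a", "b", "c", "d", "e"], ["a", "x", "c", "e"])
grown = list_diff(["a", "b"], ["a", "b", "c", "d"])
```

In the first call, `b` is replaced by `x` and `d` is cut off, while `a`, `c` and `e` produce no edits at all; in the second call the only edit is the insertion of the two new elements at the end.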

Additional tests for these changes still need to be added.
2 participants