add in-memory TextDocument implementation #1882

josharian · 2023-09-10T23:36:02Z

Moving the conversation from #1815, where it is buried in a comment thread: #1815 (comment)

For moving to an LSP, and for faster/easier tests not requiring the full VSCode API, it would be helpful to have an in-memory implementation of TextDocument.

#1815 adds a simple one, coded from scratch. But @pokey points out that https://github.com/microsoft/vscode-languageserver-node already has one.

Unfortunately, the TextEditor interface as used by cursorless internally (call this CTE, or Cursorless TextEditor) is larger than the TextEditor interface implemented by https://github.com/microsoft/vscode-languageserver-node (call this VSCTE, for VSCode TextEditor).

Attempts to use VSCTE hit a wall, because missing properties like lineAt and range are extensively used in CTE and are difficult to implement efficiently using VSCTE. Rewriting all of cursorless to use VSCTE directly would (I think) result in less readable, less efficient code.

Attempts to wrap VSCTE in a new object implementing CTE are also not promising: CTE's lineAt returns a TextLine, which VSCTE doesn't have. By the time you set up all the glue, there aren't many savings over just using a mock CTE directly. (VSCTE has some nice performance properties, like lazy line resolution, but IIUC those get destroyed the instant you need to wrap it to make a CTE.)

The question is: As a practical matter, how do I proceed from here? Input requested, although not urgently. (I can keep hacking atop #1815 with the existing implementation for the time being.)

cc @pokey

Related issues:

The text was updated successfully, but these errors were encountered:

AndreasArvidsson · 2023-09-14T07:15:42Z

Our current Cursorless text document is just a wrapper around the vscode text document. Couldn't we reuse that?

https://github.com/cursorless-dev/cursorless/blob/d29f6c4b9b3d1a128e9309b8dd1b4885ab630a62/packages/cursorless-vscode/src/ide/vscode/VscodeTextDocumentImpl.ts

pokey · 2023-09-14T07:52:51Z

There's a fair amount of code involved in the VSCode text document class. I went spelunking through it at one point and it's pretty gnarly and split across several files. Might work tho 🤷‍♂️

Fwiw @josharian that's what we did for our snippet parser: we just vendored in the VSCode snippet parser

josharian · 2023-09-14T13:23:46Z

I'll give vendoring that a try, see how self-contained it is.

josharian · 2023-09-15T23:12:08Z

OK, I copy/pasted and hacked away at their impl. The result is #1899. I have run that through all the tests that I wrote from my own implementation and gotten them all to pass.

It required very substantive editing, including writing lastNonWhitespaceCharacterIndex de novo. The dependency tree is so significant I do not see a way to automate vendoring it.

Having now looked at that implementation in reasonable detail and compared it to the one that I cobble together, the only significant improvement that I see is using a prefix sum to avoid linear calculation of line offsets. That is definitely an improvement, although that is actually an easy one to vendor for.

It's less nice to use; you have to bring your own array of lines already chopped up neatly, which is a pile of code to implement:

    const rawLines: string[] = s.match(/[^\n]*\n|[^\n]+/g) ?? [];
    rawLines.forEach((line, i) => {
      const eol = line.match(/(\r?\n)$/)?.[1] ?? "";
      if (eol.length > 0) {
        line = line.slice(0, -eol.length);
        rawLines[i] = line;
      }
    });

And it uses URIs instead of filenames. This is all little stuff that we could fix, at the cost of diverging even more from the vendored code.

In short, I do not see a path forward in which we simply copy/paste their code. Any which way, we are going to own this code. At that point it is not particularly material whether we use their implementation or mine. If it were entirely after me I would probably end up merging the two implementations into something that was both nice to use and performant.

AndreasArvidsson · 2023-09-16T05:54:54Z

lastNonWhitespaceCharacterIndex is something we had to implement ourself but it was quite simple.

cursorless/packages/cursorless-vscode/src/ide/vscode/VscodeTextLineImpl.ts

Lines 28 to 30 in b78d551

 get lastNonWhitespaceCharacterIndex(): number { 

 return this.line.text.trimEnd().length; 

 }

pokey · 2023-10-01T17:33:38Z

update from discussion:

look into rust piece-tree impl, if it exists. might be at https://github.com/rust-lang/rust-analyzer
look at vscode piece tree impl maybe
quick read through of why VSCode switched to piece tree
we're happy with something less than lightning fast for now to use for tests and initial lsp impl, as long as the interface doesn't lock us into something bad in the future as indicated by 1) and 2) above

At some point, we should run some performance tests on large files, but for now that's probably not necessary

josharian added the to discuss Plan to discuss at meet-up label Sep 10, 2023

josharian mentioned this issue Sep 11, 2023

start playing with hat tests #1815

Draft

pokey mentioned this issue Sep 15, 2023

vendor enough of vscode's TextDocument implementation #1899

Closed

3 tasks

pokey removed the to discuss Plan to discuss at meet-up label Oct 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add in-memory TextDocument implementation #1882

add in-memory TextDocument implementation #1882

josharian commented Sep 10, 2023

AndreasArvidsson commented Sep 14, 2023

pokey commented Sep 14, 2023

josharian commented Sep 14, 2023

josharian commented Sep 15, 2023

AndreasArvidsson commented Sep 16, 2023

pokey commented Oct 1, 2023 •

edited

add in-memory TextDocument implementation #1882

add in-memory TextDocument implementation #1882

Comments

josharian commented Sep 10, 2023

AndreasArvidsson commented Sep 14, 2023

pokey commented Sep 14, 2023

josharian commented Sep 14, 2023

josharian commented Sep 15, 2023

AndreasArvidsson commented Sep 16, 2023

pokey commented Oct 1, 2023 • edited

pokey commented Oct 1, 2023 •

edited