Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add in-memory TextDocument implementation #1882

Open
josharian opened this issue Sep 10, 2023 · 6 comments
Open

add in-memory TextDocument implementation #1882

josharian opened this issue Sep 10, 2023 · 6 comments

Comments

@josharian
Copy link
Collaborator

Moving the conversation from #1815, where it is buried in a comment thread: #1815 (comment)

For moving to an LSP, and for faster/easier tests not requiring the full VSCode API, it would be helpful to have an in-memory implementation of TextDocument.

#1815 adds a simple one, coded from scratch. But @pokey points out that https://github.com/microsoft/vscode-languageserver-node already has one.

Unfortunately, the TextEditor interface as used by cursorless internally (call this CTE, or Cursorless TextEditor) is larger than the TextEditor interface implemented by https://github.com/microsoft/vscode-languageserver-node (call this VSCTE, for VSCode TextEditor).

Attempts to use VSCTE hit a wall, because missing properties like lineAt and range are extensively used in CTE and are difficult to implement efficiently using VSCTE. Rewriting all of cursorless to use VSCTE directly would (I think) result in less readable, less efficient code.

Attempts to wrap VSCTE in a new object implementing CTE are also not promising: CTE's lineAt returns a TextLine, which VSCTE doesn't have. By the time you set up all the glue, there aren't many savings over just using a mock CTE directly. (VSCTE has some nice performance properties, like lazy line resolution, but IIUC those get destroyed the instant you need to wrap it to make a CTE.)

The question is: As a practical matter, how do I proceed from here? Input requested, although not urgently. (I can keep hacking atop #1815 with the existing implementation for the time being.)

cc @pokey

Related issues:

@josharian josharian added the to discuss Plan to discuss at meet-up label Sep 10, 2023
@AndreasArvidsson
Copy link
Member

Our current Cursorless text document is just a wrapper around the vscode text document. Couldn't we reuse that?

https://github.com/cursorless-dev/cursorless/blob/d29f6c4b9b3d1a128e9309b8dd1b4885ab630a62/packages/cursorless-vscode/src/ide/vscode/VscodeTextDocumentImpl.ts

@pokey
Copy link
Member

pokey commented Sep 14, 2023

There's a fair amount of code involved in the VSCode text document class. I went spelunking through it at one point and it's pretty gnarly and split across several files. Might work tho 🤷‍♂️

Fwiw @josharian that's what we did for our snippet parser: we just vendored in the VSCode snippet parser

@josharian
Copy link
Collaborator Author

I'll give vendoring that a try, see how self-contained it is.

@josharian
Copy link
Collaborator Author

OK, I copy/pasted and hacked away at their impl. The result is #1899. I have run that through all the tests that I wrote from my own implementation and gotten them all to pass.

It required very substantive editing, including writing lastNonWhitespaceCharacterIndex de novo. The dependency tree is so significant I do not see a way to automate vendoring it.

Having now looked at that implementation in reasonable detail and compared it to the one that I cobble together, the only significant improvement that I see is using a prefix sum to avoid linear calculation of line offsets. That is definitely an improvement, although that is actually an easy one to vendor for.

It's less nice to use; you have to bring your own array of lines already chopped up neatly, which is a pile of code to implement:

    const rawLines: string[] = s.match(/[^\n]*\n|[^\n]+/g) ?? [];
    rawLines.forEach((line, i) => {
      const eol = line.match(/(\r?\n)$/)?.[1] ?? "";
      if (eol.length > 0) {
        line = line.slice(0, -eol.length);
        rawLines[i] = line;
      }
    });

And it uses URIs instead of filenames. This is all little stuff that we could fix, at the cost of diverging even more from the vendored code.

In short, I do not see a path forward in which we simply copy/paste their code. Any which way, we are going to own this code. At that point it is not particularly material whether we use their implementation or mine. If it were entirely after me I would probably end up merging the two implementations into something that was both nice to use and performant.

@AndreasArvidsson
Copy link
Member

lastNonWhitespaceCharacterIndex is something we had to implement ourself but it was quite simple.

get lastNonWhitespaceCharacterIndex(): number {
return this.line.text.trimEnd().length;
}

@pokey pokey removed the to discuss Plan to discuss at meet-up label Oct 1, 2023
@pokey
Copy link
Member

pokey commented Oct 1, 2023

update from discussion:

  1. look into rust piece-tree impl, if it exists. might be at https://github.com/rust-lang/rust-analyzer
  2. look at vscode piece tree impl maybe
  3. quick read through of why VSCode switched to piece tree
  4. we're happy with something less than lightning fast for now to use for tests and initial lsp impl, as long as the interface doesn't lock us into something bad in the future as indicated by 1) and 2) above

At some point, we should run some performance tests on large files, but for now that's probably not necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants