Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ruff server: Support Jupyter Notebook (*.ipynb) files #11206

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

snowsignal
Copy link
Member

@snowsignal snowsignal commented Apr 29, 2024

Summary

Closes #10858.

ruff server now supports *.ipynb (aka Jupyter Notebook) files. Extensive internal changes have been made to facilitate this, which I've done some work to contextualize with documentation and an pre-review that highlights notable sections of the code.

*.ipynb cells should behave similarly to *.py documents, with one major exception. The format command ruff.applyFormat will only apply to the currently selected notebook cell - if you want to format an entire notebook document, use Format Notebook from the VS Code context menu.

Test Plan

The VS Code extension does not yet have Jupyter Notebook support enabled, so you'll first need to enable it manually. To do this, checkout the pre-release branch and modify src/common/server.ts as follows:

Before:
Screenshot 2024-05-13 at 10 59 06 PM

After:
Screenshot 2024-05-13 at 10 58 24 PM

I recommend testing this PR with large, complicated notebook files. I used notebook files from this popular repository in my preliminary testing.

The main thing to test is ensuring that notebook cells behave the same as Python documents, besides the aforementioned issue with ruff.applyFormat. You should also test adding and deleting cells (in particular, deleting all the code cells and ensure that doesn't break anything), changing the kind of a cell (i.e. from markup -> code or vice versa), and creating a new notebook file from scratch. Finally, you should also test that source actions work as expected (and across the entire notebook).

Note: ruff.applyAutofix and ruff.applyOrganizeImports are currently broken for notebook files, and I suspect it has something to do with #11248. Once this is fixed, I will update the test plan accordingly.

@snowsignal snowsignal added the server Related to the LSP server label Apr 29, 2024
@snowsignal snowsignal added this to the Ruff Server: Beta milestone Apr 29, 2024
Copy link
Contributor

github-actions bot commented Apr 29, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@snowsignal snowsignal force-pushed the jupyter-notebook branch 5 times, most recently from 3a721e2 to ef8727f Compare May 9, 2024 09:02
@snowsignal snowsignal force-pushed the jupyter-notebook branch 13 times, most recently from 4fb4f0f to a8e00a6 Compare May 14, 2024 05:40
Copy link
Member Author

@snowsignal snowsignal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments to add context for a few parts of the code and also to remind myself to refactor a few things before we merge 🙂

crates/ruff_server/src/edit/range.rs Outdated Show resolved Hide resolved

/// Stores and tracks all open documents in a session, along with their associated settings.
#[derive(Default)]
pub(crate) struct Index {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context for this:

  • This is the replacement for Workspaces, which used to be the encapsulating struct that stored all documents.
  • Index uses a flat structure to store documents and workspace settings, which makes it marginally more efficient to look-up documents by URL. The big advantage, however, is that we can map arbitrary notebook cells to document paths really easily, without having to search each workspace in turn.
  • I've moved the logic for mutating documents to here - in other words, DocumentController is private to index.rs, and to update documents you need to call the associated update function on Index. The reason why I made this change is because we need to update the notebook cell index after cells get added or deleted, and I wanted to centralize that logic here.

crates/ruff_server/src/session/index.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/fix.rs Outdated Show resolved Hide resolved
Comment on lines +29 to +34
/// Edits to fix the diagnostic. If this is empty, a fix
/// does not exist.
pub(crate) edits: Vec<lsp_types::TextEdit>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context: we now generate the text edits in advance and store that in the associated data struct, instead of generating them later.

let tokens: Vec<LexResult> =
ruff_python_parser::tokenize(source_kind.source_code(), source_type.as_mode());

let index = LineIndex::from_source_text(source_kind.source_code());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebook documents don't have a line index, so we need to build a new line index from the combined source. The side effect of this is that we also re-generate the line index for regular documents instead of using their existing one.

(I'd like to eventually remove the line index from TextDocument, since at the moment we wastefully re-generate it whenever it gets updated - and we don't even use it in that many places now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the logic for Notebook and text document to avoid building the index multiple times? But, I see that you want to get remove the index altogether for text document so maybe this isn't worth looking into right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could separate the logic but I don't think it's worth the technical investment IMO if we plan to eventually remove this anyway.

crates/ruff_server/src/lint.rs Outdated Show resolved Hide resolved
@snowsignal snowsignal marked this pull request as ready for review May 14, 2024 06:55
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've provided some initial comments and I plan to look at other changes soon. It's quite a big PR :)

It would be helpful if you could briefly enumerate all the changes made in the PR description. I'd especially find the list of refactoring useful. It would be also useful for posterity to provide any context on the reasoning behind the implementation and if there were any other solutions that you thought of but didn't implement it.

Can you also update the test plan with all the cases that you've tested this against? Ideally it could contain the VS Code settings used and the action performed (command, code action, etc.). There are so many VS Code settings, commands and code actions for Notebook 😅

crates/ruff_notebook/src/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_notebook/src/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/notebook.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/range.rs Outdated Show resolved Hide resolved
crates/ruff_server/src/edit/range.rs Outdated Show resolved Hide resolved
let tokens: Vec<LexResult> =
ruff_python_parser::tokenize(source_kind.source_code(), source_type.as_mode());

let index = LineIndex::from_source_text(source_kind.source_code());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the logic for Notebook and text document to avoid building the index multiple times? But, I see that you want to get remove the index altogether for text document so maybe this isn't worth looking into right now.

Comment on lines 262 to 266
types::NotebookDocumentFilter::ByType {
notebook_type: "jupyter-notebook".to_string(),
scheme: Some("file".to_string()),
pattern: None,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specific to VS Code? I'm referring to the "jupyter-notebook" type. For reference, we don't use any notebook document filter in ruff-lsp:

https://github.com/astral-sh/ruff-lsp/blob/b8a48009f35756dc51268beb6bbc39c21a1e3a46/ruff_lsp/server.py#L129-L140

This asks to only send the request to sync the Python code cells as we aren't interested in any kind of cell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this from the LSP specification:
Screenshot 2024-05-15 at 9 37 34 PM

I assume this would be the same across editors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm not sure where does the jupyter-notebook type comes from.

I assume this would be the same across editors.

I think it depends - If it is set by VS Code, then I don't think it would be same.

I don't think we need to worry about this now but, just to be on the safer side, I'd probably keep this same as ruff-lsp.

Does this work on unsaved Notebook files? Like, a new notebook file is opened in VS Code but it doesn't exists on disk yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on unsaved Notebook files? Like, a new notebook file is opened in VS Code but it doesn't exists on disk yet.

This does fail - I think it might be because an unsaved URL isn't a valid file path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... it seems like, in general, that the server doesn't support files created from Ctrl+N in VS Code if they aren't saved to the file system. I'll create a follow-up issue for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does fail, but only because new notebook cell url gets passed into didOpenTextDocument, which isn't expected. Do we need to support 'unattached' notebook cells?

I just checked, we do support it. It's probably because we never touch the filesystem in ruff-lsp and always work with stdin with ruff command.

crates/ruff_server/src/server/api.rs Show resolved Hide resolved
crates/ruff_server/src/session.rs Outdated Show resolved Hide resolved
@dhruvmanila
Copy link
Member

dhruvmanila commented May 16, 2024

I think there's another problem here. To demonstrate that refer:

ruff-lsp

Screen.Recording.2024-05-16.at.11.01.41.mov

ruff server

Screen.Recording.2024-05-16.at.10.59.05.mov

You can see that there's an additional newline which gets added at the end for ruff server. The reason this is happening is because the cell content are concatenated by a newline but the newline doesn't actually exists in the original source code.

The reason it doesn't happen in ruff-lsp is because we post-process the Notebook where we strip this additional newline before updating the cell content:

/// Update the cell contents with the transformed content.
///
/// ## Panics
///
/// Panics if the transformed content is out of bounds for any cell. This
/// can happen only if the cell offsets were not updated before calling
/// this method or the offsets were updated incorrectly.
fn update_cell_content(&mut self, transformed: &str) {
for (&idx, (start, end)) in self
.valid_code_cells
.iter()
.zip(self.cell_offsets.iter().tuple_windows::<(_, _)>())
{
let cell_content = transformed
.get(start.to_usize()..end.to_usize())
.unwrap_or_else(|| {
panic!(
"Transformed content out of bounds ({start:?}..{end:?}) for cell at {idx:?}"
);
});
self.raw.cells[idx as usize].set_source(SourceValue::StringArray(
UniversalNewlineIterator::from(
// We only need to strip the trailing newline which we added
// while concatenating the cell contents.
cell_content.strip_suffix('\n').unwrap_or(cell_content),
)
.map(|line| line.as_full_str().to_string())
.collect::<Vec<_>>(),
));
}
}

This post-processing doesn't happen for the new server because it uses a wrapper to lint / format (lint.rs and format.rs).

@dhruvmanila
Copy link
Member

Notes from testing

  1. Did you implement the notebook.* variants of the source code actions? It doesn't seem to be working. They are notebook.source.fixAll and notebook.source.organizeImports which are specific to Notebook. Here, the server will receive only the first cell and we're responsible on returning the edits for the entire notebook. I'm using the following VS Code settings:

      "notebook.codeActionsOnSave": {
        "notebook.source.fixAll": "explicit"
      }

    For reference, https://github.com/astral-sh/ruff-lsp/blob/b8a48009f35756dc51268beb6bbc39c21a1e3a46/ruff_lsp/server.py#L219-L229

  2. This might be unrelated to this PR but I'm getting the following error quite frequently:

    2024-05-16 17:34:08.101 [info]  196.046579s WARN ruff_server::server::api Received notification $/setTrace which does not have a handler.
    
    2024-05-16 17:34:10.471 [info]  198.416243s ERROR ruff_server::server::api An error occurred while running textDocument/didClose: Unable to take snapshot for document with URL file:///Users/dhruv/.pyenv/versions/3.7.17/lib/python3.7/random.py
    

    It's usually occurs after the setTrace notification. The confusing part is that I have no idea why is it occurring on textDocument/didClose when I haven't closed any document.

  3. Based on this in the PR description:

    Note: ruff.applyAutofix and ruff.applyOrganizeImports are currently broken for notebook files, and I suspect it has something to do with Workspace edits from ruff server commands occasionally fail in VS Code when operating on an yet-to-be-created file #11248. Once this is fixed, I will update the test plan accordingly.

    Were you able to look into this and fix it? I'm unable to run any "Ruff:" prefixed commands except for "Format Document".

@snowsignal
Copy link
Member Author

@dhruvmanila

Did you implement the notebook.* variants of the source code actions?

No, this still needs to be added.

This might be unrelated to this PR but I'm getting the following error quite frequently

This is an unrelated bug that I've seen several times before. I'll open a proper issue for it.

Were you able to look into this and fix it? I'm unable to run any "Ruff:" prefixed commands except for "Format Document".

No, this still needs to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
server Related to the LSP server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ruff server: Support .ipynb (Jupyter Notebook) files
3 participants