Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ruff] update and vendor annotate-snippets #15359

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

BurntSushi
Copy link
Member

@BurntSushi BurntSushi commented Jan 8, 2025

This PR upgrades to the latest version of
annotate-snippets
and vendors it. The motivating reason for vendoring the
crate is because it makes a non-elidable change to the
diagnostic header
. (Note that
I've submitted a PR
to help us work-around that issue, but it might not be
a good fit for upstream.) Moreover,
while annotate-snippets is dense, it's not very big. And
I think it makes sense for us to maintain our own copy,
at least for now, so that we can be masters of our own
destiny over a critical component of Ruff.

For example, our vendored copy now contains two bug fixes
to annotate-snippets that I sent to upstream, but we
can benefit from the fixes now. See rust-lang/annotate-snippets-rs#169
and rust-lang/annotate-snippets-rs#170.

Reviewers should look at this PR commit-by-commit. The
unusually large number of commits is a result of dividing
the snapshot changes (over 1,000 of them) into different
categories. I did this so that I could more easily review
the changes and ensure they are correct and/or desirable.
In some cases, they are bug fixes. In other cases, they
are just stylistic improvements. In yet other cases, they
flagged bugs in annotate-snippets themselves. While the
number of commits is large, I actually believe this will
make review much easier and quicker.

Closes #10832, Ref #11618

@BurntSushi
Copy link
Member Author

BurntSushi commented Jan 8, 2025

Bah, a bunch of Clippy lints are firing on the vendored copy of annotate-snippets. I'm considering just squashing all of them in order to reduce the diff with upstream, but that might be a non-goal.

EDIT: This is what I ended up doing.

Copy link
Contributor

github-actions bot commented Jan 8, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

@BurntSushi BurntSushi force-pushed the ag/upgrade-annotate-snippets branch 2 times, most recently from 1175d5e to 572632e Compare January 8, 2025 20:54
crates/ruff_linter/src/message/text.rs Show resolved Hide resolved
crates/ruff_linter/src/message/text.rs Show resolved Hide resolved
crates/ruff_linter/src/message/text.rs Outdated Show resolved Hide resolved
crates/ruff_linter/src/message/text.rs Outdated Show resolved Hide resolved
@Gankra
Copy link

Gankra commented Jan 8, 2025

partial review status: i've "approved" all the commits that aren't test:

@MichaReiser MichaReiser added the do-not-merge Do not merge this pull request label Jan 9, 2025
@MichaReiser
Copy link
Member

Adding "do-not-merge" until the 0.9 release is out

Comment on lines 121 to 125
|
3 | del x, y[
4 | z
| ^ Syntax Error: unexpected EOF while parsing
|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you! This has been annoying for a while, happy to see this fixed.

@BurntSushi BurntSushi changed the title update and vendor annotate-snippets [ruff] update and vendor annotate-snippets Jan 9, 2025
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thank you and thanks a ton for splitting the commits as you did. It allowed me to skip over a large chunk of the snapshot changes because I knew they're all whitespace (and github's UI is pain because it requires clicking load changes for many of them).

I think there are a few cases where the snapshots now are incorrect and may require changes to the rule or rendering. The truncation itself is nice, but I think we can't use ... because it's a valid syntax element in Python.

crates/ruff_python_parser/tests/fixtures.rs Show resolved Hide resolved
Comment on lines 322 to 343
let tab_width = u32::try_from(line_width.get() - old_width)
.expect("small width because of tab sisze");
result.push_str(&source[last_end..index]);

for _ in 0..tab_width {
result.push(' ');
}

last_end = index + 1;
update_range(index, tab_width);
} else if let Some(printable) = unprintable_replacement(c) {
result.push_str(&source[last_end..index]);
result.push(printable);
last_end = index + 1;

let len = u32::try_from(printable.len_utf8()).expect("4 or fewer UTF-8 code units");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the u32::try from here by using char::text_len (char implements the TextLen trait). It also better encodes that len is a byte offset rather than any u32.

---
E11.py:9:1: E112 Expected an indented block
|
7 | #: E112
8 | if False:
| ^ E112
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure if I like this change. It's not incorrect but it's probably a bit misleading because the indent needs to be inserted at the next line: Adding a tab at the end of this line is useless. The same applies for the other E* rules.

It seems we already special case indents. Maybe this is no longer necessary?

let range = if first_token.kind() == TokenKind::Indent {
first_token.range()
} else {
TextRange::new(locator.line_start(first_token.start()), first_token.start())
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm yes I see. I'll look into this. I could see this being tricky because I think part of the annotate-snippets fixes was rendering a span immediately after the line terminator to be at the end of the line it terminates instead of at the start of the next line. And it seems like this is what led to at least some portion of the fixes here. But I think this ends up being wrong for cases where we actually want the annotation pointing to the beginning of a line.

Like, doesn't my description of how annotate-snippets works with a line terminator imply that you can never point to the beginning of a line? If so, then in theory the right fix there is to change annotate-snippets to render a span immediately after a line terminator to point to the start of the following line, and then any lints that want to point to the end of a line need to put the span just before the line terminator. If so, that's probably a pretty gnarly fix, but let me dig into it and see if my description here is actually accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this commit. I think that all changes here are undesired other than for I001. We should double check what the ranges are but most of the rules try to render a marker right at the start of a line with an empty range.

Comment on lines +15 to +18
10 | |
11 | |
12 | |
| |__^ I001
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is better but it's far from nice :D

Would you mind opening an issue to look into the I001 ranges. I think we should trim them to the imports range.

@sharkdp
Copy link
Contributor

sharkdp commented Jan 9, 2025

and github's UI is pain because it requires clicking load changes for many of them

@MichaReiser I have a "Load diffs" bookmark in my browsers toolbar which does this automatically 😄

javascript:document.querySelectorAll('.load-diff-button').forEach(node => node.click())

@AlexWaygood AlexWaygood removed the do-not-merge Do not merge this pull request label Jan 9, 2025
This merely adds the crate to our repository. Some cosmetic changes are
made to make it work in our repo and follow our conventions, such as
changing the name to `ruff_annotate_snippets`. We retain the original
license information. We do drop some things, such as benchmarks, but
keep tests and examples.
This is a tiny change that, perhaps slightly shady, permits us to use
the `annotate-snippets` renderer without its mandatory header (which
wasn't there in `annotate-snippets 0.9`). Specifically, we can now do
this:

    Level::None.title("")

The combination of a "none" level and an empty label results in the
`annotate-snippets` header being skipped entirely. (Not even an empty
line is written.)

This is maybe not the right API for upstream `annotate-snippets`, but
it's very easy for us to do and unblocks the upgrade (albeit relying on
a vendored copy).

Ref rust-lang/annotate-snippets-rs#167
This is pretty much just moving to the new API and taking care to use
byte offsets. This is *almost* enough. The next commit will fix a bug
involving the handling of unprintable characters as a result of
switching to byte offsets.
Previously, we were replacing unprintable ASCII characters with a
printable representation of them via fancier Unicode characters. Since
`annotate-snippets` used to use codepoint offsets, this didn't make our
ranges incorrect: we swapped one codepoint for another.

But now, with the `annotate-snippets` upgrade, we use byte offsets
(which is IMO the correct choice). However, this means our ranges can be
thrown off since an ASCII codepoint is always one byte and a non-ASCII
codepoint is always more than one byte.

Instead of tweaking the `ShowNonprinting` trait and making it more
complicated (which is used in places other than this diagnostic
rendering it seems), we instead change `replace_whitespace` to handle
non-printable characters. This works out because `replace_whitespace`
was already updating the annotation range to account for the tab
replacement. We copy that approach for unprintable characters.
These snapshot changes should *all* only be a result of changes to
trailing whitespace in the output. I checked a psuedo random sample of
these, and the whitespace found in the previous snapshots seems to be an
artifact of the rendering and _not_ of the source data. So this seems
like a strict bug fix to me.

There are other snapshots with whitespace changes, but they also have
other changes that we split out into separate commits. Basically, we're
going to do approximately one commit per category of change.

This represents, by far, the biggest chunk of changes to snapshots as a
result of the `annotate-snippets` upgrade.
These updates center around the addition of annotations in the
diagnostic rendering. Previously, the annotation was just not rendered
at all. With the `annotate-snippets` upgrade, it is now rendered. I
examined a pseudo random sample of these, and they all look correct.

As will be true in future batches, some of these snapshots also have
changes to whitespace in them as well.
It's hard to grok the change from the snapshot diffs alone, so here's
one example. Before:

    PYI021.pyi:15:5: PYI021 [*] Docstrings should not be included in stubs
       |
    14 |   class Baz:
    15 |       """Multiline docstring
       |  _____^
    16 | |
    17 | |     Lorem ipsum dolor sit amet
    18 | |     """
       | |_______^ PYI021
    19 |
    20 |       def __init__(self) -> None: ...
       |
       = help: Remove docstring

And now after:

    PYI021.pyi:15:5: PYI021 [*] Docstrings should not be included in stubs
       |
    14 |   class Baz:
    15 | /     """Multiline docstring
    16 | |
    17 | |     Lorem ipsum dolor sit amet
    18 | |     """
       | |_______^ PYI021
    19 |
    20 |       def __init__(self) -> None: ...
       |
       = help: Remove docstring

I personally think both of these are fine. If we felt strongly, I could
investigate reverting to the old style, but the new style seems okay to
me.

In other words, these updates I believe are just cosmetic and not a bug
fix.
The previous rendering just seems wrong in that a `^` is omitted. The
new version of `annotate-snippets` seems to get this right. I checked a
pseudo random sample of these, and it seems to only happen when the
position pointed at a line terminator.
This looks like a bug fix that occurs when the annotation is a
zero-width span immediately following a line terminator. Previously, the
caret seems to be rendered on the next line, but it should be rendered
at the end of the line the span corresponds to.

I admit that this one is kinda weird. I would somewhat expect that our
spans here are actually incorrect, and that to obtain this sort of
rendering, we should identify a span just immediately _before_ the line
terminator and not after it. But I don't want to dive into that rabbit
hole for now (and given how `annotate-snippets` now renders these
spans, perhaps there is more to it than I see), and this does seem like
a clear improvement given the spans we feed to `annotate-snippets`.
I believe this case is different from the last in that it happens when
the end of a *multi-line* annotation occurs after a line terminator.
Previously, the diagnostic would render on the next line, which is
definitely a bit weird. This new update renders it at the end of the
line the annotation ends on.

In some cases, the annotation was previously rendered to point at source
lines below where the error occurred, which is probably pretty
confusing.
This group of updates is similar to the last one, but they call out the
fact that while the change is an improvement, it does still seem to be a
little buggy.

As one example, previously we would have this:

       |
     1 | / from __future__ import annotations
     2 | |
     3 | | from typing import Any
     4 | |
     5 | | from requests import Session
     6 | |
     7 | | from my_first_party import my_first_party_object
     8 | |
     9 | | from . import my_local_folder_object
    10 | |
    11 | |
    12 | |
    13 | | class Thing(object):
       | |_^ I001
    14 |     name: str
    15 |     def __init__(self, name: str):
       |
       = help: Organize imports

And now here's what it looks like after:

       |
     1 | / from __future__ import annotations
     2 | |
     3 | | from typing import Any
     4 | |
     5 | | from requests import Session
     6 | |
     7 | | from my_first_party import my_first_party_object
     8 | |
     9 | | from . import my_local_folder_object
    10 | |
    11 | |
    12 | |
       | |__^ Organize imports
    13 |   class Thing(object):
    14 |     name: str
    15 |     def __init__(self, name: str):
       |
       = help: Organize imports

So at least now, the diagnostic is not pointing to a completely
unrelated thing (`class Thing`), but it's still not quite pointing to
the imports directly. And the `^` is a bit offset. After looking at
some examples more closely, I think this is probably more of a bug
with how we're generating offsets, since we are actually pointing to
a location that is a few empty lines _below_ the last import. And
`annotate-snippets` is rendering that part correctly. However, the
offset from the left (the `^` is pointing at `r` instead of `f` or even
at the end of `from . import my_local_folder_object`) appears to be a
problem with `annotate-snippets` itself.

We accept this under the reasoning that it's an improvement, albeit not
perfect.
…pace

I separated out this snapshot update since the string of `^` including
whitespace looked a little odd. I investigated this one specifically,
and indeed, our span in this case is telling `annotate-snippets` to
point at the whitespace. So this is `annotate-snippets` doing what it's
told with a mildly sub-optimal span.

For clarity, the before rendering is:

    skip.py:34:1: I001 [*] Import block is un-sorted or un-formatted
       |
    32 |       import sys; import os  # isort:skip
    33 |       import sys; import os  # isort:skip  # isort:skip
    34 | /     import sys; import os
       |
       = help: Organize imports

And now after is:

    skip.py:34:1: I001 [*] Import block is un-sorted or un-formatted
       |
    32 |     import sys; import os  # isort:skip
    33 |     import sys; import os  # isort:skip  # isort:skip
    34 |     import sys; import os
       | ^^^^^^^^^^^^^^^^^^^^^^^^^ I001
       |
       = help: Organize imports

This is a clear bug fix since it adds in the `I001` annotation, even
though the carets look a little funny by including the whitespace
preceding `import sys; import os`.
This one almost looks like it fits into the other failure categories,
but without identifying root causes, it's hard to say for sure. The span
here does end after a line terminator, so it feels like it's like the
rest.

I also isolated this change since I found the snapshot diff pretty hard
to read and wanted to look at it more closely. In this case, the before
is:

    E204.py:31:2: E204 [*] Whitespace after decorator
       |
    30 |   # E204
    31 |   @ \
       |  __^
    32 | | foo
       | |_^ E204
    33 |   def baz():
    34 |       print('baz')
       |
       = help: Remove whitespace

And the after is:

    E204.py:31:2: E204 [*] Whitespace after decorator
       |
    30 | # E204
    31 | @ \
       |  ^^ E204
    32 | foo
    33 | def baz():
    34 |     print('baz')
       |
       = help: Remove whitespace

The updated rendering is clearly an improvement, since `foo` itself is
not really the subject of the diagnostic. The whitespace is.

Also, the new rendering matches the span fed to `annotate-snippets`,
where as the old rendering does not.
…ource

The change to the rendering code is elaborated on in more detail here,
where I attempted to upstream it:
rust-lang/annotate-snippets-rs#169

Otherwise, the snapshot diff also shows a bug fix: a `^` is now rendered
where as it previously was not.
This fix was sent upstream and the PR description includes more details:
rust-lang/annotate-snippets-rs#170

Without this fix, there was an errant snapshot diff that looked like
this:

  |
1 |   version = "0.1.0"
2 |   # Ensure that the spans from toml handle utf-8 correctly
3 |   authors = [
  |  ___________^
4 | |     { name = "Z...ALGO", email = 1 }
5 | | ]
  | |_^ RUF200
  |

That ellipsis should _not_ be inserted since the line is not actually
truncated. The handling of line length (in bytes versus actual rendered
length) wasn't quite being handled correctly in all cases.

With this fix, there's (correctly) no snapshot diff.
@BurntSushi BurntSushi force-pushed the ag/upgrade-annotate-snippets branch from 1c92833 to bea42de Compare January 9, 2025 18:50
@BurntSushi
Copy link
Member Author

BurntSushi commented Jan 9, 2025

I've updated the PR with some rewritten history and addressed a good portion of the feedback so far. What's remaining is some of the thornier problems with some of the new rendering. My plan is to investigate those, figure out their root causes and figure out how to fix them. I don't think a re-review is helpful yet. I'll ping y'all when a re-review makes more sense.

We do this because `...` is valid Python, which makes it pretty likely
that some line trimming will lead to ambiguous output. So we add support
for overriding the cut indicator. This also requires changing some of
the alignment math, which was previously tightly coupled to `...`.

For Ruff, we go with `…` (`U+2026 HORIZONTAL ELLIPSIS`) for our cut
indicator.

For more details, see the patch sent to upstream:
rust-lang/annotate-snippets-rs#172
This updates snapshots where long lines now get trimmed with
`annotate-snippets`. And an ellipsis is inserted to indicate trimming.

This is a little hokey to test since in tests we don't do any styling.
And I believe this just uses the default "max term width" for rendering.
But in real life, it seems like a big improvement to have long lines
trimmed if they would otherwise wrap in the terminal. So this seems like
an improvement to me.

There are some other fixes here that overlap with previous categories.
This looks like a bug fix since the caret is now pointing right at the
position of the unprintable character. I'm not sure if this is a result
of an improvement via the `annotate-snippets` upgrade, or because of
more accurate tracking of annotation ranges even after unprintable
characters are replaced. I'm tempted to say the former since in theory
the offsets were never wrong before because they were codepoint offsets.

Regardless, this looks like an improvement.
@BurntSushi BurntSushi force-pushed the ag/upgrade-annotate-snippets branch from bea42de to a0d8a70 Compare January 9, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update annotate-snippets to 0.11.0
6 participants