Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect padding in doc comments #204

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pjsier
Copy link

@pjsier pjsier commented Oct 10, 2021

What does this PR accomplish?

  • 🦚 Feature

Closes #133.

Changes proposed by this PR:

Detect padding in doc comments with CommentPaddingStyle enum

Notes to reviewer:

Adds enum and detect_padding method to parse and store padding style

📜 Checklist

  • Works on the ./demo sub directory
  • Test coverage is excellent and passes
  • Documentation is thorough

@@ -111,13 +111,26 @@ impl CommentVariant {
}
}

#[derive(Clone, Hash, PartialEq)]
pub enum Padding {
Padding(String),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only have one content variant here, we might want to limit it to AsteriskSpace{ leading_spaces: usize }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks! To make sure I'm understanding, for the string " * ", it would be AsteriskSpace with 1 for leading_spaces or 3 to account for the whole string?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of the two variants: Doc and NonDoc, the first might be a single span for now and hence contain all leading spaces as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a separate enum or replace the current one? I updated the current enum to reflect your comments and the more specific variant

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was misleading, sorry about that. I think we can represent both with the enum Padding with the Asterisk* variant.

@pjsier
Copy link
Author

pjsier commented Oct 11, 2021

Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

@drahnr
Copy link
Owner

drahnr commented Oct 11, 2021

Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

I think, removing it at the span/syn/ra_ap_syntax stage will simplify later stages significantly (or: avoid additional code changes), so I'd recommend having a clean TrimmedLiteral that has a representation without the leading \s*\*\s.

Note: There is a caveat that a few consecutive \s\*\s could also be a markdown lists if they are not continuous for each line :)

@pjsier
Copy link
Author

pjsier commented Oct 11, 2021

Great, thanks! Should this be in or before the trim_span method then? I know you had mentioned moving the padding detection after, but it seems like we would need ti there.

Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

@drahnr
Copy link
Owner

drahnr commented Oct 11, 2021

Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

Whereever you see fit :)

Bump deps as needed.

@drahnr drahnr added the enhancement 🦚 New feature or request label Oct 13, 2021
@drahnr drahnr added this to the v0.9.0 milestone Oct 13, 2021
@pjsier
Copy link
Author

pjsier commented Oct 13, 2021

I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

@@ -656,7 +730,7 @@ mod tests {
block_comment_test!(
trimmed_multi_doc,
"/**
mood
* mood
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both variants should work, the * should be optional.

Comment on lines +696 to +700
let content = "/**\n doc\n doc\n */".to_string();
assert_matches!(
detect_and_remove_padding(&content),
(CommentPaddingStyle::NoPadding, content)
);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer smaller test cases, even if this has a compile time cost to it.

And I think we need a few more to cover the indented variants, as in multiple leading spaces before the asterisk.

Comment on lines +697 to +699
assert_matches!(
detect_and_remove_padding(&content),
(CommentPaddingStyle::NoPadding, content)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice use of assert_matches :)

Comment on lines +341 to +344
lazy_static! {
static ref PADDING_STR: Regex =
Regex::new(r##"(?m)^\s\*\s"##).expect("PADDING_STR regex compiles");
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#202 introduced an optimization to avoid the regex entirely and use smid instructions as available, probably worth to hand roll this, but that can be done once you're confident of the structure - this is works and is good for the time being 👍

@drahnr
Copy link
Owner

drahnr commented Oct 18, 2021

I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

I think you have two options: Either replace as_str() -> &'_ str into a as_str_set() -> &[&'_ str] as well as impl ToString and deal with the fallout over the code base. Note that this will not avoid the allocation, since the set creation will itself require an allocation.
While the above is possible, I think it's much easier to split a TrimmedLiteral into multiple and adjust the Cluster (if needed, I did not dig into the details). This would essentially avoid the whole ordeal of handling paddings in the fn render(), but re-use existing logic of TrimmedLiteral clustering, where padding could be just a another delimiter as initially discussed.
It's super important though to get the offsets right (with emojis and multi-width characters) and have sufficient test cases, any issue here will cause wrong spans being passed throughout the pipeline and that's not-fun™ to debug.

Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

rendered should definitely not contain those. It was always meant to be the content representation that is presented to the next stage in the documentation generation pipeline (here: parsing with markdown/cmark) and hence must be cleared from any control/style chars that are only relevant to the rust documentation annotation.


Just explaining this to you makes it clear that there is a lot of documentation missing and quite a few design joices are not obvious, and naming of things could be improved by quite a bit. Very much appreciate your effort!

@pjsier let me know if there is anything else I can do to aid you driving this to completion :)

@drahnr drahnr self-assigned this Oct 20, 2021
@drahnr
Copy link
Owner

drahnr commented Oct 27, 2021

@pjsier gentle ping :)

@pjsier
Copy link
Author

pjsier commented Oct 27, 2021

@drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

@drahnr
Copy link
Owner

drahnr commented Oct 28, 2021

@drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

I think it's better if you push it over the finish line :) there is no particular rush here - take your time!

@drahnr drahnr removed this from the v0.9.0 milestone Oct 28, 2021
@drahnr drahnr added stale 😐 PRs and issues that need some additional work to make it across the finishline good first issue 🔰 Good for newcomers help wanted 🤝 Extra attention is needed internal ⚙️ Internal issues or TODOs labels Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🦚 New feature or request good first issue 🔰 Good for newcomers help wanted 🤝 Extra attention is needed internal ⚙️ Internal issues or TODOs stale 😐 PRs and issues that need some additional work to make it across the finishline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

detect style prefix in dev comments
2 participants