Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-allocating trim method #247

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

do4gr
Copy link

@do4gr do4gr commented Sep 24, 2021

Hey, thanks for the library.

When I was using it I saw that a lot of time was spent in the trim method and stumbled over the comment about making it non-allocating.

I took a stab at it and cargo bench shows a big improvement. Let me know if that is something that you would be interested in , than I can keep working on it a bit more.

Below is the output of the included bench functions for trimming. I think they are showing the best case though since the test file seems to contain no trimmable white-space.
Also what is confusing to me is that the StringRecord benchmark also benefits from improvements to the ByteRecord trim method, not sure where this is coming from.

Screen Shot 2021-09-24 at 21 59 36

add additional tests for trim
add testcase that does actual trimming work
@do4gr
Copy link
Author

do4gr commented Oct 2, 2021

I cleaned up the code a bit and added a bench case that actually has to remove whitespace. There the difference is less stark, but still pretty big.

Screen Shot 2021-10-02 at 12 36 10

Screen Shot 2021-10-02 at 12 37 02

@Tristramg
Copy link

Thank you for this pull request. This yields a significant performance improvement in our project.

@BurntSushi what would be the next steps to consider this PR?

@BurntSushi
Copy link
Owner

I need to review it. From a quick glance, it looks like there's some clean up required here. For example, there shouldn't be a "trim original" routine in the public API.

CI also obviously needs to be fixed.

@Tristramg
Copy link

@do4gr hello, thank you at lot for this. Do you plan to work on this PR again? Otherwise I can try to polish what you did in order to get it merged.

@do4gr
Copy link
Author

do4gr commented Jan 4, 2022

I'll remove the trim_original and then we'll see whether the PR run succeed. I think it was just flakey CI.

@do4gr
Copy link
Author

do4gr commented Jan 4, 2022

Apart from that from my side the PR would be done. There is nothing more I was thinking of including.

@do4gr
Copy link
Author

do4gr commented Jan 5, 2022

That's it from my side. Let me know if there is more that needs to be done.

Copy link

@robryk robryk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this PR.

for element_end_idx in &mut self.0.bounds.ends[..length] {
// left trim
let mut found_only_whitespace = true;
while read_pos < *element_end_idx {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest finding the first non-whitespace position first and then using slice.copy_within to copy the rest. That would (a) make this somewhat easier to read (e.g. obviate the need for found_only_whitespace) and (b) be potentially easier to optimize for the compiler (no need to deduce that once the branch with the copy is taken it will be taken until the loop finishes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants