Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lot of unsafe code could probably be replaced by safe code without perf. impact #18

Open
Ten0 opened this issue Nov 30, 2020 · 5 comments

Comments

@Ten0
Copy link

Ten0 commented Nov 30, 2020

Looking at the source code:
https://docs.rs/slice-group-by/0.2.6/src/slice_group_by/linear_group/linear_group_by.rs.html#131-136

It seems this crate makes extensive use of unsafe.
However, it also looks like this unsafe is unnecessary as it creates no performance improvement:

Representation with the two pointers is exactly the memory representation of a slice, so it looks like this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    ptr: *const T,
    end: *const T,
    predicate: P,
    _phantom: marker::PhantomData<&'a T>,
}

could be replaced by this:

pub struct LinearGroupBy<'a, T: 'a, P> {
    elements_left: &'a [T],
    predicate: P,
}

while this:

                let mut i = 0;
                let mut ptr = self.ptr;

                // we use an unsafe block to avoid bounds checking here.
                // this is safe because the only thing we do here is to get
                // two elements at `ptr` and `ptr + 1`, bounds checking is done by hand.

                // we need to get *two* contiguous elements so we check that:
                //  - the first element is at the `end - 1` position because
                //  - the second one will be read from `ptr + 1` that must
                //    be lower or equal to `end`
                unsafe {
                    while ptr != self.end.sub(1) {
                        let a = &*ptr;
                        ptr = ptr.add(1);
                        let b = &*ptr;

                        i += 1;

                        if !(self.predicate)(a, b) {
                            let slice = $mkslice(self.ptr, i);
                            self.ptr = ptr;
                            return Some(slice)
                        }
                    }
                }

could be rewritten using for (i,s) in self.elements_left.windows(2).enumerate() and slice::split_at/slice::split_at_mut.

(and obviously this:

                // `i` is either `0` or the `slice length - 1` because either:
                //  - we have not entered the loop and so `i` is equal to `0`
                //    the slice length is necessarily `1` because we ensure it is not empty
                //  - we have entered the loop and we have not early returned
                //    so `i` is equal to the slice `length - 1`
                let slice = unsafe { $mkslice(self.ptr, i + 1) };
                self.ptr = self.end;
                Some(slice)

by this:

Some(std::mem::replace(&mut self.elements_left, &[]))

)

I don't think this change would create additional runtime cost, as most double-bounds-checking (or constant-bounds-checking in particular with the constants 2 and 1 here) usually gets optimized-out by the compiler, and I would feel much more comfortable using this crate, as I wouldn't like to introduce so much unsafe to our codebase for such a simple algorithm.

@Kerollmops
Copy link
Owner

Kerollmops commented Dec 24, 2020

Hey @Ten0,

I am on your side on this. I implemented the linear version of this algorithm in safe Rust, and proposed it to the standard library. I am now waiting for it to be merged.

For your information, I didn't take the time to check the performance impact, fortunately, I have an important set of benchmarks.

@Ten0
Copy link
Author

Ten0 commented Oct 11, 2021

Hello!
I'm in need of this algorithm again.
It looks like this is fixed on the current master (#19), however I also noticed that slice-group-by 0.2.7 that fixes this has been released then yanked.
What is the current status of this? Was there an issue with 0.2.7?

@Kerollmops
Copy link
Owner

Kerollmops commented Oct 12, 2021

Hey @Ten0,

Sorry about that, I just published version 0.3.0 which is the real version that contains the changes you are looking for. Don't hesitate to ask me anything if you have any requests about this library.

@Ten0 Ten0 closed this as completed Oct 13, 2021
@Ten0 Ten0 reopened this Oct 13, 2021
@Ten0
Copy link
Author

Ten0 commented Aug 16, 2023

Hmm looks like linear group by key has the same issue that could be fixed in the same way, no? Or even more simply, by making linear_group_by_key wrap a linear_group_by?

@Kerollmops
Copy link
Owner

Kerollmops commented Dec 14, 2023

This could be the case, but I don't have time right now. Maybe you can propose a pull request? I am pretty sure it will not be breaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants