Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slice::check_range #75207

Merged
merged 5 commits into from
Sep 4, 2020
Merged

Add slice::check_range #75207

merged 5 commits into from
Sep 4, 2020

Conversation

dylni
Copy link
Contributor

@dylni dylni commented Aug 6, 2020

This method is useful for RangeBounds parameters. It's even been rewritten many times in the standard library, sometimes assuming that the bounds won't be usize::MAX.

For example, Vec::drain creates an empty iterator when usize::MAX is used as an inclusive end bound:

assert!(vec![1].drain(..=usize::max_value()).eq(iter::empty()));

If this PR is merged, I'll create another to use it for those methods.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @KodrAus (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 6, 2020
/// [`Index::index`]: ../ops/trait.Index.html#tymethod.index
#[track_caller]
#[unstable(feature = "slice_check_range", issue = "none")]
pub fn check_range<R: RangeBounds<usize>>(&self, range: R) -> Range<usize> {
Copy link

@leonardo-m leonardo-m Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't unfortunately return a CheckedRange<usize> because there's no association with a specific range, so it's useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more? I don't see how this would be useless, since it converts RangeBounds to usizes. It wasn't meant to reduce unsafe code.

It could return a CheckedRange that has a reference to the slice and a safe get method. However, that would require check_range_mut and CheckedRangeMut too, which would make this simple method much more complex.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code is useful. I meant to say that returning a CheckedRange is not so useful. As you say it could be done with the reference to the slice, but it could make it over engineered, so all is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense. Thanks for the clarification.

@KodrAus
Copy link
Contributor

KodrAus commented Aug 12, 2020

If this PR is merged, I'll create another to use it for those methods.

Would you be happy to roll replacing the ad-hoc implementations with this in into the same PR here? Just so we can land it all together and know everything is done 🙂

@dylni
Copy link
Contributor Author

dylni commented Aug 13, 2020

Sure, I can update the PR this week. I was initially trying to reduce the scope of the change, but adding it now is not a problem

assert!(end <= len, "upper bound was too large");
(start, end)
// SAFETY: This buffer is only used to check the range.
let buffer = unsafe { slice::from_raw_parts(self.ptr(), self.len()) };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack, but it's not too bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use buffer_as_slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used that method at first, but I decided to use from_raw_parts instead, since it's a hack either way. The issue is that I need a contiguous slice for check_range, so I use the first self.len() elements even if they're not initialized.

I could change it to unsafe { self.buffer_as_slice()[..self.len()] }, but that adds an unnecessary bounds check and looks more valid than it is.

I also added more of this explanation to the safety comment.

@dylni
Copy link
Contributor Author

dylni commented Aug 17, 2020

There are 3 more ad-hoc implementations:

  • match (range.start_bound(), range.end_bound()) {
    (Excluded(s), Excluded(e)) if s == e => {
    panic!("range start and end are equal and excluded in BTreeMap")
    }
    (Included(s) | Excluded(s), Included(e) | Excluded(e)) if s > e => {
    panic!("range start is greater than range end in BTreeMap")
    }
    _ => {}
    };

    The bounds aren't integers, and they're not used for a slice.
  • let start = match start {
    Bound::Included(lo) => lo,
    Bound::Excluded(lo) => lo + 1,
    Bound::Unbounded => 0,
    };
    let end = match end {
    Bound::Included(hi) => hi + 1,
    Bound::Excluded(hi) => hi,
    Bound::Unbounded => length,
    };

    I don't think this method is allowed to panic, but it also doesn't use a slice.
  • let start = match range.start_bound() {
    Bound::Included(ref k) => match self.lookup_index_for(k) {
    Ok(index) | Err(index) => index,
    },
    Bound::Excluded(ref k) => match self.lookup_index_for(k) {
    Ok(index) => index + 1,
    Err(index) => index,
    },
    Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
    Bound::Included(ref k) => match self.lookup_index_for(k) {
    Ok(index) => index + 1,
    Err(index) => index,
    },
    Bound::Excluded(ref k) => match self.lookup_index_for(k) {
    Ok(index) | Err(index) => index,
    },
    Bound::Unbounded => self.data.len(),
    };

    I don't see any way check_range could be defined that would help here.

Copy link
Contributor

@KodrAus KodrAus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on the extra scope creep and running through all the places we could use this! I've just left a few little comments

assert!(end <= len, "upper bound was too large");
(start, end)
// SAFETY: This buffer is only used to check the range.
let buffer = unsafe { slice::from_raw_parts(self.ptr(), self.len()) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use buffer_as_slice?

if end > len {
end_assert_failed(end, len);
}
let Range { start, end } = self.check_range(range);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this method was pretty finely tuned, do we have any existing benchmarks we can check to see whether there's any impact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Vec::drain has any benches. These are the benches for Vec:
https://github.com/rust-lang/rust/blob/master/library/alloc/benches/vec.rs

However, I wouldn't expect a regression. check_range calls functions with the same attributes for this reason. That's part of why I think it's better to package it with the standard library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there aren't any benches then I'd consider it a bit of a speculative implementation anyways, so am happy for this to be made simpler 👍

@dylni
Copy link
Contributor Author

dylni commented Aug 24, 2020

No problem! It's probably better to include the simplifications in this PR anyway

@KodrAus
Copy link
Contributor

KodrAus commented Sep 4, 2020

Thanks for your patience @dylni! This looks great to me.

@bors r+

@bors
Copy link
Contributor

bors commented Sep 4, 2020

📌 Commit d9e877f has been approved by KodrAus

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 4, 2020
@dylni
Copy link
Contributor Author

dylni commented Sep 4, 2020

Thanks @KodrAus! Don't worry, I've seen you active on a lot of RFCs.

Should I create a tracking issue for this method?

Edit: #76393

@bors
Copy link
Contributor

bors commented Sep 4, 2020

⌛ Testing commit d9e877f with merge 82877c725f6510e8514b197af2e06e1e3f586a97...

@bors
Copy link
Contributor

bors commented Sep 4, 2020

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 4, 2020
@Dylan-DPC-zz
Copy link

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 4, 2020
@bors
Copy link
Contributor

bors commented Sep 4, 2020

⌛ Testing commit d9e877f with merge ef55a0a...

@bors
Copy link
Contributor

bors commented Sep 4, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: KodrAus
Pushing ef55a0a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 4, 2020
@bors bors merged commit ef55a0a into rust-lang:master Sep 4, 2020
@dylni dylni deleted the add-slice-check-range branch September 4, 2020 16:05
@dylni dylni mentioned this pull request Sep 5, 2020
4 tasks
@RalfJung
Copy link
Member

Something about this PR is breaking Rust's aliasing rules. The test_stable_pointers test started failing when the PR landed: calling drain on only a part of the vector creates a reference that overlaps with unaffected elements, just invalidating those pointers. I am still investigating the details.

if end > len {
end_assert_failed(end, len);
}
let Range { start, end } = self.check_range(range);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call here is the problem: this implicitly creates a slice covering the entire vector, which is a problem because that slice aliases with other pointers that existed before and that should remain valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a fix in #76662.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Wouldn't this use the safe Deref implementation for Vec? Even if not, why doesn't the mutable reference to self on this method prevent aliasing?

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents aliasing from safe clients, but Vec explicitly wants to support some unsafe clients that keep pointers into the buffer. That is okay as long as all the operations involved are guaranteed not to reallocate the buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for the testcase that shows how unsafe code can rely on this property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I had no idea Vec makes those guarantees. Thanks for the explanation!

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of these guarantees are documented here. I might have gone overboard in the test and check more than is guaranteed, but then it doesn't seem unlikely that people will take this guarantee as far as they can -- so I added a test basically for every operation that I could imagine preserving the buffer. Someone else double-checking this certainly would not hurt. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I read that passage but not thoroughly enough to notice these subtleties.

but then it doesn't seem unlikely that people will take this guarantee as far as they can

IIUC, there is still room for some of those calls to be removed. From the guarantees:

Bulk insertion methods may reallocate, even when not necessary.

Thus, append, extend, extend_from_slice, and sometimes splice shouldn't need to be restricted. Of course, keeping them in the test doesn't hurt. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... maybe we should add a comment though that this test does not constitute a guarantee. I'll prepare a PR.

(start, end)
// SAFETY: This buffer is only used to check the range. It might be partially
// uninitialized, but `check_range` needs a contiguous slice.
// https://github.com/rust-lang/rust/pull/75207#discussion_r471193682
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a partially uninitialized slice is UB. So I don't think the safety comment here is accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was something the standard library could assume. Wouldn't that make buffer_as_slice UB too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so. The standard library can't just implicitly do this, the risk is too high that someone will copy this code and use it in their own crate.

However, even then it is better to avoid; usually we only do that for layout assumptions (Cc rust-lang/unsafe-code-guidelines#90). For other kinds of UB, it is typically considered a bug.

Wouldn't that make buffer_as_slice UB too?

Maybe? I am not familiar with the code I am afraid, I just read the safety comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looking at the code, buffer_as_slice should likely return a raw slice instead of a reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so.

You're right. My mistake.

However, even then it is better to avoid; usually we only do that for layout assumptions

That makes sense. I was mostly using buffer_as_slice as a reference, but I agree that avoiding that assumption is better.

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's lots of old code here that was written before we had clear rules, and so does not follow today's rules. This case here is not a big deal because the discussion is still open whether we want this to be UB or not (that's rust-lang/unsafe-code-guidelines#77), but while we discuss it's better to follow the rules, even in the standard library, also to learn how hard/annoying it is to work with these rules.

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also the standard library currently does a poor job at documenting when it relies on extra assumptions, but I am pushing for new code that we add to do better. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all sounds great, :)

/// [`Index::index`]: ops::Index::index
#[track_caller]
#[unstable(feature = "slice_check_range", issue = "none")]
pub fn check_range<R: RangeBounds<usize>>(&self, range: R) -> Range<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we only need the length of the slice, I don't think this method should take a reference to the full slice. References are very strong types, they assert the aliasing rules and they assert that the memory they point to is initialized. As my other messages here show, that is a problem for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future readers, this conversation was continued here.

RalfJung added a commit to RalfJung/rust that referenced this pull request Sep 13, 2020
…lacrum

Fix liballoc test suite for Miri

Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni  this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires.

Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri.

This makes https://github.com/RalfJung/miri-test-libstd work again.
RalfJung added a commit to RalfJung/rust that referenced this pull request Sep 13, 2020
…lacrum

Fix liballoc test suite for Miri

Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni  this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires.

Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri.

This makes https://github.com/RalfJung/miri-test-libstd work again.
RalfJung added a commit to RalfJung/rust that referenced this pull request Sep 16, 2020
…lacrum

Fix liballoc test suite for Miri

Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni  this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires.

Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri.

This makes https://github.com/RalfJung/miri-test-libstd work again.
@cuviper cuviper added this to the 1.48.0 milestone May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants