Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider replacing RawVec<T> with Box<[MaybeUninit<T>]>. #54470

Open
eddyb opened this issue Sep 22, 2018 · 18 comments
Open

Consider replacing RawVec<T> with Box<[MaybeUninit<T>]>. #54470

eddyb opened this issue Sep 22, 2018 · 18 comments
Labels
A-collections Area: `std::collection` T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Sep 22, 2018

After #53508 lands, we could look into this. It might provide a bit more type-safety, although it could be worse to not have the capacity as a separate field.

One thing to note is that setting the "length" component of a reference/pointer to [MaybeUninit<T>] is "less unsafe" than doing so for for [T], but it's unclear to me whether having a &[MaybeUninit<T>] that's larger than the object it points to, is UB or not. (please ignore, I managed to confuse myself)

cc @RalfJung @gankro @rust-lang/libs

@withoutboats
Copy link
Contributor

Seems like it would not fulfill the use cases of RawVec if it doesn't track the capacity. Could be interesting to change RawVec to use MaybeUninit<T>s though.

@dtolnay
Copy link
Member

dtolnay commented Sep 22, 2018

It tracks the capacity but calls it .len() instead of .capacity.

@withoutboats
Copy link
Contributor

I guess I misunderstood eddy's comment about manipulating the length, he means the length tracked in Vec, not the length tracked by the wide pointer?

@eddyb
Copy link
Member Author

eddyb commented Sep 22, 2018

What I meant was the "length" of [MaybeUninit<T>] would be the capacity for [T].
But I realized I jumped the gun on safety, so I edited the issue description.

@strega-nil
Copy link
Contributor

strega-nil commented Sep 22, 2018

@eddyb why would you have a &[MaybeUninit<T>] that has a bigger length than the object it points to?

(I would classify it as an invalid slice, anyways, and therefore UB to value-convert)

@SimonSapin
Copy link
Contributor

We can probably replace RawVec’s pointer and capacity fields, but its methods still contain a bunch of code that need to live somewhere so it seems unlikely that we’d remove the type entirely.

@scottmcm
Copy link
Member

RawVec<T> could safely Deref to [MaybeUninit<T>], right?

@SimonSapin
Copy link
Contributor

I think that would be sound, but would it be useful? RawVec is perma-unstable implementation detail of Vec and VecDeque, not really intended for general use.

@Gankra
Copy link
Contributor

Gankra commented Sep 26, 2018

The value of this seems dubious, considering (iirc) RawVec needs to have two "notions" of capacity (stored vs effective) to make it easier to work with ZSTs, and the whole thing would make code a bit more unclear.

@eddyb
Copy link
Member Author

eddyb commented May 10, 2022

Reopened because of renewed interest:

@RalfJung
Copy link
Member

RalfJung commented May 18, 2022

See #94421 (comment) for some of the fall-out of doing this: by using Box, this becomes subject to the Box aliasing guarantees, which is in conflict with some modes of use of Vec that rely on being able to push without invalidating existing pointers (which seems to be encouraged by the documentation, and there are tests ensuring it works correctly in Miri).

Maybe there is a way to have weaker aliasing guarantees for Box, but there is only so much we can weaken them before we have to stop emitting noalias for Box arguments.
Also see rust-lang/unsafe-code-guidelines#326.

@koehlma
Copy link
Contributor

koehlma commented May 18, 2022

As @RalfJung asked me to continue the discussion over here, let me just copy #94421 (comment).

Sorry for chiming in. I certainly lack the expertise to understand the intricacies of the ongoing debate but @SabrinaJewson just opened an issue for one of my crates which would be affected by this PR. I just wanted to let you know that I was under the impression that Vec explicitly guaranteed stable references. In particular, the statement

push and insert will never (re)allocate if the reported capacity is sufficient.

of the Guarantees section with the previous description of the memory layout makes it seem as if references are guaranteed to be stable when pushing onto a Vec whose length is less than its capacity. At least, this has been my reasoning at the time of implementing the crate. There is also another statement about converting the Vec into a Box without reallocating and “moving” the elements. After reading this PR, it is unclear to me whether these guarantees become void with this PR or whether my understanding of them has been wrong all the time. In the latter case, maybe the documentation could be improved to clarify that stability is explicitly not a guarantee but rather is/was an implementation detail?

@RalfJung
Copy link
Member

Thanks. :)

I tend to agree -- I think we have to keep this code working:

fn main() {
    let mut v = Vec::with_capacity(2);
    v.push(0);
    let ptr = &v[0] as *const i32;
    v.push(0);
    unsafe { dbg!(*ptr) };
}

The question then is whether we weaken Box to also allow such code, and whether that is compatible with its noalias status -- or whether we need a "Box light" for Vec that just provides automatic deallocation but no aliasing guarantees.

@conradludgate
Copy link
Contributor

I know little about how the aliasing rules work. However, I would be fine with Box<T> being noalias, but then Box<[T]> it would make sense in my mind for each individual &T to be noalias instead of the whole box.

How do the noalias rules interact with things like

let Foo { x, y } = &mut foo;

Maybe I'm wrong, but I vaguely remember that destructuring to be magic in splitting the mut references. I would consider the boxed slices to have that same treatment where each individual field can exist separately

@RalfJung
Copy link
Member

let Foo { x, y } = &mut foo;

If you use foo again, both x and y are being invalidated. So, the problem also exists there.

It's not really magic, you can do the same with

let x = &mut foo.x;
let y = &mut foo.y;

@conradludgate
Copy link
Contributor

It's not really magic, you can do the same with

let x = &mut foo.x;
let y = &mut foo.y;

Ideally then, our slice solution would work such that

let x = &mut foo[0];
let y = &mut foo[1];

Is also valid. This is obviously harder though since the IndexMut trait takes the entire &mut self. That's the reason that slice_at_mut method exists to get around this current limitation

@RalfJung
Copy link
Member

For primitive slice accesses (Index in the MIR), this would already work, if the borrow checker accepted it.

But there is a function indirection, and that breaks it. Allowing it with a function indirection might be possible but would likely mean that all functions can be optimized less. That is the discussion at rust-lang/unsafe-code-guidelines#133.

@scottmcm
Copy link
Member

It's relatively common for people to say "oh, you can use Box<str>/Box<[T]> to save a bit of space over String/Vec<T> if you don't need to resize it", so I think it'd be particularly surprising for them to have drastically-different rules for getting the unsafe code correct.

So I don't know what the best solution would be, but I do think they should be consistent.

@ChrisDenton ChrisDenton added A-collections Area: `std::collection` T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests