Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" #53491

Closed
3 tasks done
Centril opened this issue Aug 19, 2018 · 384 comments
Closed
3 tasks done
Labels
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Centril
Copy link
Contributor

Centril commented Aug 19, 2018

NEW TRACKING ISSUE = #63566

This is a tracking issue for the RFC "Deprecate uninitialized in favor of a new MaybeUninit type" (rust-lang/rfcs#1892).

Steps:

Unresolved questions:

  • Should we have a safe setter that returns an &mut T?
  • Should we rename MaybeUninit? (Rename MaybeUninit to MaybeUninitialized #56138)
  • Should we rename into_inner? Should it be more like take instead and take &mut self?
  • Should MaybeUninit<T> be Copy for T: Copy?
  • Should we allow calling get_ref and get_mut (but not reading from the returned references) before data got initialized? (AKA: "Are references to uninitialized data insta-UB, or only UB when being read from?") Should we rename it similar to into_inner?
  • Can we make into_inner (or whatever it ends up being called) panic when T is uninhabited, like mem::uninitialized does currently? (done)
  • Seems like we want to not deprecate mem::zeroed. We should however remember to also update its documentation together with MaybeUninit, make sure people are aware that this is insta-UB if all-0-bits does not satisfy the type's validity invariant.
@Centril Centril added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Aug 19, 2018
@eddyb
Copy link
Member

eddyb commented Aug 19, 2018

cc @RalfJung

@japaric
Copy link
Member

japaric commented Aug 19, 2018

[ ] Implement the RFC

I can help implement the RFC.

@RalfJung
Copy link
Member

Awesome, I can help reviewing :)

@japaric
Copy link
Member

japaric commented Aug 19, 2018

I'd like some clarification on this part of the RFC:

Make calling uninitialized on an empty type trigger a runtime panic which also prints the deprecation message.

Should only mem::uninitialized::<!>() panic? Or should this also cover structs (and maybe enums?) that contain the empty type (e.g. (!, u8))?

@RalfJung
Copy link
Member

AFAIK we only do the really harmful code generation for !. Most other uses of mem::uninitialized are just as incorrect, but the compiler does not happen to exploit them.

So I'd do it for ! only, but also for mem::zeroed. (I forgot to amend that part when I added zeroed to the RFC, it seems.)

@eddyb
Copy link
Member

eddyb commented Aug 19, 2018

We could start off by making this:

"init" => {
let ty = substs.type_at(0);
if !cx.layout_of(ty).is_zst() {
// Just zero out the stack slot.
// If we store a zero constant, LLVM will drown in vreg allocation for large data
// structures, and the generated code will be awful. (A telltale sign of this is
// large quantities of `mov [byte ptr foo],0` in the generated code.)
memset_intrinsic(bx, false, ty, llresult, C_u8(cx, 0), C_usize(cx, 1));
}
return;
}
// Effectively no-ops
"uninit" => {
return;
}

check whether fn_ty.ret.layout.abi is Abi::Uninhabited and at the very least emit a trap, e.g.:

// Allow RalfJ to sleep soundly knowing that even refactorings that remove
// the above error (or silence it under some conditions) will not cause UB
let fnname = bx.cx.get_intrinsic(&("llvm.trap"));
bx.call(fnname, &[], None);

Once you've seen the trap (i.e. intrinsics::abort) in action, you can see if there's any nice way of triggering a panic. It'' be tricky because of unwinding, we'll need to special-case them here:

let intrinsic = intrinsic.as_ref().map(|s| &s[..]);
if intrinsic == Some("transmute") {

To actually panic, you'd need something like this:

// Get the location information.
let loc = bx.sess().codemap().lookup_char_pos(span.lo());
let filename = Symbol::intern(&loc.file.name.to_string()).as_str();
let filename = C_str_slice(bx.cx, filename);
let line = C_u32(bx.cx, loc.line as u32);
let col = C_u32(bx.cx, loc.col.to_usize() as u32 + 1);
let align = tcx.data_layout.aggregate_align
.max(tcx.data_layout.i32_align)
.max(tcx.data_layout.pointer_align);
// Put together the arguments to the panic entry point.
let (lang_item, args) = match *msg {
EvalErrorKind::BoundsCheck { ref len, ref index } => {
let len = self.codegen_operand(&mut bx, len).immediate();
let index = self.codegen_operand(&mut bx, index).immediate();
let file_line_col = C_struct(bx.cx, &[filename, line, col], false);
let file_line_col = consts::addr_of(bx.cx,
file_line_col,
align,
Some("panic_bounds_check_loc"));
(lang_items::PanicBoundsCheckFnLangItem,
vec![file_line_col, index, len])
}
_ => {
let str = msg.description();
let msg_str = Symbol::intern(str).as_str();
let msg_str = C_str_slice(bx.cx, msg_str);
let msg_file_line_col = C_struct(bx.cx,
&[msg_str, filename, line, col],
false);
let msg_file_line_col = consts::addr_of(bx.cx,
msg_file_line_col,
align,
Some("panic_loc"));
(lang_items::PanicFnLangItem,
vec![msg_file_line_col])
}
};
// Obtain the panic entry point.
let def_id = common::langcall(bx.tcx(), Some(span), "", lang_item);
let instance = ty::Instance::mono(bx.tcx(), def_id);
let fn_ty = FnType::of_instance(bx.cx, &instance);
let llfn = callee::get_fn(bx.cx, instance);
// Codegen the actual panic invoke/call.
do_call(self, bx, fn_ty, llfn, &args, None, cleanup);

(you can ignore the EvalErrorKind::BoundsCheck arm)

@japaric
Copy link
Member

japaric commented Aug 19, 2018

@eddyb Thanks for the pointers.


I'm now fixing (several) deprecation warnings and I feel (very) tempted to just run sed -i s/mem::uninitialized()/mem::MaybeUninit::uninitialized().into_inner()/g but I guess that would miss the point ... Or is that OK if I know that the value is a concrete (Copy) type? e.g. let x: [u8; 1024] = mem::uninitialized();.

@RalfJung
Copy link
Member

That would exactly miss the point, yeah.^^

At least for now, I would like to consider mem::MaybeUninit::uninitialized().into_inner() UB for all non-union types. Notice that Copy is certainly not sufficient; both bool and &'static i32 are Copy and your snippet is intended to be insta-UB for them. We may want an exception for "types where all bit patterns are okay" (integer types, essentially), but I would be opposed to making such an exception because undef is not a normal bit pattern. That's why the RFC says you need to fully initialize before calling into_inner.

It also says that for get_mut, but the RFC discussion brought up desired by some folks to relax the restriction here. That's an option I could live with. But not for into_inner.

I'm afraid all these uses of uninitialized will have to be more carefully reviewed, and in fact this was one of the intents of the RFC. We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.

@Centril
Copy link
Contributor Author

Centril commented Aug 19, 2018

We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.

This gives me an idea... perhaps we should lint (group: "correctness") for this sort of code? cc @oli-obk

@SimonSapin
Copy link
Contributor

I'm now fixing (several) deprecation warnings

We should only ship Nightly with those warnings once the recommended replacement is available at least on Stable. See similar discussion at #52994 (comment)

@cramertj
Copy link
Member

@RalfJung

We may want an exception for "types where all bit patterns are okay" (integer types, essentially)

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

I would be opposed to making such an exception because undef is not a normal bit pattern.

Yeah, this is an aspect in which mem::zeroed() is significantly different from mem::uninitialized().

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 20, 2018

@cramertj

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

Those discussions were about ways of allowing safe memcpys across types, but I think that's pretty much orthogonal to whether the memory being copied is initialized or not - if you put uninitialized memory in, you get uninitialized memory out.

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

IIRC, nobody there suggested or discussed any approach in which you could put uninitialized memory in and get initialized memory out, so I don't follow what those discussions have to do with this one. To me they are completely orthogonal.

@joshlf
Copy link
Contributor

joshlf commented Aug 21, 2018

To drive the point home a bit more, LLVM defines uninitialized data as Poison, which is distinct from "some arbitrary but valid bit pattern." Branching based on a Poison value or using it to compute an address which is then dereferenced is UB. So, unfortunately, "types where all bit patterns are okay" are still not safe to construct because using them without separately initializing them will be UB.

@cramertj
Copy link
Member

Right, sorry, I should have clarified what I meant. I was trying to say that "types where all bit patterns are okay" is already something that we're interested in defining for other reasons. Like @RalfJung said above,

I would be opposed to making such an exception because undef is not a normal bit pattern.

@joshlf
Copy link
Contributor

joshlf commented Aug 21, 2018

Thank god there are people who can read, because apparently I can't...

@RalfJung
Copy link
Member

Right, so what I meant to say is: We definitely have types where all initialized bit patterns are okay -- all the i* and u* types, raw pointers, I think f* as well and then tuples/structs only consisting of such types.

What is an open question is under which circumstances which of these types are allowed to be uninitialized, i.e., poison. My own preferred answer is "never".

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit<u8> should be fine.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 21, 2018

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit should be fine.

The discussion in a nutshell was about providing a trait, Compatible<T>, with a safe method fn safe_transmute(self) -> T that "reinterprets"/"memcpys" the bits of self into a T. The guarantee of this method is that if self is properly initialized, so is the resulting T. It was proposed for the compiler to fill in transitive implementations automatically, e.g., if there is an impl Compatible<V> for U, and an impl Compatible<W> for V then there is an impl Compatible<W> for U (either because it was provided manually, or the compiler auto generates it - how this could be implemented was completely handwaved).

It was proposed that it should be unsafe to implement the trait: if you implement it for a T that has padding bytes where Self has fields, then everything is fine at least until you try to use the T and your program behavior ends up depending on the contents of the uninitialized memory.

I have no idea what any of this has to do with MaybeUninit<u8>, maybe you could elaborate on that?

The only thing I can imagine is that we could add a blanket impl: unsafe impl<T> Compatible<[MaybeUninit<u8>; size_of::<T>()]> for T { ... } since transmuting any type into a [MaybeUninit<u8>; N] of its size is safe for all types. I don't know how useful such an impl would be, given that MaybeUninit is an union, and whoever uses the [MaybeUninit<u8>; N] has no idea of whether a particular element of the array is initialized or not.

@RalfJung
Copy link
Member

@gnzlbg back then you were talking about FromBits<T> for [u8]. That is where I say we have to use [MaybeUninit<u8>] instead.

@joshlf
Copy link
Contributor

joshlf commented Aug 21, 2018

I discussed this proposal with @nikomatsakis at RustConf, and he encouraged me to go forward with an RFC. I was going to do it in a few weeks, but if there's interest, I can try getting one done this weekend. Would that be useful for this discussion?

@RalfJung
Copy link
Member

@joshlf which proposal are you talking about?

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 21, 2018

@RalfJung

@gnzlbg back then you were talking about FromBits for [u8]. That is where I say we have to use [MaybeUninit] instead.

Gotcha, fully agree here. Had completely forgotten that we also wanted to do that 😆

@joshlf
Copy link
Contributor

joshlf commented Aug 21, 2018

@joshlf which proposal are you talking about?

A FromBits/IntoBits proposal. TLDR: T: FromBits<U> means that any bit pattern which is a valid U corresponds to a valid T. U: IntoBits<T> means the same thing. The compiler automatically infers both for all pairs of types given certain rules, and this unlocks lots of fun goodness that currently requires unsafe. There's a draft of this RFC here that I wrote a while back, but I intend to change large parts of it, so don't take that text as anything more than a rough guide.

@RalfJung
Copy link
Member

@joshlf I think such a pair of traits would more build on top of this discussion than be part of it. AFAIK we have two open questions in terms of validity:

  • Does it recurse below references? I more and more strongly think it should not, as we see more examples. So likely we should adapt the MaybeUninit::get_mut docs accordingly (it is not actually UB to use that before completing initialization, but it is UB to dereference it before completing initialization). However, we first have to make that decision for validity, and I am not sure what the right venue is for that. Probably a dedicated RFC?
  • Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUinit<u8>::uninitialized().into_inner() insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allow poison/undef to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determining this).

@RalfJung
Copy link
Member

@SimonSapin see the unstable uninitialized_array! macro in libcore.

@Stargateur
Copy link
Contributor

@RalfJung maybe uninit_array! would be a better name.

@RalfJung
Copy link
Member

RalfJung commented Jul 18, 2019

@Stargateur Absolutely, this is definitely not going to be stabilized with its current name. It is hopefully not going to get stabilized ever if #49147 happens Soon (TM).

@eddyb
Copy link
Member

eddyb commented Jul 18, 2019

@RalfJung Ugh, that's my fault, I was blocking the PR without a great reason: #61749 (comment)

@RalfJung

This comment has been minimized.

@RalfJung
Copy link
Member

Mystery solved: the uses of the repeat expression in libcore actually were for types that are copy.

And the reason it doesn't work in liballoc is that MaybeUninit::uninit is not promotable.

@eddyb
Copy link
Member

eddyb commented Jul 19, 2019

@RalfJung Maybe open a PR removing uses of the macro where it's completely unnecessary?

@RalfJung
Copy link
Member

RalfJung commented Jul 19, 2019

@eddyb I made that part of #62799.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Jul 21, 2019

Regarding maybe_uninit_ref

For as_ref/as_mut, I honestly wanted to wait until we know if references have to point to initialized data. Otherwise the documentation for those methods is just so preliminary.

Unstable get_ref / get_mut are definitely adviseable because of that; however, there are cases where get_ref / get_mut may be used when the MaybeUninit has been init: to get a safe handle to the (now known initialised) data while avoiding any memcpy (instead of assume_init, which may trigger a memcpy).

  • this may seem like a particularly specific situation, but the main reason people (want to) use uninitialised data is precisely for this kind of cheap savings.

Because of this, I imagine assume_init_by_ref / assume_init_by_mut could be nice to have (since into_inner has been called assume_init, I seems plausible that the ref / ref mut getters also get a special name to reflect this).

There are two / three options for this, related to Drop interaction:

  1. Exact same API as get_ref and get_mut, which can lead to memory leaks when there is drop glue;

    • (Variant): same API as get_ref / get_mut, but with a Copy bound;
  2. Closure style API, to guarantee drop:

impl<T> MaybeUninit<T> {
    /// # Safety
    ///
    ///   - the contents must have been initialised
    unsafe
    fn assume_init_with_mut<R, F> (mut self: MaybeUninit<T>, f: F) -> R
    where
        F : FnOnce(&mut T) -> R,
    {
        if mem::needs_drop::<T>().not() {
            return f(unsafe { self.get_mut() });
        }
        let mut this = ::scopeguard::guard(self, |mut this| {
            ptr::drop_in_place(this.as_mut_ptr());
        });
        f(unsafe { MaybeUninit::<T>::get_mut(&mut *this) })
    }
}

(Where scopeguard's logic can easily be reimplemented, so there is no need to depend on it)


These could be stabilized faster than get_ref / get_mut, given the explicit assume_init requirement.

Drawbacks

If a variant of option .1 was chosen, and get_ref / get_mut were to become usable without the assume_init situation, then this API would become almost strictly inferior (I say almost because with the proposed API, reading from the reference would be okay, which it can never be in the case of get_ref and get_mut)

@RalfJung
Copy link
Member

Similar to what @danielhenrymantilla wrote about get_{ref,mut}, I am beginning to think that read should probably be renamed to read_init or read_assume_init or so, something that indicates that this may only be done after initialization is complete.

@Pzixel
Copy link

Pzixel commented Aug 7, 2019

@RalfJung I have a question about this:

fn foo<T>() -> T {
    let newt = unsafe { MaybeUninit::<T>::zeroed().assume_init() };
    newt
}

For example, we call foo<NonZeroU32>. Does this trigger UB when we declare a foo function (because it have to be valid for all Ts or when we instantinate it with a type which triggers UB? Sorry if it's a wrong place to ask a question.

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

@Pzixel code can only cause UB when it runs.

So, foo::<i32>() is fine. But foo::<NonZeroU32>() is UB.

The property of being valid for all possible ways to call is called "soundness", also see the reference. The general contract in Rust is that the safe API surface of a library must be sound. This so that users of a library do not have to worry about UB. The entire safety story of Rust rests on libraries with sound APIs.

@Pzixel
Copy link

Pzixel commented Aug 7, 2019

@RalfJung thanks.

So if I get you correctly this function is unsound (and thus, invalid), but if we mark it as unsafe then this body becomes valid and sound

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

@Pzixel if you mark it unsafe, soundness is just not a concept that even applies any more. "Is this sound" only makes sense as a question for safe code.

Yes, you should mark the function unsafe because some inputs can trigger UB. But even if you do, those inputs still trigger UB, so the function still must not be called that way. It is never okay to trigger UB, not even in unsafe code.

@Pzixel
Copy link

Pzixel commented Aug 7, 2019

Yes, of course, I understand that. I only wanted to conclude that partial functuon must be marked as unsafe. Makes sense to me but I didn't think about it before you replied.

@abonander
Copy link
Contributor

Since the discussion on this tracking issue is so long now, can we break it out into a few other tracking issues for each feature of MaybeUninit that's still unstable?

  • maybe_uninit_extra
  • maybe_uninit_ref
  • maybe_uninit_slice

@RalfJung
Copy link
Member

Seems reasonable. There also is #63291.

@Centril
Copy link
Contributor Author

Centril commented Aug 14, 2019

Closing this in favor of a meta-issue which tracks MaybeUninit<T> more generally: #63566

@Pzixel
Copy link

Pzixel commented Apr 17, 2021

@RalfJung one more question please

Yes, you should mark the function unsafe because some inputs can trigger UB. But even if you do, those inputs still trigger UB, so the function still must not be called that way. It is never okay to trigger UB, not even in unsafe code.

Is it okay to not mark function as unsafe if it can trigger UB but it's not public. For example, consider Vec::set_len be private. Is it ok to not mark it as unsafe in this case or not? Since when I read nomicon it says it's perfectly fine:

However this works perfectly. The existence of make_room is not a problem for the soundness of Vec because we didn't mark it as public. Only the module that defines this function can call it. Also, make_room directly accesses the private fields of Vec, so it can only be written in the same module as Vec.

@RalfJung
Copy link
Member

@Pzixel I don't think there is community consensus on this matter. I personally would not accept such code if I had to review it -- safety should IMO be meaningful even inside a library implementation. I think it makes unsafe modules easier to write if you consistently follow this discipline.

But "soundness" as a concept is defined in terms of clients using the public API of your type, so for soundness, the safety of private functions does not matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests