causes UB by calling mem::zeroed on any type #2

RalfJung · 2019-01-23T12:26:10Z

This code is highly problematic:

    pub unsafe fn new<F: for<'a> FnOnce(&'a T) -> &'a U>(f: F) -> Self {
        // Construct a "fake" T. It's not valid, but the lambda shouldn't
        // actually access it (which is why this is unsafe)
        let x = mem::zeroed();

If T is, for example, a struct containing reference type, then this causes UB by initializing a reference with 0. The comment seems to say that it is okay to do this as long as x is not accessed; that is not correct.

That said, there currently is no correct way to do what you want. This bug can only really be fixed once MaybeUninit gets stabilized.

(I also feel really uneasy with the extend to which this relies on type inference, but YMMV.^^)

The text was updated successfully, but these errors were encountered:

Diggsey · 2019-01-23T12:58:56Z

Well... my excuse is that the rust unsafe code guidelines didn't exist when the crate was made, and this would have been valid in C/C++ 😛 So uh.. thanks!

Does MaybeUninit actually help? The lambda still needs to access (the address of) a field on the uninitialised value in the union.

Does the UB come from owning the uninitialised value, or does it come from having a live expression of type T, where T may be uninhabited, or something else?

It seems like an intrinsic maybe needed or there will still be no way to do this without invoking UB?

RalfJung · 2019-01-23T14:49:11Z

You might be interested in this discussion on internals that brought me here. :)

Does the UB come from owning the uninitialised value, or does it come from having a live expression of type T, where T may be uninhabited, or something else?

UB comes from operating on (in this case, assigning) a value of a type T that does not satisfy the validity invariant of that type.

Does MaybeUninit actually help? The lambda still needs to access (the address of) a field on the uninitialised value in the union.

You are fine as long as you avoid references. This is where you also need rust-lang/rfcs#2582.

Diggsey · 2019-01-23T15:43:37Z

@RalfJung I see, the only bit that seems kindof a grey area for me is the fields access, eg.

&pointer_to_uninit.field

Coming from C++, my intuition for how lvalues work is that they are essentially the same as implicit references. eg.

auto x = *ptr + 1;

This is actually taking the rvalue ptr, the * operator converts it into an lvalue with the address of ptr, and then it's only when that lvalue is implicitly converted to an rvalue in order to perform the addition, that it actually gets dereferenced.

And this explains why the following does not actually read the memory referred to by ptr:

auto y = &(*ptr).foo;

Again ptr is converted to an lvalue with the address of ptr, then it does a field access (which for lvalues is the same as adding an offset to their address) and then we use &, which (for lvalues) is a no-op that converts them back into a normal pointer. Because the lvalue is never implicitly converted to an rvalue, the memory is never actually dereferenced, even though the dereference operator is used.

In this model, the rules for uninit data are easy: you are just not allowed to have uninit data in an rvalue at any time.

Now, going back to rust-land: if I try to find corresponding rules, it's much trickier, because now it matters whether an lvalue was derived from a reference vs a raw pointer or union access: let's call it raw_lvalue vs lvalue.

Now the rule must be that neither lvalues nor rvalues can contain uninit data, but raw_lvalues can (or else the field offset in &raw uninit.value.field would be invalid.

Without these rules, it's difficult to specify what kinds of operations are allowed to operate on uninit data: eg. you say that assignment cannot operate on uninit data, and you can derive that by saying that assignment operates on lvalues (and so raw_lvalues must first be converted to lvalues) whereas field access is defined for all three of raw_lvalues, lvalues and rvalues.

RalfJung · 2019-01-23T19:08:29Z

This is actually taking the rvalue ptr, the * operator converts it into an lvalue with the address of ptr, and then it's only when that lvalue is implicitly converted to an rvalue in order to perform the addition, that it actually gets dereferenced.

This matches Rust.

if I try to find corresponding rules, it's much trickier, because now it matters whether an lvalue was derived from a reference vs a raw pointer or union access: let's call it raw_lvalue vs lvalue.

Why do you think unions matter? I don't think they do. Actually I think the model should be uniform for all places (that's how lvalues are called in Rust), and the only thing it should assert is that the memory this points to is allocated (with the size given by the type). But so far that is just my model, not something we all agreed on yet.

Only rvalues (for which we are still searching a better term in Rust land) have to be valid at their given type.

Diggsey · 2019-01-23T21:31:35Z

OK, I think I understand. If I create a (normal) reference to an uninit lvalue:

let x = &maybe_uninit.value;

The uninitialised value is never in an rvalue, and so it's fine for it to be uninitialised. However, a reference to it is created which is an rvalue, and references must point to valid data to themselves be valid, and that's the source of the UB.

If I could somehow create a reference such that the reference itself was an lvalue, then that reference could point to uninitialised data, without invoking UB, but that's not possible because & always returns an rvalue.

As for naming rvalues, I think it would be tricky to improve on rvalue given how obscure the concept is, because there's never a good name for obscure concepts, and sometimes just using any existing precedent is better. Maybe "materialized", "actualized" or "reified" value.

RalfJung · 2019-01-23T22:07:18Z

However, a reference to it is created which is an rvalue, and references must point to valid data to themselves be valid, and that's the source of the UB.

Exactly. Except that what exactly validity of references is is not yet finalized, we are collecting data and going to be discussing that in rust-lang/unsafe-code-guidelines#77.

In my reading references have to point to allocated memory but it is okay if that memory is not initialized. But there might be reasons to require the stronger invariant.

If I could somehow create a reference such that the reference itself was an lvalue, then that reference could point to uninitialised data, without invoking UB, but that's not possible because & always returns an rvalue.

Not sure what you mean here, but the point of rust-lang/rfcs#2582 is that you can turn a place immediately into an rvalue of type *mut T, and that does not have any requirements about memory being initialized.

RalfJung · 2019-08-10T09:03:26Z

Reading through this again, I don't know why we talked about field access here. The UB is already in this line:

        // Construct a "fake" T. It's not valid, but the lambda shouldn't
        // actually access it (which is why this is unsafe)
        let x = mem::zeroed();

This "produces" a value from 0-filled memory for any type T, which has been documented as UB in the Nomicon for quite a while.

Do you think there is any chance this could use the memoffset crate? We spent quite some time getting the offset_of! in that crate to be as safe as it can be for the versions of Rust it supports (which is Rust 1.25+).

Diggsey · 2019-08-10T13:01:59Z

I don't believe that version of the offset_of! macro is sufficiently safe for what this crate requires. The way it gets the address of a field may invoke Deref and result in an invalid field offset.

The way I get the address in this crate is by using pattern matching, so that offset_of! can only return valid field offsets, which can then be encapsulated in the safe FieldOffset type.

Unfortunately, there is no equivalent of ref for raw pointers when pattern matching.

RalfJung · 2019-08-10T13:07:31Z

I don't believe that version of the offset_of! macro is sufficiently safe for what this crate requires. The way it gets the address of a field may invoke Deref and result in an invalid field offset.

No, it specifically checks for that case first, before doing the field access. We went for that approach to make it easier to convert to rust-lang/rfcs#2582 once that lands, and to keep the references to uninitialized memory that we create to a minimum.

RalfJung · 2019-08-26T07:50:48Z

@Diggsey did you have a chance to take a second look? I am fairly sure that your analysis above is incorrect and that memoffset successfully protects from Deref.

The use of mem::zeroed here means that code will likely cause SIGILL when this is used with a type that must not be null. Unlike the currently unavoidable UB that is still left in memoffset, the approach taken here is known to cause "miscompilations" (as in, things not being compiled the way the user expects -- there is no compiler bug). In particular, this crate will trigger the new lint warning about invalid values.

Diggsey · 2019-08-26T14:45:30Z

Yes, I don't have any concerns with the new implementation now, but was considering whether I should make this a breaking change:

The existing (unsafe) FieldOffset::new method cannot be used safely with all types as it uses references rather than pointers.

The current documentation says that when using this method, you should not access the referenced value, but this is no longer sufficient: if I keep the method as-is, it will be invalid to call on any type which cannot be zeroed, so I would need to update the docs and add a second constructor without that constraint. However, it looks like in that case the new lint will still fire? (Shouldn't the lint be disabled for generic parameters if the method is unsafe, because it could simply be part of the contract that T must be zero-able?)

Anyway, I'm leaning towards just making it a breaking change to avoid any surprises, so this shouldn't be an issue.

Diggsey · 2019-08-26T15:30:25Z

I've opened a PR: #3

Would you mind reviewing it?

RalfJung · 2019-08-26T15:36:05Z

The lint only fires if the type is definitely not zero-able... so yeah I guess it will not apply here. I thought that code was in a macro, but it is not.

Anyway, I'm leaning towards just making it a breaking change to avoid any surprises, so this shouldn't be an issue.

Ah, I didn't realize this would have to be a breaking change.

Given the unsoundness of the old version, I wonder if it is worth either having a patch release that's still not sound but better (it could use a dangling reference instead of zeroed, similar to what memoffset does for pre-MaybeUninit versions of rustc), or else yanking the old version?

Would you mind reviewing it?

Thanks! I'll take a look.

RalfJung · 2019-08-26T16:09:12Z

Not sure why I need owner access for the review but thanks anyway. ;)
I will not have time for helping out with maintenance though, I am afraid.

Diggsey · 2019-08-26T21:53:12Z

Github wouldn't let me add you as a reviewer without first adding you as a contributor. Don't worry I'm not expecting you to help maintain 😛

Diggsey · 2019-08-26T22:25:24Z

Btw, I did find this line to be a little... dangerous: https://github.com/Gilnaa/memoffset/blob/master/src/offset_of.rs#L33
(The fact that it relies on expanding into the same scope)

Maybe it's just personal taste, but I prefer to put as much of the unsafety outside of macros as possible, because in a macro, you might rely on something which today would be a syntax error, but in tomorrow's rust might be valid code and do something totally unexpected.

RalfJung · 2019-08-27T07:38:15Z

I can't really imagine anything that would break that, do you have an example?

Also this helps factor the code, which is important when dealing with unsafe. I'd rather not repeat myself.

RalfJung mentioned this issue Jan 23, 2019

Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" rust-lang/rust#53491

Closed

3 tasks

RalfJung changed the title ~~calls mem::zeroed on any type~~ causes UB by calling mem::zeroed on any type Jul 7, 2019

Diggsey mentioned this issue Aug 26, 2019

Fix undefined behaviour. #3

Merged

Diggsey closed this as completed in #3 Feb 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

causes UB by calling mem::zeroed on any type #2

causes UB by calling mem::zeroed on any type #2

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019 •

edited

Loading

RalfJung commented Jan 23, 2019 •

edited

Loading

RalfJung commented Aug 10, 2019

Diggsey commented Aug 10, 2019

RalfJung commented Aug 10, 2019 •

edited

Loading

RalfJung commented Aug 26, 2019 •

edited

Loading

Diggsey commented Aug 26, 2019

Diggsey commented Aug 26, 2019

RalfJung commented Aug 26, 2019 •

edited

Loading

RalfJung commented Aug 26, 2019

Diggsey commented Aug 26, 2019

Diggsey commented Aug 26, 2019

RalfJung commented Aug 27, 2019

causes UB by calling mem::zeroed on any type #2

causes UB by calling mem::zeroed on any type #2

Comments

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019

RalfJung commented Jan 23, 2019

Diggsey commented Jan 23, 2019 • edited Loading

RalfJung commented Jan 23, 2019 • edited Loading

RalfJung commented Aug 10, 2019

Diggsey commented Aug 10, 2019

RalfJung commented Aug 10, 2019 • edited Loading

RalfJung commented Aug 26, 2019 • edited Loading

Diggsey commented Aug 26, 2019

Diggsey commented Aug 26, 2019

RalfJung commented Aug 26, 2019 • edited Loading

RalfJung commented Aug 26, 2019

Diggsey commented Aug 26, 2019

Diggsey commented Aug 26, 2019

RalfJung commented Aug 27, 2019

Diggsey commented Jan 23, 2019 •

edited

Loading

RalfJung commented Jan 23, 2019 •

edited

Loading

RalfJung commented Aug 10, 2019 •

edited

Loading

RalfJung commented Aug 26, 2019 •

edited

Loading

RalfJung commented Aug 26, 2019 •

edited

Loading