-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stacked Borrows: asserting uniqueness too early? Should we allow the optimizer to add spurious stores? #133
Comments
Another related case: rust-lang/rust#62553; calling |
This is my opinion based on experience with unsafe code and FFI: While mutable references should assert that the data is otherwise immutable, I believe that the assertion of uniqueness should only apply to references. It is far too common of an assumption that raw pointers are not invalidated by a mutable reference existing, and that it is sufficient to merely hold off on reading or writing through raw pointers until the mutable reference ceases to exist. At the moment you dereference a raw pointer immutably no other mutable references should exist, and at the moment you dereference a raw pointer mutably no other references should exist at all, but a raw pointer should not be invalidated by mutable references. |
It does only apply to references. if you have a reference and a raw pointer working on the same thing, the reference is not unique. It's not the raw pointer that is causing issues here, it is the reference.
It would help if you could give a concrete self-contained example of something that is UB according to current Stacked Borrows, but which you think should be allowed. Is it something like this: fn main() {
let mut local = 0;
let raw = &mut local as *mut i32;
let mut_ref = &mut local;
*mut_ref = 13;
let _val = unsafe { *raw }; // UB (as of Stacked Borrows 2.1).
} |
fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{ let _z = &mut x; }
unsafe { *y = 10; }
} When run through miri:
I was led to believe that this error is by design due to stacked borrows. If it is just an issue with miri that will get fixed and this code example is actually fine then I would be so happy. |
@retep998 Right, so that is an example involving an unused mutable references. This is definitely forbidden by the current Stacked Borrows (it is not "just" a Miri implementation issue), but I could imagine something like "Tree-shaped borrows" that would open a design space to allow more things here. However, it is unclear how much that would cost in terms of lost optimizations. This issue is about collecting cases where unused mutable references caused UB. ("used"/"unused" is a fuzzy term here, but if the variable is dead immediately after it got created, like in your last example, that's definitely "unused".) What I am wondering about is how you feel about used mutable references interacting with raw pointers, like in my example above. |
In your example above the used mutable reference still exists when the raw pointer is accessed, which is shaky ground. If we were to modify my example to actually use the mutable reference... fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{
let z = &mut x;
*z = 20;
}
unsafe { *y = 10; }
} then I would still want it to be okay and not UB. Raw pointers should not care if something else mutated the value in the time between accesses. |
So if I understand correctly, miri invalidates other children of x when &mut x is created. Is it possible to simply hold the raw children in abeyance, like one does with x itself? It is a bit surprising that creating z from y is ok, but creating z from x (which we know is the same as y) is not. |
@retep998 By itself your example is pretty harmless, but it's also useless in that nothing uses the stored value of fn main() {
let mut x = 0;
let y = &mut x as *mut _;
{
let z = &mut x;
*z = 20;
}
unsafe { *y = 10; }
println!("{}", x);
} Then we have a problem, because what if the raw pointer stuff is hidden behind function calls? fn foo(x: &mut i32){
some_global = x as *mut _;
}
fn bar() {
unsafe { *some_global = 10; }
}
fn main() {
let mut x = 0;
foo(&mut x);
{
let z = &mut x;
*z = 20;
}
bar();
println!("{}", x);
} (Note that in reality Under Stacked Borrows, the compiler can optimize In other words, from an implementation perspective, the issue is not how the compiler treats raw pointers, but what assumptions the compiler can make about unknown code. That said, I agree that this is a footgun. I really wish we had something with miri's level of undefined behavior detection that also worked with FFI, at least to some extent. |
I don't think we should design things under the assumption that something like this can exist. |
i think that means that the foo/bar example has to be blindly assumed to always be the sort of case if you pass an address to unknown code, because some code does work that way. |
While the aliasing model determines which optimizations Rust is allowed to do locally (without whole program analysis), it also determines what assumptions users can make about the behavior of unknown code. Can users reason about the code locally? E.g. if I modify this piece of code, I know I can do this and that, because these unknown functions are not allowed to do x and y without introducing UB. Or do users need to keep the whole program in their head to be able to reason about UB and about how to modify code without introducing UB ? |
Now that is interesting. Stacked Borrows was specifically designed to not care about such cases of "existence" because I was sure that would result in a lot of backlash. ;) So I didn't expect some far on the "max freedom for unsafe code" end of the spectrum to consider this a reasonable restriction. OTOH, references passed as function argument do have special treatment that keeps them "live" throughout the call (and the borrow checker does assume they live for the entire function). (Btw let me say that I enjoy this exchange a lot, this is the kind of discussion I was hoping to have when I put out my proposals.)
This is where it gets interesting. I consider this equivalent to my example as the only difference is in the scoping of some variables without drop glue. Thankfully @comex already came up with a great example for why Stacked Borrows makes this example UB -- an example for what kind of optimizations this UB enables. In my view, the entire point of Stacked Borrows is to enable intraprocedural optimization based on alias information, so losing some of those is a big deal. For the current version of Stacked Borrows, we have formal machine-checked proofs that some optimizations are possible (I'll share this in a blog post eventually). It does not seem obvious that even those optimizations are still possible in a model that permits as many programs as you are asking. But it might be possible. I will have to think about this more.
As stated above, the raw pointer does not care. It's the mutable reference that cares enough to make other pointers invalid, in order to enforce its own discipline. @retep998 while I have your attention, can I probe your intuition another way? What if we change your example to use fn main() {
let x = &mut 0; // changed x to also be a reference
let y = &mut *x as *mut _;
*x = 20; // instead of `{ let z = &mut *x; *z = 20; }`
unsafe { *y = 10; }
println!("{}", *x); // Does removing this change anything?
} This program I feel very strongly should have UB, because between two uses of But OTOH, my intuition also says it makes sense to consider So, I am wondering what your thoughts on this are.
Maybe. Stacked Borrows is but one point in a gigantic design space. But "simply" doesn't exactly describe the situation. ;)
However, here I think your intuition needs some refinement. This is somewhat unintuitive, but even C does this. Here is some literature: [1], [2], [3].
I think it could exist. It's just hard.
I am afraid I don't understand your question. Do you have an example for "this and that"? |
It wasn't a question, more like a criticism that the discussion often focus on which optimizations the aliasing model allows, which programs it bans, how it makes the life of miri/etc. easier, without actually covering how does the aliasing model make it easier for users to reason about their code. I think it should focus more on the latter. For example, in @comex second example here, if Arguing that making X UB allows users to not have to inspect the source code of your whole program when modifying some function is a stronger argument for most users IMO than saying that it allows some compiler optimization. |
I agree. People are willing to set a lot of optimization on fire if it makes the universe easier to understand. See Also: all GC languages. The first evaluation of an aliasing model is if it lets a low to average skill programmer follow along with the implications of what they're doing. There are many other metrics too, but that needs to be the first target. Which is not to say that Stacked Borrows, or any other model, is necessarily hard to follow, I just want to emphasize the importance of gnzlbg's point. |
Minor nitpick that is nevertheless relevant to the "(human) reasoning about code" is that @comex's
I think this equivalence is needed for reborrowing to make sense: one can always imagine that I am on the side of a strict language with enabled optimizations, and an escape hatch for "raw stuff". In this case, Rust's escape hatch is
|
@gnzlbg I agree. A memory model has two consumers: programmers that code against it, and compiler authors that implement it and optimize under it.
Interesting. That is not at all my experience from most discussions around UB, and I have gotten explicit pushback against arguing by "simplicity", but maybe that's not what you mean. The argument is almost the same though: allowing compiler optimizations is about making program analyses easier, and that also makes it easier for humans to analyze. Though I suppose something like TBAA is rarely useful for a human analyzer, while Stacked Borrows provides some entirely intraprocedural reasoning principles that humans can learn just like compilers can. If I had infinite time, I'd write a blog post or two about those reasoning principles, but in lieu of that I can only point at this section of the Stacked Borrows 1.0 post.
Not the kind of people that make languages like C and C++ though. Source: have you looked at those languages?^^ And people use them because they are fast, which they are because of all those complications.
Not really, those languages also statically prohibit many of the things you have to worry about in C/C++. I mean, safe Rust also is a no-brainer when it comes to Stacked Borrows. |
FWIW, everything going on in these Stacked Borrows models and all the UCG discussions is several lightyears ahead of C/C++ in terms of "simplicity" (which for me means roughly "comprehensibility to non-wizards"). I've long since given up having any idea what C/C++ think UB really is, but for the stuff under discussion here, I'm fairly sure that the only reason I can't work through most of the examples myself is simply that all of this stuff is still in flux so I haven't tried to properly memorize the current model. Certainly the basic idea in the last several comments that "while there's a I also feel like the past discussions on this have typically been taking simplicity / the ability for users to reason about their code very seriously, to the point that pretty much every Stacked Borrows update either gives a very specific reason why a new complexity had to be introduced, or proclaimed some significant simplification of the model, or both. Though perhaps I'm over-focusing on Ralf's blog posts there. But maybe the point of bringing up simplicity/reasonability here was to imply something like "the complexity cost of introducing tree-shaped borrows is too high to be worth making retep's original case non-UB". In which case I tentatively agree; we'd need to see some big optimization wins to justify that. |
FWIW I retract my suggestion on the basis of @comex's interprocedural example. 👍 |
Another instance of "mutable references created but never used breaks stuff": thomcc/rust-typed-arena#27. |
I ran into this (or something similar) today when using https://github.com/Matthias247/futures-intrusive under miri. In particular, we have some data structure
where I was hoping that wrapping the ListNode in an UnsafeCell was enough, but it isn't - miri sees through the UnsafeCell. It seems to me that this destroys any possibility of making "encapsulated" intrusive data structures, since we can't stop the user from taking mutable reference to the structure containing the list node - even if that mutable reference will never be used to mutate the node in question. |
I am once again running Miri on a lot of crates. It seems that breaking this rule specifically with
let mut entry = Box::new(Entry {
next_in_bucket: self.buckets[bucket_index].take(),
hash,
ref_count: AtomicIsize::new(1),
string: string.into_boxed_str(),
});
let ptr = NonNull::from(&mut *entry);
self.buckets[bucket_index] = Some(entry); let node = /* snip */
} else {
// if the cache is not full allocate a new LruEntry
Box::new(LruEntry::new(k, v))
};
let node_ptr: *mut LruEntry<K, V> = &mut *node;
self.attach(node_ptr);
let keyref = unsafe { (*node_ptr).key.as_ptr() };
self.map.insert(KeyRef { k: keyref }, node); pub unsafe fn new_assert_stable_address(o: O) -> Self
where O: Deref<Target = T>,
{
OwningRef {
reference: &*o,
owner: o,
}
} |
Also, it's unclear to me if the aforementioned code is already UB in practice due to the way rustc applies LLVM's |
Such a pointer is already featured in the Related to it, and to |
Yes, it is. This code succeeds in debug mode but panics in release mode: #[inline(never)]
fn noalias_test(mut a: Box<i32>, b: *mut i32) -> i32 {
*a = 4;
unsafe { *b = 5; }
*a // <- LLVM assumes this must be 4
}
fn main() {
let mut a = Box::new(10);
let b = &mut *a as *mut i32;
assert_eq!(noalias_test(a, b), 5);
} I haven't heard of a checking interpreter for LLVM IR, but this kind of "write one thing, write another thing, read back the first thing" pattern is a good way to test aliasing assumptions in general. |
That's a slick demo, but I'm not totally sure it applies to the particular examples I linked. Particularly for a string interner or entries in an LRU cache, there shouldn't be any mutation. If there were mutation through the pointer I think that code might not apply to this issue in particular, because asserting uniqueness at mutation would just be UB through another route (I'd say later, but time travel). I can't think up with any way that a surprising optimization would appear in the absence of mutation, but the LangRef's section on |
The LangRef does call out mutation:
Emphasis added. Also,
So if there really is no mutation, then you're safe from LLVM's perspective; perhaps that's an indication that Stacked Borrows should be loosened in that case. That said: I was able to translate my example to use use std::cell::Cell;
use owning_ref::BoxRef;
#[inline(never)]
fn noalias_test(x: BoxRef<Cell<i32>>) -> i32 {
x.as_owner().set(4);
x.set(5);
x.as_owner().get()
}
fn main() {
let x: BoxRef<Cell<i32>> = BoxRef::new(Box::new(Cell::new(10)));
assert_eq!(noalias_test(x), 5);
} (Incidentally, As for That said, I suspect it would be difficult to actually trigger a miscompilation in either of those two crates. |
Big thanks for correcting me so thoroughly! I've run into a subtle case with interior mutability before, maybe sprinkling Cell around should just be part of looking into these findings. |
Fix all problems encounted with Miri -Ztag-raw-pointers I poked at this crate with `-Zmiri-tag-raw-pointers` before, and I was unable to fix what I found (I just added a test case that ruled out one of my wrong ideas #271). I tried again just now and I guess I just understand better this time. This PR fixes 3 separate pointer invalidation problems, which are detected by running `MIRIFLAGS=-Zmiri-tag-raw-pointers cargo miri test`. Depending on how you squint, 2 or 3 of these are rust-lang/unsafe-code-guidelines#133. The last one is _probably_ still present even with late invalidation, because `set_len` does a write through a `&mut`. It's unclear to me if any of these things that Miri complains about are potentially a miscompilation in rustc due to the use of LLVM `noalias`. But perhaps given how subtle this codebase is overall, it would be best to run the tools on their pickiest settings, even if there are a few things more like a false positive than a real problem.
Fix all problems encounted with Miri -Ztag-raw-pointers I poked at this crate with `-Zmiri-tag-raw-pointers` before, and I was unable to fix what I found (I just added a test case that ruled out one of my wrong ideas #271). I tried again just now and I guess I just understand better this time. This PR fixes 3 separate pointer invalidation problems, which are detected by running `MIRIFLAGS=-Zmiri-tag-raw-pointers cargo miri test`. Depending on how you squint, 2 or 3 of these are rust-lang/unsafe-code-guidelines#133. The last one is _probably_ still present even with late invalidation, because `set_len` does a write through a `&mut`. It's unclear to me if any of these things that Miri complains about are potentially a miscompilation in rustc due to the use of LLVM `noalias`. But perhaps given how subtle this codebase is overall, it would be best to run the tools on their pickiest settings, even if there are a few things more like a false positive than a real problem.
See rust-lang/unsafe-code-guidelines#133 Calling as_mut_ptr creates a unique reference, and as_ptr a shared reference. These 2 conflict, and one invalidates the other. Both ptr need to be reborrows of or be basically the same pointer.
See rust-lang/unsafe-code-guidelines#133 Calling as_mut_ptr creates a unique reference, and as_ptr a shared reference. These 2 conflict, and one invalidates the other. Both ptr need to be reborrows of or be basically the same pointer.
Visiting for UCG. Issue is mostly fine, but editing some |
FWIW, Tree Borrows does things different here -- mutable references are two-phase, and uniqueness is only asserted when they are written to. That seems to be a fairly elegant solution, though it also has its downsides. |
And of course now people are asking for optimizations that are only correct if we do aggressively assert uniqueness early. ;) |
We have seen a few cases in libstd where the rule that just creating a mutable reference already asserts uniqueness is a problem:
VecDeque
creating overlapping mutable references.BTreeMap
creating mutable references that overlap with shared references.LinkedList
creating overlapping mutable references.Vec::push
invalidating existing references into the vector. The typed-arena crate ran into the same problem.ptr::replace
are forbidden even though they should probably be allowed.In some cases, even just creating a shared ref is an issue, as it conflicts with a recently created mutable ref:
So maybe we assert uniqueness to aggressively here? Maybe mutable references should only become unique on first write, or so? I am not sure what that would look like though. (In principle this could also happen with shared references but I have not seen that.)
I'll use this issue to collect such cases.
However, there are also cases where people want the extra UB to get more optimizations:
The text was updated successfully, but these errors were encountered: