Should &mut-derived pointers be permanently "separate" from their siblings?
#450
I think this is what people naively expect. And if we had this property, I think it would ease some of the scenarios where we fall back to advising people not to mix references and raw pointers; this is undoubtedly good advice for avoiding UB, but it is unpleasant to give up references.
That last sentence sounds tantalizing, but I'll admit it is really just because I would like the
Yeah, this issue. Should we have a whole separate issue for this? I don't know how much this can be its own discussion; the aliasing optimizations implemented in C/C++ compilers seem deeply tied to function boundaries, and I don't know if that is fundamental or a legacy of C. I am generally concerned about tying UB too closely to function boundaries; I think it would be deeply surprising if, during refactoring, I eliminated all but one caller of a helper function, decided to manually inline it into that caller so that I could apply some simplification... and then found that I had added UB. On the other hand, I think the very existence of protectors indicates that we already have the opposite refactoring hazard: adding a helper function can introduce UB. That seems more dangerous, and yet I have only a handful of examples of deallocating against a protector:
Ah right, under that variant naive inlining becomes unsound. That's certainly not tenable. We would at least need some way to still have the "end of scope" happen after inlining.
We're getting the first concrete requests for optimization that even come with concrete LLVM proposals that are incompatible with
C has LLVM has noalias infrastructure that I haven't understood in detail yet. But I think it boils down to statically marking each load/store as being within some "noalias scope". The regular function-argument-level noalias is then simply equivalent to putting all loads/stores inside the function into its scope, but smaller scopes can be had within a function. In Rust this would correspond, for each However LLVM is also getting new noalias infrastructure and I know even less about how that works.^^ But no other language I know permanently remembers the full tree of how pointers get derived even as the control flow moves back up the call stack. When it's all safe code I think it makes perfect sense to keep that tree, but for raw pointers... well really it's just functions like |
How much would be solved by If we can solve most of the unexpected UB issues with that sort of change to the language and stdlib, I think that's preferable to weakening
I think it would have to still be technically |
AFAIK those kinds of functions are the main motivation for not having spurious writes on However the function might look more like

```rust
fn as_mut_data_ptr(&mut self) -> *mut Data { addr_of_mut!(self.data) }
```

and, annoyingly, the version of that which takes a raw pointer would have to be `unsafe`. And there is the question of whether we want to force users to write such "raw pointer getters" in this particular style -- that does sound like quite the footgun.
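To make the two getter styles concrete, here is a minimal sketch; the types `Data` and `Wrapper` and the method `data_ptr_raw` are hypothetical. The `&mut self` getter forces an auto-reborrow of the whole `Wrapper` at every call site, whereas the raw getter never creates a reference, but it has to be an `unsafe fn` because projecting a field through a raw pointer is only allowed when that pointer is valid.

```rust
use core::ptr::addr_of_mut;

// Hypothetical payload and wrapper types, for illustration only.
pub struct Data {
    pub value: u32,
}

pub struct Wrapper {
    pub header: u8,
    pub data: Data,
}

impl Wrapper {
    /// Safe getter: takes `&mut self`, so each call site auto-reborrows the
    /// whole `Wrapper`, and the returned pointer is derived from that reborrow.
    pub fn as_mut_data_ptr(&mut self) -> *mut Data {
        addr_of_mut!(self.data)
    }

    /// Raw getter (hypothetical): no reference is ever created, so no
    /// reborrow happens at the call site.
    ///
    /// # Safety
    /// Conservative sufficient condition: `this` points to a live `Wrapper`,
    /// since the field projection through a raw pointer must be in bounds.
    pub unsafe fn data_ptr_raw(this: *mut Self) -> *mut Data {
        unsafe { addr_of_mut!((*this).data) }
    }
}
```

For example, given `let base: *mut Wrapper = &mut w;`, two calls to `Wrapper::data_ptr_raw(base)` create no intermediate `&mut`, so (as far as I understand both models) neither resulting pointer invalidates the other. The price is that every caller now needs `unsafe`, which is exactly the ergonomic cost raised above.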
Yeah, I guess
Another way to phrase this would be: under SB and TB, a pointer P is "live" as long as any pointer derived from it is live -- even if the function that created P has long since returned. To me that seemed like the most obvious way to operationalize what the borrow checker is doing. But it is indeed not how
Here's an example of an optimization that was just brought up in a discussion as "example of an optimization Rust should be able to do one day". I am fairly sure that in a function like Of course, unlike OTOH, the current model is very simple: we always track the ancestry of all pointers, and use that to determine conflicts.
Here's the intuitive model that I held until yesterday when I started reading about stacked borrow semantics. Every reference to a The stacked/treed borrow rules screw up this intuitive model by making it possible for a
That's unfortunately fundamentally necessary, so I'm afraid your mental model is a bit too optimistic here.
That, too, is fairly fundamental -- avoiding this would basically mean ditching the idea of reference-based optimizations entirely. Consider:

```rust
fn foo(x: &mut i32, y: &mut i32) {
    let xraw = x as *mut i32;
    let yraw = y as *mut i32;
    // If the two pointers alias, this is UB: the write through xraw invalidates yraw.
    unsafe {
        xraw.write(0);
        yraw.write(0);
    }
}
```

This issue is about the fact that Stacked Borrows and Tree Borrows are more restrictive than LLVM
I don't think I understand why the example you've written poses a problem. In C, it's UB if a

```c
void foo(int *restrict x, int *restrict y) {
    int *xraw = x;
    int *yraw = y;
    *xraw = 0;
    *yraw = 0;
}
```

Here if C's rules about

```c
void foo(int *restrict x, int *restrict y) {
    int *xraw = x;
    int *yraw = y;
    *xraw += 1;
    *yraw += 1;
    *xraw += 1;
}
```

and looked at how this is assembled on aarch64. I got just the result I expected:

```asm
ldr w8, [x1]
ldr w9, [x0]
add w8, w8, #1
add w9, w9, #2
str w8, [x1]
str w9, [x0]
```

Clearly clang or LLVM was able to deduce that

On the other hand, here's a case where Rust could have more UB and I wouldn't find it perverse:

```rust
fn foo(x: &mut i32) -> i32 {
    let xraw = x as *mut i32;
    unsafe { xraw.write(0); }
    *x
}
```

The equivalent in C with
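For comparison with the C `restrict` increment example above, here is a minimal Rust sketch of the same pattern with `&mut` parameters; the function name `bump` is hypothetical. As I understand it, rustc currently emits LLVM `noalias` for `&mut` arguments, so the optimizer is permitted to fold the two increments of `*x` into a single `+= 2`, as clang did for `restrict`. Whether it actually does so is not guaranteed, so check the generated assembly rather than relying on this note.

```rust
/// Same shape as the C `restrict` example: `x` and `y` are both `&mut`,
/// so the write through `y` may be assumed not to change `*x`.
pub fn bump(x: &mut i32, y: &mut i32) {
    *x += 1;
    *y += 1; // cannot affect `*x`, because `&mut` references never alias
    *x += 1;
}
```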
This is an incorrect assessment of the semantics of

Per N3301 (latest draft for C23), `6.7.4.1 #9`: "An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.4.2, requires that all accesses to that object use, directly or indirectly, the value of that pointer."

6.7.4.2
With respect to defined behaviour, TB mutable references are actually weaker than restrict in many (but not all) ways.
No, that's not correct. I suggest you read the relevant parts of the C standard; it doesn't say what you seem to think it says.
This is wrong, too. This example has UB in C if the pointers alias:

```c
void foo(int *restrict x, int *y) {
    *x = 1;
    *y = 1;
}
```
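A rough Rust analogue of that C example, sketched under the assumption that a `&mut` parameter plays the role of the `restrict` pointer and a raw pointer plays the role of the unqualified one; the name `store_both` is hypothetical. My understanding is that, as in C, calling it with a `y` that aliases `*x` would be UB under both Stacked Borrows and Tree Borrows, because `x` is protected for the duration of the call.

```rust
/// Hypothetical analogue of `void foo(int *restrict x, int *y)`.
///
/// # Safety
/// `y` must be valid for writes and must not alias `*x`; `x` is protected
/// for the whole call, so a write through an aliasing `y` is expected to be UB.
pub unsafe fn store_both(x: &mut i32, y: *mut i32) {
    *x = 1;
    unsafe { y.write(1) };
}
```

A well-defined call passes two distinct places, e.g. `unsafe { store_both(&mut a, &mut b) }`, where `&mut b` coerces to `*mut i32`.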
"problem" in which sense? You don't understand why it is UB, or you don't understand what it has to do with your argument? Your argument was that it is bad that "it possible for a If you have further questions about why certain examples are UB, I'd ask you to take that to a new thread, so that in this thread we can collect arguments for and against the question stated in the OP, without filling it with misunderstandings about the basic concepts being discussed here. We have a Zulip stream where we're always happy to answer questions, or you can open a new issue here with a concrete question if you prefer GitHub.
There is no world in which it is okay for this program to be UB. You can write basically the exact same thing in safe code:

```rust
fn foo(x: &mut i32) -> i32 {
    let x_notraw = &mut *x;
    *x_notraw = 0;
    *x
}
```

Obviously this code is fine, and therefore obviously the code you wrote should also be fine. I am not sure what your intuition is based on, but it doesn't seem to be anywhere close to how Rust references work, so it is not surprising that your intuition would then clash with Stacked Borrows and Tree Borrows, which are closely built on the way Rust references work.
I have now come across a few such examples. For example, this code here uses it and trips up Tree Borrows (although not Stacked Borrows). Another one is this here (since fixed). Unfortunately, these issues in the wild are a bit more complicated: there are several newtype wrappers, and the proper fix is that, in addition to exposing everything as a If we only look at Even further, I fear that in general,
I agree with your assessment. I've had the same thoughts for a while, but I think this ship sailed long, long ago. The idea that function parameters are magic and have extra UB is fundamental to
What's so frustrating about
But sadly, Rust does not let you do it without some unstable feature (and it would probably break some existing code). And in general, in other abstractions that want to give out raw pointers without unnecessarily modifying the provenance, this would not work as simply. Consider
That function is impossible to write when it takes a raw pointer, except if one makes the function A hacky solution to this is inventing an attribute So in the end, it all comes back to raw pointer ergonomics. If we want references to have very strict rules, the language should not box (no pun intended) you into using them all the time. What further causes problems is that many crates out there don't add enough "unsafe accessors" to give you raw pointers; instead, all you get is a
The following code is fine according to LLVM `noalias`, but rejected by both Stacked Borrows and Tree Borrows:

The reason (in Tree Borrows terms) is that when `ptr2` is created, it is considered a "sibling" pointer of `ptr1`, and even though their `noalias` scope is over, the model still remembers that these pointers are not identical and that actions on one can disable the other. Put differently, `ptr1` and `ptr2` are derived from mutable references that were created to call `as_mut_ptr`, and those references are considered to be live as long as any pointer derived from them is live.

This is probably going to be surprising. Is it a problem? Tree Borrows is deliberately stricter than LLVM `noalias`; we want to model Rust concepts, not just the LLVM attribute, and we are hoping to get benefits even inside functions where LLVM's function-scoped `noalias` cannot be used.

We could attempt to allow code like the above by somehow having a notion of "end of scope" for a mutable reference, and having its no-alias requirements end completely at that point. Is that worth it? And how exactly should that work? Does each reference remember the function it was created in, and cease to impose aliasing requirements when that function ends? For functions with a signature like `fn(&mut T) -> &mut U`, we certainly want to keep the no-alias requirement for the returned reference around even after the function has returned -- but maybe that can rely entirely on the caller doing a retag. The bigger concern is that this makes the function boundary very special (even more so than protectors), and the following code would still be UB:

And... that seems okay? In fact, I think we want that code to be UB even if `(ptr1, ptr2)` is returned out of the function and used in the caller (example below). That code creates two overlapping `&mut`, and I see no good reason to allow it. But if we reject this code, then why would we accept the variant that uses `as_mut_ptr`? Is it only because the `&mut` is now implicit, an auto-ref? Should auto-refs have weaker semantics than explicit `&mut`?

That is insufficient; we want `noalias dereferenceable` for `&mut self` arguments. A combination of "auto-ref is somehow very weak" and "function-entry retags have an 'end of scope'" would be sufficient, though the details of what auto-refs do are fuzzy. Maybe they just don't retag at all. (Function-entry retags also don't correspond to an `&mut` in the source, so it could be justified that they are somehow weaker.) However, I think there are plenty of cases where we want aliasing assumptions even when there was no explicit `&mut`, like on the references returned by `get_mut`-style functions.

Or should we devise something specific to methods like `as_mut_ptr` that suppresses the implicit reborrows (and also suppresses the `noalias dereferenceable`, which we really don't need for these methods)? That seems rather ad hoc, though.

I wonder what others think, in particular in terms of which of these examples should be allowed (if any) and which not.