Fix the caches mutating a deque node through a NonNull pointer derived from a shared reference #259

Merged
merged 5 commits into from
Apr 27, 2023

@tatsuya6502 tatsuya6502 commented Apr 27, 2023

I found bugs in the sync and future caches: they were mutating an internal deque node through a NonNull pointer derived from a shared reference (&). This PR fixes these bugs.

These bugs have existed since a very early version of Moka, but I could not find them until now. Miri tests could have helped find them, but we do not run Miri tests on the sync_base module, which contains the bugs, because the module uses some FFI calls that Miri does not support.

In commit 03c75ff (included in this PR), I added a test to the deque module that reproduces the bug, and confirmed that Miri can catch it.

$ cargo +nightly miri test deque

test common::deque::tests::peek_and_move_to_back ... error: Undefined Behavior: trying to retag from <506610> for Unique permission at alloc194760[0x0], but that tag only grants SharedReadOnly permission for this location
   --> /Users/tatsuya/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:427:18
    |
427 |         unsafe { &mut *self.as_ptr() }
    |                  ^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <506610> for Unique permission at alloc194760[0x0], but that tag only grants SharedReadOnly permission for this location
    |                  this error occurs as part of retag at alloc194760[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <506610> was created by a SharedReadOnly retag at offsets [0x0..0x28]
   --> src/common/deque.rs:733:37
    |
733 |         unsafe { deque.move_to_back(NonNull::from(node1a)) };
    |                                     ^^^^^^^^^^^^^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::NonNull::<common::deque::DeqNode<std::string::String>>::as_mut::<'_>` at /Users/tatsuya/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:427:18: 427:37
note: inside `common::deque::Deque::<std::string::String>::move_to_back`
   --> src/common/deque.rs:197:20
    |
197 |         let node = node.as_mut(); // this one is ours now, we can create an &mut.
    |                    ^^^^^^^^^^^^^
note: inside `common::deque::tests::peek_and_move_to_back`
   --> src/common/deque.rs:733:18
    |
733 |         unsafe { deque.move_to_back(NonNull::from(node1a)) };
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
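The pattern behind this trace can be sketched in isolation. The following is a minimal illustration only, not Moka's code: `Node` and `bump_via_mut` are hypothetical names. It shows the sound variant, where the `NonNull` pointer is derived from a `&mut`, and leaves the unsound variant (deriving it from a `&` and then calling `as_mut`) commented out, since that is the kind of access Miri rejects.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for a deque node; not Moka's `DeqNode`.
struct Node {
    value: u32,
}

/// Sound: the pointer is derived from a `&mut`, so it carries write permission.
fn bump_via_mut(node: &mut Node) -> u32 {
    let mut ptr = NonNull::from(node);
    // SAFETY: `ptr` was created from a unique reference and no other
    // reference to the node is live while we use it.
    unsafe {
        ptr.as_mut().value += 1;
        ptr.as_ref().value
    }
}

// Unsound (do NOT do this): `NonNull::from(node)` also compiles when `node`
// is a shared reference, but the resulting pointer only has read permission.
// Creating a `&mut` from it with `as_mut()` is undefined behavior.
//
// fn bump_via_shared(node: &Node) {
//     let mut ptr = NonNull::from(node);
//     unsafe { ptr.as_mut().value += 1 }; // UB
// }

fn main() {
    let mut node = Node { value: 41 };
    assert_eq!(bump_via_mut(&mut node), 42);
    assert_eq!(node.value, 42);
}
```

Both variants type-check, because `NonNull<T>` implements `From<&T>` as well as `From<&mut T>`; the compiler cannot flag the difference, which is why a tool like Miri is needed to catch it.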

This PR fixes the bugs by not using a shared reference in these cases:

  • Add the Deque::peek_front_ptr method, which is similar to the existing peek_front method but returns a NonNull pointer to the front node instead of a shared reference.
  • Remove the DeqNode::next method, which returned a shared reference to the next node.
  • Add the DeqNode::next_ptr function, which returns a NonNull pointer to the next node.
  • Add the Deque::move_front_to_back method, which moves the front node to the back of the deque without going through a shared reference to the node.
  • Modify the code in the sync_base module to use the above methods wherever it needs to mutate a DeqNode.
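As a rough illustration of the shape of these changes, here is a heavily simplified sketch. It reuses the names from the list above, but the bodies, field layout, and the `push_back`/`elements` helpers are assumptions made for the example, not Moka's actual implementation. The point is that no `&DeqNode` is ever handed out for a node that will later be mutated.

```rust
use std::ptr::NonNull;

#[allow(dead_code)]
struct DeqNode<T> {
    next: Option<NonNull<DeqNode<T>>>,
    prev: Option<NonNull<DeqNode<T>>>,
    element: T,
}

impl<T> DeqNode<T> {
    /// Associated function: reads the `next` link from a raw node pointer
    /// instead of returning a shared reference to the next node.
    fn next_ptr(this: NonNull<Self>) -> Option<NonNull<Self>> {
        // SAFETY (assumed): `this` points to a live node owned by a deque.
        unsafe { this.as_ref().next }
    }
}

struct Deque<T> {
    head: Option<NonNull<DeqNode<T>>>,
    tail: Option<NonNull<DeqNode<T>>>,
}

impl<T> Deque<T> {
    fn new() -> Self {
        Self { head: None, tail: None }
    }

    fn push_back(&mut self, element: T) {
        let node = Box::new(DeqNode { next: None, prev: self.tail, element });
        // Derived from `&mut`, so the pointer may be used for writes later.
        let ptr = NonNull::from(Box::leak(node));
        match self.tail {
            Some(mut tail) => unsafe { tail.as_mut().next = Some(ptr) },
            None => self.head = Some(ptr),
        }
        self.tail = Some(ptr);
    }

    /// Returns a pointer (not a `&DeqNode`) to the front node.
    fn peek_front_ptr(&self) -> Option<NonNull<DeqNode<T>>> {
        self.head
    }

    /// Moves the front node to the back using only the deque's own pointers.
    fn move_front_to_back(&mut self) {
        if let (Some(mut head), Some(mut tail)) = (self.head, self.tail) {
            if head == tail {
                return; // a single node is already at the back
            }
            unsafe {
                let mut new_head = head.as_ref().next.expect("two or more nodes");
                new_head.as_mut().prev = None;
                head.as_mut().next = None;
                head.as_mut().prev = Some(tail);
                tail.as_mut().next = Some(head);
                self.head = Some(new_head);
                self.tail = Some(head);
            }
        }
    }

    /// Test helper: walks the deque front to back via `next_ptr`.
    fn elements(&self) -> Vec<T>
    where
        T: Clone,
    {
        let mut out = Vec::new();
        let mut cur = self.peek_front_ptr();
        while let Some(node) = cur {
            out.push(unsafe { node.as_ref().element.clone() });
            cur = DeqNode::next_ptr(node);
        }
        out
    }
}

impl<T> Drop for Deque<T> {
    fn drop(&mut self) {
        let mut cur = self.head;
        while let Some(node) = cur {
            // Reclaim each leaked box so the node and its element are freed.
            let boxed = unsafe { Box::from_raw(node.as_ptr()) };
            cur = boxed.next;
        }
    }
}

fn main() {
    let mut deque = Deque::new();
    for n in [1u32, 2, 3] {
        deque.push_back(n);
    }
    deque.move_front_to_back();
    assert_eq!(deque.elements(), vec![2, 3, 1]);
}
```

Because `move_front_to_back` takes `&mut self` and reads the head pointer itself, a caller never has to pass in a pointer obtained from `peek_front`, which is where the read-only pointer came from in the failing test.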

This PR also adds the test mentioned above to the deque module to ensure that the bugs are fixed.

I do not know whether the bugs had any actual impact. I think they were not triggered because the code never accesses the shared reference again after the mutable reference is created. Also, the code is executed only in certain corner cases.

Add a test to `deque` module to reproduce the bug.
- Add `Deque::peek_front_ptr` method that returns a `NonNull` pointer (instead of a
  shared reference) to the front node.
- Remove `DeqNode::next` method.
- Add `DeqNode::next_ptr` _function_ that returns a `NonNull` pointer to the next
  node.
- Add `Deque::move_front_to_back` method that moves the front node to the back of the
  deque, instead of moving the node through a shared reference to the node.
@tatsuya6502 tatsuya6502 self-assigned this Apr 27, 2023
@tatsuya6502 tatsuya6502 added the bug Something isn't working label Apr 27, 2023
@tatsuya6502 tatsuya6502 modified the milestones: v0.12.0, v0.11.0 Apr 27, 2023
@tatsuya6502 tatsuya6502 left a comment

Merging.

@tatsuya6502 tatsuya6502 added this pull request to the merge queue Apr 27, 2023
Merged via the queue into master with commit 4bc8553 Apr 27, 2023
@tatsuya6502 tatsuya6502 deleted the fix-deq-mutating-through-shared-ref branch April 27, 2023 23:34
tatsuya6502 added a commit that referenced this pull request Jul 2, 2023
Fix the caches mutating a deque node through a `NonNull` pointer derived from a
shared reference.
tatsuya6502 added a commit that referenced this pull request Jul 2, 2023
Fix the caches mutating a deque node through a `NonNull` pointer derived from a
shared reference.
tatsuya6502 added a commit that referenced this pull request Jul 3, 2023
tatsuya6502 commented Jul 3, 2023

> I do not know if there were actual impacts of the bugs. I think the bugs were not triggered because the code never accesses the shared reference again after the mutable reference is created. Also the code is executed only when some corner cases are met.

I was wrong. A Moka v0.9.6 user reported a segmentation fault. I analyzed a core dump, and it seems to have been caused by this bug:
#281 (comment)

I backported this fix to v0.9.8 and also plan to backport it to v0.10.x.

tatsuya6502 added a commit that referenced this pull request Jul 4, 2023
Fix the caches mutating a deque node through a `NonNull` pointer derived from a
shared reference.
tatsuya6502 added a commit that referenced this pull request Jul 4, 2023
Fix the caches mutating a deque node through a `NonNull` pointer derived from a
shared reference.
tatsuya6502 added a commit that referenced this pull request Jul 4, 2023