Fixed large swaths of unsoundness in `nalgebra`. #949

vihdzp · 2021-07-18T01:47:38Z

Uninitialized data buffers are used throughout the nalgebra codebase. As per the Rust docs, the correct way to use an uninitialized buffer would be to declare a Storage<MaybeUninit<T>, _, _>, initialize all of its entries, and then somehow convert it into a Storage<T, _, _>. However, nalgebra skips the MaybeUninit step altogether. This isn't much of a problem when T is a primitive numeric type (under the current compiler!), but if T has invalid bit combinations or needs to be dropped, this can cause serious trouble. Consequently, most code in nalgebra can currently trigger UB. 😰

This PR aims to solve this issue. It will be marked as a Draft until we rigorously check safety requirements throughout the new code, make the transition from the previous code simpler, polish some negative consequences of this change, and until the other nalgebra crates are rid of these same issues.

Changes

Solving this issue required an extensive rework of some core traits of the library. I explain some of the more important ones:

`Scalar` trait redefinition

Previously, Scalar served as the monolithic trait that bounded the data type in every single data structure in nalgebra. It had the following traits as sub-traits:

pub trait Scalar: Clone + PartialEq + Debug + Any { /* ... */ }

This was immediately problematic, since MaybeUninit<T> does not implement PartialEq, and only implements Clone when T: Copy.

To get around this, Scalar was demoted from being a universally required trait to merely being a convenient marker for methods with clear numerical purpose. That is, T: Scalar is now only required for numerical methods on matrices, and is no longer required for methods that merely treat matrices as a data storage.

The PartialEq bound was removed, as the vast majority of methods make no use of it (and those that did could simply use a trait bound). This restuls in the current Scalar trait:

pub trait Scalar: 'static + Clone + Debug { /* ... */ }

The 'static bound is currently only used to optimize matrix multiplication on floating point numbers (specifically gemm), via TypeId::of. If we wanted to, we could replace this use by associated methods is_f32 and is_f64. The remaining trait bounds are essential, since there's very little we can numerically operate on a matrix without cloning (unless we use some highly unsafe moves), and since debugging generic code would be almost impossible without the Scalar: Debug bound.

`Allocator` trait redefinition

The Allocator trait allows for matrix storage allocation to be specialized, depending on whether any of its dimensions are Dynamic. Formerly, it consisted only of a Buffer associated type, and an allocate_from_iterator associated method. This simpler trait has been renamed InnerAllocator.

The new Allocator trait looks like this:

pub trait Allocator<T, R: Dim, C: Dim = U1>:
    InnerAllocator<T, R, C>
    + InnerAllocator<MaybeUninit<T>, R, C>
    + InnerAllocator<ManuallyDrop<T>, R, C>
{ /* ... */ }

Using this more specific trait is almost never an issue, and is in fact encouraged, as it makes code future-proof while not actually changing the types you can use your method with.

Various methods for uninitialized buffers

The following methods have been added or reworked for convenience of working with uninitialized data storage.

new_uninitialized_generic: allocates a matrix with an owned uninitialized storage.
assume_init: retrieves a matrix with an initialized storage out of an uninitialized one.
assume_init_ref / assume_init_mut: same as above, but don't take ownership of the matrix and return a full slice of it instead.

Other minor changes

There's a few other changes I made and methods I implemented, mostly out of convenience.

Added various safety remarks on new and existing code.
Added methods to take matrix transposes from an owned matrix.
Added a convenience method to build a "full slice" out of matrix.
Expanded the documentation of certain methods and types.
Used unsafe typecasts to get rid of a few clones here and there.
Switched a few #[repr(C)] into #[repr(transparent)], got rid of others altogether.

vihdzp · 2021-07-18T22:50:25Z

In the current implementation, we've lost the capability to use #[derive(Copy, Clone, Debug)] on any type containing an owned matrix. I'm currently thinking of workarounds, or perhaps a more radical solution to overhaul the Owned type itself.

sebcrozet · 2021-08-02T17:02:23Z

Thank you a lot @OfficialURL for all your efforts on this. You have done a fantastic job exploring the various approaches. Since you told me you may no longer have much time for this, I am taking over as agreed.

The single `Allocator` trait and two storage traits approach

From all the approaches @OfficialURL explored, it became apparent that all the possible solutions have shortcomings. And among these solutions the one that does not have a negative impact on the ergonomics of the API involving initialized matrices is the approach involving a single allocator trait with two buffer associated types. I have been experimenting with this and it works well. It looks like this:

pub trait Allocator<T, R: Dim, C: Dim = U1>: Any + Sized {
    type Buffer: StorageMut<T, R, C> + IsContiguous + Clone + Debug;
    type BufferUninit: RawStorageMut<MaybeUninit<T>, R, C> + IsContiguous;

    fn allocate_uninit(nrows: R, ncols: C) -> Self::BufferUninit;
    unsafe fn assume_init(uninit: Self::BufferUninit) -> Self::Buffer;
    fn allocate_from_iterator<I: IntoIterator<Item = T>>(
        nrows: R,
        ncols: C,
        iter: I,
    ) -> Self::Buffer;
}

We can see a few new things here: RawStorageMut and IsContiguous.
Because non-initialized matrices can’t be cloned, storages containing MaybeUninit<T> elements can’t implement the Storage::into_owned and Storage::clone_owned methods. That’s why I split the Storage trait into two parts:

RawStorage that contains everything our existing Storage trait has right now, except for ::clone_owned and into_owned.
Storage that inherits from RawStorage and adds the ::clone_owned and ::into_owned methods.

From the en-user point of view, this should not have no impact on pre-existing code: they will continue to use Storage normally. And this allows us to use RawStorage whenever we are using a matrix that may be uninitialized.

The IsContiguous trait replaces what was before ContiguousStorageMut and ContiguousStorage. For example our current ContiguousStorage trait has been removed and replaced by ContiguousStorage + IsContiguous. This may make some pre-existing code more verbose, so we can re-add the ContiguousStorage trait in the future if that can avoid unwanted downstream breakage. @Andlon please let me know what you thing about this.

I have pushed a commit making these changes. I also removed from the previous commits everything that wasn’t directly related to fixing our unsoundness problem. So the Scalar trait is back to the way it was before (with the RelativeEq inheritance), and Scalar is used in more places than it could be. I did this to keep the diff smaller, so we can focus more on the main purpose of this PR.

About `blas.rs/blas_uninit.rs`

I simplified the blas.rs modification made by the previous commit by:

Introducing a way to re-use the same code for the case where output matrix has T or MaybeUninit<T> argument. This avoids lots of code duplication.
Implementing the uninit version of gemm (and the other blas operations it uses) only. This is the only blas method we currently used with an uninitialized buffer.

About `linalg`

The changes needed to use uninit matrices properly wherever we currently use them was too complicated to get right, error prone, and difficult to read. So I ran some benchmark and noticed that:

Using uninit matrices for output matrices have a benefit.
Using uninit matrices for workspaces have no benefit.

This makes sense because workspaces are read/written a lot, making the initialization cost negligible.

This is great because uninit workspaces are the ones that are very difficult to use properly because they are passed around to multiple functions, are read/written multiple times, etc. So avoiding things like double-initialization is very difficult.
Uninit output matrices on the other hand are very easy to manipulate because we only write to them once.

So in the end, I used zero-initialized matrices for workspaces, and uninit matrices for outputs.

Limitation

The main limitation of this approach is that Allocator::UninitBuffer may not unify correctly with Matrix<MaybeUninit<T>,...>. However, this didn’t prove very problematic in practice to make everything work.

Andlon · 2021-08-03T07:16:26Z

I would like to try out this branch on my code base shortly.

In the meantime, I have a question about linalg outputs: what if you want to store the result in an initialized buffer? I do this all the time to ensure no/fewer allocations when doing repeated operations.

sebcrozet · 2021-08-03T07:49:10Z

In the meantime, I have a question about linalg outputs: what if you want to store the result in an initialized buffer? I do this all the time to ensure no/fewer allocations when doing repeated operations.

The public API of the linalg module hasn’t changed so what you are currently doing will continue to work.

Andlon · 2021-08-03T08:03:45Z

In the meantime, I have a question about linalg outputs: what if you want to store the result in an initialized buffer? I do this all the time to ensure no/fewer allocations when doing repeated operations.

The public API of the linalg module hasn’t changed so what you are currently doing will continue to work.

Ah, thanks for clarifying! I tried to quickly zoom through the commit to figure it out, but it was hard to get an overview from a brief glance.

sebcrozet · 2021-08-03T09:07:58Z

The CI passes now. The main remaining problem are the uses of ptr::copy and ptr::copy_nonoverlapping which are sometimes problematic for non-copy types (may lead to double-drop or missing drops).

Andlon · 2021-08-03T11:32:29Z

All right, so with some minor fixes only related to the changes to .zip_apply and similar, my code compiles without any changes to trait bounds or similar, so that's great :-)

I looked through the docs a little. Shouldn't methods like Allocator::allocate_uninitialized (in its current form) be removed? Similarly, shouldn't methods like OMatrix::new_uninitialized_generic also be changed to return something like UninitMatrix<T, R, C>?

Andlon · 2021-08-03T11:36:07Z

By the way, one slight "regression" in the linalg APIs. IIRC, before you could have an uninitialized Matrix and call e.g. .gemm() on it with beta == 0, since it's guaranteed that self was never read in that case. But this is no longer possible, is it? So you'd need to fully initialize the output in this case?

EDIT: This isn't necessarily a big deal (for me certainly not), but I thought it worth mentioning.

sebcrozet · 2021-08-03T12:42:36Z

@Andlon

I looked through the docs a little. Shouldn't methods like Allocator::allocate_uninitialized (in its current form) be removed? Similarly, shouldn't methods like OMatrix::new_uninitialized_generic also be changed to return something like UninitMatrix<T, R, C>?

Yes, I’ve removed them this morning. I’m finishing cleaning everything up today.

By the way, one slight "regression" in the linalg APIs. IIRC, before you could have an uninitialized Matrix and call e.g. .gemm() on it with beta == 0, since it's guaranteed that self was never read in that case. But this is no longer possible, is it? So you'd need to fully initialize the output in this case?

It’s not really a regression because passing an uninitialized Matrix was unsound in the first place (even though it didn’t cause any trouble in practice). If someone really wants to use an uninitialized matrix, they will be able to use the gemm_uninit function which can take an uninit matrix as the y mutable argument. However, that method is unsafe (because passing an uninit watrix with beta != 0 would be unsound).

Andlon · 2021-08-03T14:46:28Z

That's excellent, thanks for clearing that up :)

Andlon · 2021-08-04T17:02:03Z

Great work once again with spearheading the initiative to remove this source of unsoundness in the design of nalgebra, @OfficialURL! 🥳 This was only made possible by your relentless exploration of the different design candidates. And really awesome that you could take the time to help push it over the finishing line @sebcrozet!

It's been an open issue for such a long time, it's amazing to see this fixed! 🎉

vihdzp added 19 commits July 14, 2021 04:25

Checkpoint #1

1a78b00

Finally figured out some trait nitty-gritty

8d10e69

More trait restructuring!

7759171

blas.rs should be sound now

bbd045d

blas.rs works now!

df9b6f5

Tied some blas loose strings

54e9750

ops.rs works too now!

8270dd8

Checkpoint dimforge#8

c3f869e

Checkpoint dimforge#9

87fe2b3

Checkpoint dimforge#10

a6b8dd6

Almost!

9a528e2

We have reached compilation!

c01d591

Tests work!

0687318

Got rid of the unimplemented_or_uninitialized_generic macro

fa1ed96

Fixed some more blatant issues

7e1b2f8

Fix botched Search + Replace

4bd13a5

Many miscellaneous improvements throughout

10b5dc9

Miscellaneous improvements

b74be84

cargo fmt

a753d84

vihdzp and others added 4 commits July 19, 2021 18:00

First draft of Owned overhaul

22b657f

Added some derives

2243a11

Merge branch 'dev' into ub3

d34fed4

Implement the single-allocator-trait approach.

8c6ebf2

sebcrozet force-pushed the ub3 branch from b8b19b7 to 8c6ebf2 Compare August 2, 2021 16:41

Fix test and no-std builds.

7a1a4bc

sebcrozet added 4 commits August 3, 2021 10:07

Fix build with --all-features

f67a81e

Fix nalgebra-lapack

47e226d

Fix nalgebra-sparse

492ed8c

First step to fix unsoundness on the resize API.

27ae30b

sebcrozet added 3 commits August 3, 2021 17:02

Address unsoundness in the resizing API.

d609a2f

Fix missing docs.

eedb860

Fix nalgebra-lapack.

38ac9a2

sebcrozet marked this pull request as ready for review August 3, 2021 15:44

sebcrozet added 3 commits August 3, 2021 17:53

Remove the Scalar::is method, which is unsound.

6d57396

More inlining.

65b2995

nalgebra-lapack: restore netlib as the default backend.

107b3be

sebcrozet merged commit 71ceb1f into dimforge:dev Aug 4, 2021

aweinstock314 mentioned this pull request Aug 15, 2021

Consider supporting non-Copy scalar types. #282

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed large swaths of unsoundness in `nalgebra`. #949

Fixed large swaths of unsoundness in `nalgebra`. #949

vihdzp commented Jul 18, 2021 •

edited

Loading

vihdzp commented Jul 18, 2021

sebcrozet commented Aug 2, 2021 •

edited

Loading

Andlon commented Aug 3, 2021

sebcrozet commented Aug 3, 2021 •

edited

Loading

Andlon commented Aug 3, 2021

sebcrozet commented Aug 3, 2021

Andlon commented Aug 3, 2021

Andlon commented Aug 3, 2021 •

edited

Loading

sebcrozet commented Aug 3, 2021

Andlon commented Aug 3, 2021

Andlon commented Aug 4, 2021

Fixed large swaths of unsoundness in nalgebra. #949

Fixed large swaths of unsoundness in nalgebra. #949

Conversation

vihdzp commented Jul 18, 2021 • edited Loading

Changes

Scalar trait redefinition

Allocator trait redefinition

Various methods for uninitialized buffers

Other minor changes

vihdzp commented Jul 18, 2021

sebcrozet commented Aug 2, 2021 • edited Loading

The single Allocator trait and two storage traits approach

About blas.rs/blas_uninit.rs

About linalg

Limitation

Andlon commented Aug 3, 2021

sebcrozet commented Aug 3, 2021 • edited Loading

Andlon commented Aug 3, 2021

sebcrozet commented Aug 3, 2021

Andlon commented Aug 3, 2021

Andlon commented Aug 3, 2021 • edited Loading

sebcrozet commented Aug 3, 2021

Andlon commented Aug 3, 2021

Andlon commented Aug 4, 2021

Fixed large swaths of unsoundness in `nalgebra`. #949

Fixed large swaths of unsoundness in `nalgebra`. #949

vihdzp commented Jul 18, 2021 •

edited

Loading

`Scalar` trait redefinition

`Allocator` trait redefinition

sebcrozet commented Aug 2, 2021 •

edited

Loading

The single `Allocator` trait and two storage traits approach

About `blas.rs/blas_uninit.rs`

About `linalg`

sebcrozet commented Aug 3, 2021 •

edited

Loading

Andlon commented Aug 3, 2021 •

edited

Loading