Idea: "compact" enums and structs #12642

glaebhoerl · 2014-03-01T12:33:09Z

TL;DR By optionally taking away interior references, enums and structs can be represented much more compactly, potentially taking over most of the current use cases for manual bit twiddling code.

Rust leaves the layout of its enum types undefined for optimization purposes, but in practice most optimization possibilities are ruled out by the fact that interior references need to work. For instance, if you have an enum which contains another enum, you may want to collapse their discriminants together into a single field:

enum X { Foo(Option<bool>), Bar(bool) }

Here we could use a single discrimant field for both X and Option, so that it ranges 0..2, e.g. 0 => Foo(None), 1 => Foo(Some), 2 => Bar. Going further, we could pack in the bools as well, and represent X as a single u8: 0 => Foo(None), 1 => Foo(Some(false)), 2 => Foo(Some(true)), 3 => Bar(false), 4 => Bar(true). But in both cases, this makes interior references, e.g. match some_x { Foo(ref opt_bool) => ..., _ => ... } impossible, so we must refrain.

The idea would be to allow annotating an enum as "compact", in which case interior references are made illegal (but see below), and these optimization possibilities are opened back up: the compiler would be free to use any magic it wants to make the representation of the enum as compact as it is able, combining discriminants as above, taking advantage of tag bits in pointers, NaN-boxing, and so on. When generating code, the compiler would insert the necessary bit twiddling operations to pack the original values into this representation, and to take them back out again.

The same thing is possible for structs. For instance, a struct of bools could be represented as a bunch of bits:

compact struct Flags { 
    is_big: bool,
    is_red: bool,
    is_quick: bool,
    is_hungry: bool,
    ...
}

Now size_of::<Flags>() == size_of::<u8>()! If we let our imagination run further ahead, given types like i31, i30 and so on, we could also take advantage of those extra bits.

(It's important to note that this "compactifying" could always be deep, because the restriction against interior references would also be deep. If we have a non-compact struct type and store it in a compact one, references to the interior of the inner struct would likewise be forbidden, so it could be compacted as well. The example from above of representing enum X as a single u8 also relies on this property.)

I think this feature could be a much nicer solution for most of the use cases where manual bit twiddling code is currently necessary (and also most of the use cases for bitfields in C).

Up until now, for simplicity's sake I've been assuming that interior references would be forbidden. But it may be possible to allow them. The idea is that when you take an interior reference:

let hungry: &bool = &my_flags.is_hungry;

the value would first be copied out onto the stack (only physically, not semantically!), and then the reference would point to the stack value. This is not dissimilar to what happens if you write let nine = &9;. Because & is an immutable reference, the difference would not be observable.

Going deeper into the woods, assuming #12624, &mut borrows could also be represented by copying out the value at the beginning of the borrow, and then copying it back at the end. Again, because &mut would have exclusive access to the given object, the difference should not be observable.

I'm not sure if this would definitely work, but it seems like it might. Whether this is feasible has repercussions for syntax: if compact enums/structs are semantically identical to normal ones, and differ only in representation and performance, #[compact] could just be an attribute. If, however, the semantics differ, it would be more appropriate to use actual syntax to mark the difference.

The text was updated successfully, but these errors were encountered:

Thiez · 2014-03-01T15:04:42Z

How about we have #[derive(Compact as $name)] or something along those lines, which automatically generates some implementations, much like the other derives. For example,

#[deriving(Compact as SmallerX)]
enum X { Foo(Option<bool>), Bar(bool) }

would generate the following:

struct SmallerX { priv data: [u8,..1] }

impl Compact<SmallerX> for X {
  fn compact(self)->SmallerX {
    let mut result = SmallerX{data:[0u8]};
    match self {
      Foo(x) => {
        match x {
          Some(y) => {
            match y {
              true => result.data[0] = 0,
              false => result.data[0] = 1,
            }
          },
          None => result.data[0] = 2,
        }
      },
      Bar(x) => {
        match x {
          true => result.data[0] = 3,
          false => result.data[0] = 4,
        }
      },
    }
    result
  }
}

impl SmallerX {
  fn uncompact(self) -> X {
    match self.data[0] {
      0 => Foo(Some(true)),
      1 => Foo(Some(false)),
      2 => Foo(None),
      3 => Bar(true),
      4 => Bar(false),
    }
  }
}

Or something along these lines. I'm sure someone can come up with an algorithm that can compress at least the enum headers somewhat intelligently. You do X.compact() when you store it, and SmallerX.uncompact() when you want your X back.

The nice thing about this approach would be that no changes need be made to the IR-generation; and users could manually implement trait Compact<T> if they prefer. Some care must be taken when compacting a type that contains borrowed pointers, and the compact types will need to implement Drop to automatically unpack themselves when they contain non-POD types.

glaebhoerl · 2014-03-01T16:14:16Z

@Thiez What about e.g. the Flags struct? It should result in the same code as current flags enum approaches in Rust/C: flags.is_quick should compile to flags & IsQuick. With the deriving(Compact) approach you would have to pack/unpack the entire struct every time you want to access a field, and I'm not sure how well LLVM could optimize all of that away.

(If the semantics could stay the same, according to the scenario at the bottom of the OP, then I was thinking #[compact] would be analogous to the #[packed] attribute we already have as something which only affects the representation of the given item.)

rust-highfive · 2014-09-24T05:00:40Z

This issue has been moved to the RFCs repo: rust-lang/rfcs#311

fix: deduplicate cfg completions cfg completions are duplicated if they are set with multiple values. This patch deduplicates them. fixes: [rust-lang#12623](rust-lang/rust-analyzer#12623)

huonw added the B-RFC label Mar 1, 2014

glaebhoerl mentioned this issue May 26, 2014

RFC: bit fields and bit matching rust-lang/rfcs#29

Closed

rust-highfive mentioned this issue Sep 24, 2014

Idea: "compact" enums and structs rust-lang/rfcs#311

Open

rust-highfive closed this as completed Sep 24, 2014

glaebhoerl mentioned this issue Jul 25, 2015

repr(packed) allows invalid unaligned loads #27060

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea: "compact" enums and structs #12642

Idea: "compact" enums and structs #12642

glaebhoerl commented Mar 1, 2014

Thiez commented Mar 1, 2014

glaebhoerl commented Mar 1, 2014

rust-highfive commented Sep 24, 2014

Idea: "compact" enums and structs #12642

Idea: "compact" enums and structs #12642

Comments

glaebhoerl commented Mar 1, 2014

Thiez commented Mar 1, 2014

glaebhoerl commented Mar 1, 2014

rust-highfive commented Sep 24, 2014