RFC: Proposal for bit data #205

Closed
199 changes: 199 additions & 0 deletions active/0000-bitdata.md
- Start Date: 2014-07-03
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

This RFC proposes to add support for bit-data types.

# Motivation

Rust aims to be a systems-level language, yet its support for bit-level
manipulation is unsatisfactory. The `bitflags!` macro handles individual
bits, but there is no support for bit-ranges. Anyone who has written a
disassembler for the x86 instruction set can attest to how error-prone
shifts and masks are.

With this RFC accepted, we can describe a PCI address:

```rust
bitdata PCI {
    PCI { bus : u8, dev : u5, fun : u3 }
}
```

This definition describes a 16-bit value whose most significant eight bits
identify a particular hardware bus.
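For comparison, here is a rough sketch of what the same address looks like when written by hand in present-day Rust. The `Pci` wrapper and its method names are hypothetical, and the layout (fields laid out from most to least significant in declaration order) is an assumption extrapolated from the sentence above:

```rust
/// Hypothetical hand-written stand-in for the proposed `PCI` bitdata.
/// Assumed layout: bus in bits 15..8, dev in bits 7..3, fun in bits 2..0.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pci(u16);

impl Pci {
    /// Packs the three fields; `dev` and `fun` are masked to 5 and 3 bits.
    pub fn new(bus: u8, dev: u8, fun: u8) -> Pci {
        Pci(((bus as u16) << 8) | (((dev & 0x1f) as u16) << 3) | ((fun & 0x07) as u16))
    }

    /// Most significant eight bits.
    pub fn bus(self) -> u8 {
        (self.0 >> 8) as u8
    }

    /// Five bits below the bus number.
    pub fn dev(self) -> u8 {
        ((self.0 >> 3) & 0x1f) as u8
    }

    /// Least significant three bits.
    pub fn fun(self) -> u8 {
        (self.0 & 0x07) as u8
    }

    /// The underlying 16-bit value.
    pub fn raw(self) -> u16 {
        self.0
    }
}
```

Every shift amount and mask here has to be kept consistent by hand, which is the kind of bookkeeping the proposal aims to remove.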

Immediate values can be specified anywhere in the definition, which provides
a way to discriminate values:

```rust
bitdata KdNode : u64 {
    NodeX { axis = 0 : u2, left : u15, right: u15, split : f32 },
    NodeY { axis = 1 : u2, left : u15, right: u15, split : f32 },
    NodeZ { axis = 2 : u2, left : u15, right: u15, split : f32 },
    Leaf  { tag = 3 : u2, _: u2, tri0 : u20, tri1 : u20, tri2 : u20 }
}
```
This defines a 64-bit value whose two most significant bits indicate the
type of node: an internal node that splits along the x-, y- or z-axis, or a
leaf node that refers to triangle vertex data.

With this in place, one could implement point lookup as follows:
```rust
fn lookup(pt: Vec3, ns: &[KdNode]) -> Option<(uint,uint,uint)> {
    let mut i = 0u;
    loop {
        // Pick the child on the side of the split plane that contains `pt`.
        let n = match ns[i] {
            NodeX {left, right, split} => if pt.x < split { left } else { right },
            NodeY {left, right, split} => if pt.y < split { left } else { right },
            NodeZ {left, right, split} => if pt.z < split { left } else { right },
            Leaf {tri0, tri1, tri2} => return Some((tri0, tri1, tri2))
        };
        // A child index of 0 is treated as "no child" (index 0 is the root).
        if n == 0 { return None }
        i = n;
    }
}
```
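To make the implied layout concrete, the following sketch decodes a `KdNode` carrier by hand in present-day Rust. The `Node` enum, `decode` and `encode_inner` are hypothetical names, and the bit positions assume that fields are packed from the most significant bit downwards in declaration order, which the RFC implies but does not spell out:

```rust
/// Hypothetical decoded view of the proposed `KdNode` bitdata.
#[derive(Debug, PartialEq)]
pub enum Node {
    Inner { axis: u8, left: u16, right: u16, split: f32 },
    Leaf { tri0: u32, tri1: u32, tri2: u32 },
}

/// Decodes a 64-bit carrier. Assumed layout, most significant bit first:
/// inner node: axis (2) | left (15) | right (15) | split (32)
/// leaf node:  tag (2)  | pad (2)   | tri0 (20)  | tri1 (20) | tri2 (20)
pub fn decode(raw: u64) -> Node {
    let tag = (raw >> 62) as u8; // top two bits discriminate the constructor
    if tag == 3 {
        Node::Leaf {
            tri0: ((raw >> 40) & 0xf_ffff) as u32,
            tri1: ((raw >> 20) & 0xf_ffff) as u32,
            tri2: (raw & 0xf_ffff) as u32,
        }
    } else {
        Node::Inner {
            axis: tag,
            left: ((raw >> 47) & 0x7fff) as u16,
            right: ((raw >> 32) & 0x7fff) as u16,
            // The low 32 bits are reinterpreted as an IEEE-754 single.
            split: f32::from_bits(raw as u32),
        }
    }
}

/// Encodes an inner node; `axis` is masked to 2 bits, children to 15 bits.
pub fn encode_inner(axis: u8, left: u16, right: u16, split: f32) -> u64 {
    ((axis as u64 & 0x3) << 62)
        | ((left as u64 & 0x7fff) << 47)
        | ((right as u64 & 0x7fff) << 32)
        | split.to_bits() as u64
}
```

The `match` in `lookup` would replace the explicit tag test and all of the shifts shown here.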

# Detailed design

Sizes and offsets in a `bitdata` definition are measured in bits rather than
bytes. For this reason, it is illegal to take the address of individual components.

## Syntax

The syntax needs to be extended with bit-sized integer literals, written
as `4u7` or `-1i4`. In addition, bit-sized types of the form `u15` and `i9`
need to be added. When the compiler needs to treat them as normal values,
zero-extension (for unsigned types) or sign-extension (for signed types) must take place.
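As an illustration of the extension step, here is a minimal sketch in present-day Rust. The helper names `zero_extend` and `sign_extend` are hypothetical, and a 32-bit carrier is assumed purely for the example:

```rust
/// Zero-extends the low `bits` bits of `raw` (models reading a `uN` field).
pub fn zero_extend(raw: u32, bits: u32) -> u32 {
    assert!(bits >= 1 && bits <= 32);
    if bits == 32 {
        raw
    } else {
        // Keep only the low `bits` bits; everything above becomes zero.
        raw & ((1u32 << bits) - 1)
    }
}

/// Sign-extends the low `bits` bits of `raw` (models reading an `iN` field).
pub fn sign_extend(raw: u32, bits: u32) -> i32 {
    assert!(bits >= 1 && bits <= 32);
    let shift = 32 - bits;
    // Move the field's sign bit into bit 31, then arithmetic-shift back down.
    ((raw << shift) as i32) >> shift
}
```

With these helpers, the four-bit pattern `0b1111` reads as 15 when treated as a `u4` and as -1 when treated as an `i4`.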

```ebnf
BITDATA-DEFN ::= "bitdata" IDENT (":" TYPE)? "{" BITDATA-CONS-LIST* "}"
```

This introduces the `bitdata` type with a name (the identifier), an optional
carrier type, followed by a block of bitdata constructors. The carrier type
is the regular data type substituted for the bitdata value wherever one is needed.

```ebnf
BITDATA-CONS-LIST ::= BITDATA-CONS ("," BITDATA-CONS)*
BITDATA-CONS ::= IDENT "{" BITFIELD-LIST "}"
```

The bitdata constructors are all named constructors, each with bit-fields. A
bit-field is either tag-bits or a labeled bit-field. Tag-bits are constant
expressions (used for bit-field `match`), and labeled bit-fields are named
bit-ranges.

```ebnf
BITFIELD-LIST ::= BITFIELD ("," BITFIELD)*
BITFIELD ::= TAG-BITS | LABELED-BIT-FIELD
TAG-BITS ::= BIT-LITERAL
LABELED-BIT-FIELD ::= IDENT ( "=" CONST-EXPR )? ":" BITDATA-TYPE
```

The only valid bitdata-types are other bitdata-types (by name), unsigned
and signed bit-types such as `u12`, and the floating-point types `f32` and `f64`.

```ebnf
BITDATA-TYPE ::= ("u" | "i") ('0'..'9')+ | "f32" | "f64" | IDENT
BIT-LITERAL ::= INT-LITERAL ("u" | "i") ('0'..'9')+
```

## Limitations

* Each constructor must have the exact same bit-size.
* If the `bitdata` definition has a type specifier, all constructor bit-sizes must match the size of that type.
* Tag-bits and labeled bit-fields with initializers act as discriminators, but they need
not be exhaustive.
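The first two limitations amount to a width check that a compiler pass could perform. A minimal sketch in present-day Rust, where `cons_width` and `check_widths` are hypothetical names and each constructor is represented simply by the list of its field widths:

```rust
/// Total bit-size of one constructor, given the widths of its fields.
pub fn cons_width(fields: &[u32]) -> u32 {
    fields.iter().sum()
}

/// Checks that every constructor has the same bit-size and, if a carrier
/// width is given, that the size matches it. Returns the common size.
pub fn check_widths(carrier: Option<u32>, constructors: &[&[u32]]) -> Result<u32, String> {
    let mut widths = constructors.iter().map(|c| cons_width(c));
    let first = match widths.next() {
        Some(w) => w,
        None => return Err("bitdata needs at least one constructor".to_string()),
    };
    // `enumerate` starts at 0 for the second constructor, hence `i + 1`.
    for (i, w) in widths.enumerate() {
        if w != first {
            return Err(format!("constructor {} is {} bits, expected {}", i + 1, w, first));
        }
    }
    match carrier {
        Some(c) if c != first => {
            Err(format!("constructors are {} bits, carrier is {} bits", first, c))
        }
        _ => Ok(first),
    }
}
```

For `KdNode`, the inner nodes sum to 2 + 15 + 15 + 32 bits and the leaf to 2 + 2 + 20 + 20 + 20 bits, so both match the 64-bit carrier.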

## Construction

```rust
let addr = PCI { bus : 0, dev : 2, fun : 3 };
let tree = vec![ NodeX { left: 1, right: 2, split: 10.0 },   // 0
                 NodeY { left: 3, right: 0, split: 0.5 },    // 1
                 Leaf  { tri0: 0, tri1: 1, tri2: 2 },        // 2
                 Leaf  { tri0: 3, tri1: 4, tri2: 5 } ];      // 3
```

## Bit-field access

Access through the `.` operator is unchecked: the constructor named in the
access is not verified against the tag bits. In other words, this is valid:

```rust
fn f(node : KdNode) -> f32 { node.NodeX.split }
```

If there is only one bit-constructor in the bit data, the constructor name may
be elided:
```rust
fn bus(pci : PCI) -> u8 { pci.bus }
```

## Matching

Matching is not necessarily exhaustive, as there may be "junk" values. For
instance,
```rust
bitdata T { S { 0u5 }, N { 0b11111u5 } }
```
Here `T` is 5 bits wide, but any value other than 0 or 31 is
considered "junk":
```rust
match t { S => "Zero", N => "Non-zero", _ => "Junk" }
```
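The same three-way split can be sketched in present-day Rust on a plain integer. The function name `classify` is hypothetical, and the 5-bit value is assumed to sit in the low bits of a `u8`:

```rust
/// Classifies a 5-bit value the way the `match` on `T` above would.
pub fn classify(t: u8) -> &'static str {
    // Only the low five bits belong to `T`.
    match t & 0b1_1111 {
        0 => "Zero",            // tag bits of `S`
        0b1_1111 => "Non-zero", // tag bits of `N`
        _ => "Junk",            // the remaining 30 bit patterns
    }
}
```

The wildcard arm is what makes the match total: the two constructors cover only two of the 32 possible bit patterns.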

## Compared to `enum`

The `bitdata` type is similar to the existing `enum` type with the following
differences:

* The discriminator is not added automatically.
* All bit-data constructors must have the exact same bit-size.

Review comment on this comparison:

Wouldn't it be better to make it more similar to a struct of various bit lengths?

Enums are usually one long value of a certain width, which would make them more prone to endian issues (the whole order would be off).

A numerical value (enum) would also require more packing, unpacking and masking to read/write, whereas a struct is at least byte aligned, so the compiler would only have to pack/unpack fields smaller than a byte and fields that are not byte aligned (that is, not a u8/u16/u32/u64, etc.).

Also, anything saved as a bitdata enum would have to use the biggest number value: with 20 bits of data you'd be writing a 32-bit number, whereas in a struct 20 bits would round up to 24 bits (or an array of 3 bytes). This helps with storage in extremely tight areas, but more importantly in serial/microcontroller communication (unless you accommodate on the non-PC end), as sending more bytes than necessary will throw off the program. The best example I can think of is MIDI: MIDI commands are sent 3 bytes at a time (24 bits), so a 4bit value would mess it up.

## Notes

`bitdata` may help reduce some unsafe operations such as `transmute`. For instance,
we can analyse an IEEE-754 value:

```rust
bitdata IEEE754 {
    F { value : f32 },
    I { sign : u1, exp: u8, mant: u23 }
}

fn float_rep(f : f32) {
    let x = F { value : f };
    println!("s:{}, e:{}, m:{}", x.I.sign, x.I.exp, x.I.mant)
}
```
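For reference, the same decomposition can be sketched in present-day Rust using the standard `f32::to_bits` method. This version returns the three fields instead of printing them so that the result can be checked; it is an illustration of the bit layout, not part of the proposal:

```rust
/// Splits an IEEE-754 single into (sign, exponent, mantissa) bit-fields.
pub fn float_rep(f: f32) -> (u32, u32, u32) {
    let bits = f.to_bits();
    let sign = bits >> 31;          // 1 bit
    let exp = (bits >> 23) & 0xff;  // 8 bits, biased by 127
    let mant = bits & 0x7f_ffff;    // 23 bits, implicit leading 1 not stored
    (sign, exp, mant)
}
```

For example, `float_rep(1.0)` yields `(0, 127, 0)`, since 1.0 is stored with a zero mantissa and a biased exponent of 127.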

# Alternatives

It has been suggested to implement this as a syntax extension. This will not
work, because

* We need significant error-checking, including bit-size calculations
  and overlapping tag checks
* `bitdata` definitions may make use of other `bitdata` definitions
* Syntactic overhead would be large
* It is unclear how cross-module usage and type-checking would occur

Review comment (Member), on the error-checking point:

Could you expand on why you don't think that a procedural macro would be able to provide this error checking?

Reply (Author):

If the language had support for bitdata signed and unsigned values (e.g. u4 and i12), I believe it would be possible to support it through procedural macros. Having said that, I'm not an expert on procedural macros, and honestly, if anyone can write it using this feature, I would love to see it.

Review comment (Member), on the cross-module point:

What exactly do you mean by this?

Reply (Author):

For proper integration in the language, you would also need to import bitdata representations from other crates (e.g. to build a bitdata A using components from another bitdata B). I have a hard time seeing this being implemented using macros.


# Drawbacks

# Unresolved questions

This RFC does not discuss endianness issues. It is assumed that the bit-fields
are defined in target endianness.
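To show why byte order matters once a bitdata value leaves a register, here is a small sketch in present-day Rust using the standard `to_be_bytes` and `to_le_bytes` methods. The function name `pci_bytes` is hypothetical, and a 16-bit carrier such as the `PCI` example is assumed:

```rust
/// Returns the big-endian and little-endian byte images of a 16-bit carrier.
pub fn pci_bytes(raw: u16) -> ([u8; 2], [u8; 2]) {
    (raw.to_be_bytes(), raw.to_le_bytes())
}
```

For the value `0x0113`, the big-endian image is `[0x01, 0x13]` and the little-endian image is `[0x13, 0x01]`, so the byte holding the most significant field comes first on one target and last on the other.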

Also, we could support inline-arrays of bit fields, but that could be saved
for a future implementation. For instance:
```rust
bitdata KdTree {
    // ...
    Leaf { tag = 3 : u2, _: u2, tri : [u20,..3] }
}
```