RFC: Proposal for bit data #205
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
- Start Date: 2014-07-03 | ||
- RFC PR #: (leave this empty) | ||
- Rust Issue #: (leave this empty) | ||
|
||
# Summary | ||
|
||
This RFC proposes to add support for bit-data types. | ||
|
||
# Motivation | ||
|
||
Rust aims to be a systems-level language, yet its support for bit-level | ||
manipulation is unsatisfactory. The `bitflags!` macro handles individual | ||
bits, but there is no support for bit-ranges. Anyone who has written a | ||
disassembler for the x86 instruction set will agree that dealing with | ||
shifts and masks by hand is error-prone. | ||
|
||
With this RFC accepted, we can describe a PCI address: | ||
|
||
```rust | ||
bitdata PCI { | ||
PCI { bus : u8, dev : u5, fun : u3 } | ||
} | ||
``` | ||
|
||
This definition describes a 16-bit value whose most significant eight bits | ||
identify a particular hardware bus. | ||
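For comparison, here is a minimal sketch of the shifts and masks that the `PCI` definition would replace in today's Rust. The `PciAddr` wrapper and its method names are hypothetical, and the layout (fields listed from most to least significant bit) is an assumption based on the description above.

```rust
/// Hand-written stand-in for the proposed `PCI` bitdata on a `u16` carrier.
/// Assumed layout: bus = bits 15..8, dev = bits 7..3, fun = bits 2..0.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PciAddr(pub u16);

impl PciAddr {
    /// Packs the three fields; `dev` and `fun` are masked to 5 and 3 bits.
    pub fn new(bus: u8, dev: u8, fun: u8) -> PciAddr {
        let dev = (dev & 0x1f) as u16;
        let fun = (fun & 0x07) as u16;
        PciAddr(((bus as u16) << 8) | (dev << 3) | fun)
    }

    /// Most significant eight bits.
    pub fn bus(self) -> u8 {
        (self.0 >> 8) as u8
    }

    /// Five bits below the bus number.
    pub fn dev(self) -> u8 {
        ((self.0 >> 3) & 0x1f) as u8
    }

    /// Least significant three bits.
    pub fn fun(self) -> u8 {
        (self.0 & 0x07) as u8
    }
}
```

Every shift amount and mask here has to be kept in sync by hand, which is the bookkeeping the `bitdata` definition is meant to take over.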
|
||
Immediate values can be specified anywhere in the definition, which provides | ||
a way to discriminate values: | ||
|
||
```rust | ||
bitdata KdNode : u64 { | ||
NodeX { axis = 0 : u2, left : u15, right: u15, split : f32 }, | ||
NodeY { axis = 1 : u2, left : u15, right: u15, split : f32 }, | ||
NodeZ { axis = 2 : u2, left : u15, right: u15, split : f32 }, | ||
Leaf { tag = 3 : u2, _: u2, tri0 : u20, tri1 : u20, tri2 : u20 } | ||
} | ||
``` | ||
This defines a 64-bit value whose two most significant bits indicate | ||
the type of node: an internal node that splits along the x-, y- or z-axis, | ||
or a leaf node that refers to the triangle vertex data. | ||
|
||
With this in place, one could implement point lookup as follows: | ||
```rust | ||
fn lookup(pt: Vec3, ns: &[KdNode]) -> Option<(uint,uint,uint)> { | ||
let mut i = 0u; | ||
loop { | ||
let n = match ns[i] { | ||
NodeX {left, right, split} => if pt.x < split { left } else { right }, | ||
NodeY {left, right, split} => if pt.y < split { left } else { right }, | ||
NodeZ {left, right, split} => if pt.z < split { left } else { right }, | ||
Leaf {tri0, tri1, tri2} => return Some((tri0, tri1, tri2)) | ||
}; | ||
if n == 0 { return None } | ||
i = n; | ||
} | ||
} | ||
``` | ||
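As a rough illustration of what the `match` above stands for, the following sketch decodes a raw 64-bit node by hand in current Rust. The `Decoded` enum and `decode` function are hypothetical names, and the field positions assume that fields are laid out from the most significant bit downwards, as in the `KdNode` definition.

```rust
/// Decoded view of a raw `KdNode` word (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum Decoded {
    Inner { axis: u8, left: u16, right: u16, split: f32 },
    Leaf { tri0: u32, tri1: u32, tri2: u32 },
}

/// Decodes a 64-bit node. Assumed layout, most significant bit first:
/// inner: axis(2) left(15) right(15) split(32)
/// leaf:  tag(2)  pad(2)   tri0(20)  tri1(20)  tri2(20)
pub fn decode(raw: u64) -> Decoded {
    let tag = (raw >> 62) as u8;
    if tag == 3 {
        Decoded::Leaf {
            tri0: ((raw >> 40) & 0xf_ffff) as u32,
            tri1: ((raw >> 20) & 0xf_ffff) as u32,
            tri2: (raw & 0xf_ffff) as u32,
        }
    } else {
        Decoded::Inner {
            axis: tag,
            left: ((raw >> 47) & 0x7fff) as u16,
            right: ((raw >> 32) & 0x7fff) as u16,
            // The low 32 bits hold the IEEE-754 bit pattern of `split`.
            split: f32::from_bits(raw as u32),
        }
    }
}
```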
|
||
# Detailed design | ||
|
||
The size of every `bitdata` type is calculated in bits rather than bytes. For this reason, | ||
it is illegal to take the address of individual components. | ||
|
||
## Syntax | ||
|
||
The syntax needs to be extended with bit-sized integer literals. These are written | ||
as `4u7` or `-1i4`. In addition, bit-sized types of the form `u15` and `i9` | ||
need to be added. If the compiler needs to treat them as normal values, | ||
zero- or sign-extension must take place. | ||
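The extension step can be sketched in current Rust as follows. The helper names `zero_extend` and `sign_extend` are hypothetical, and a 32-bit working width is assumed purely for illustration; the RFC does not prescribe one.

```rust
/// Zero-extends the low `bits` bits of `raw` (models reading a `uN` field).
pub fn zero_extend(raw: u32, bits: u32) -> u32 {
    assert!(bits >= 1 && bits <= 32);
    if bits == 32 {
        raw
    } else {
        raw & ((1u32 << bits) - 1)
    }
}

/// Sign-extends the low `bits` bits of `raw` (models reading an `iN` field).
pub fn sign_extend(raw: u32, bits: u32) -> i32 {
    assert!(bits >= 1 && bits <= 32);
    let shift = 32 - bits;
    // Move the field's sign bit up to bit 31, then shift back down
    // arithmetically so the sign bit is replicated into the upper bits.
    ((raw << shift) as i32) >> shift
}
```

Under these assumptions the four-bit pattern `0b1111` reads as 15 through `zero_extend` and as -1 through `sign_extend`, which matches the `-1i4` literal above.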
|
||
```ebnf | ||
BITDATA-DEFN ::= "bitdata" IDENT (":" TYPE)? "{" BITDATA-CONS-LIST* "}" | ||
``` | ||
|
||
This introduces the `bitdata` type with a name (the identifier), an optional | ||
carrier type, followed by a block of bitdata constructors. The carrier type | ||
is the regular type substituted for the bitdata value where an ordinary data type is needed. | ||
|
||
```ebnf | ||
BITDATA-CONS-LIST ::= BITDATA-CONS ("," BITDATA-CONS)* | ||
BITDATA-CONS ::= IDENT "{" BITFIELD-LIST "}" | ||
``` | ||
|
||
The bitdata constructors are all named constructors, each with bit-fields. A | ||
bit-field is either tag-bits or a labeled bit-field. Tag-bits are constant | ||
expressions (used for bit-field `match`), and labeled bit-fields are named | ||
bit-ranges. | ||
|
||
```ebnf | ||
BITFIELD-LIST ::= BITFIELD ("," BITFIELD)* | ||
BITFIELD ::= TAG-BITS | LABELED-BIT-FIELD | ||
TAG-BITS ::= BIT-LITERAL | ||
LABELED-BIT-FIELD ::= IDENT ( "=" CONST-EXPR )? ":" BITDATA-TYPE | ||
``` | ||
|
||
The only valid bitdata-types are other bitdata-types (by name), unsigned and | ||
signed bit-types such as `u12`, and the floating-point types `f32` and `f64`. | ||
|
||
```ebnf | ||
BITDATA-TYPE ::= ("u" | "i") ('0'..'9')+ | "f32" | "f64" | IDENT | ||
BIT-LITERAL ::= INT-LITERAL ("u" | "i") ('0'..'9')+ | ||
``` | ||
|
||
## Limitations | ||
|
||
* Each constructor must have the exact same bit-size. | ||
* If the `bitdata` definition has a type specifier, all constructor bit-sizes must match this. | ||
* Tag-bits and labeled bit-fields with initializers act as discriminators, but they need | ||
not be exhaustive. | ||
|
||
## Construction | ||
|
||
```rust | ||
let addr = PCI { bus : 0, dev : 2, fun : 3 }; | ||
let tree = vec![ NodeX { left: 1, right: 2, split: 10.0 }, // 0 | ||
NodeY { left: 3, right: 0, split: 0.5 }, // 1 | ||
Leaf { tri0: 0, tri1: 1, tri2: 2 }, // 2 | ||
Leaf { tri0: 3, tri1: 4, tri2: 5 } ]; // 3 | ||
``` | ||
|
||
## Bit-field access | ||
|
||
Access through the `.` operator is unchecked. In other words, this is valid: | ||
|
||
```rust | ||
fn f(node : KdNode) -> f32 { node.NodeX.split } | ||
``` | ||
|
||
If there is only one bit-constructor in the bit data, the constructor name may | ||
be elided: | ||
```rust | ||
fn bus(pci : PCI) -> u8 { pci.bus } | ||
``` | ||
|
||
## Matching | ||
|
||
Matching is not necessarily exhaustive, as there may be "junk" values. For | ||
instance, | ||
```rust | ||
bitdata T { S { 0u5 }, N { 0b11111u5 } } | ||
``` | ||
Here `T` is 5 bits wide, but any value other than 0 or 31 is | ||
considered "junk": | ||
```rust | ||
match t { S => "Zero", N => "Non-zero", _ => "Junk" } | ||
``` | ||
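The same classification can be approximated in current Rust on an ordinary integer. This is only a sketch: `classify` is a hypothetical name, and a `u8` stands in for the 5-bit `T`.

```rust
/// Classifies a 5-bit value the way the `match` on `T` would.
/// Only the low five bits of `t` are considered.
pub fn classify(t: u8) -> &'static str {
    match t & 0b1_1111 {
        0b0_0000 => "Zero",     // tag bits of constructor `S`
        0b1_1111 => "Non-zero", // tag bits of constructor `N`
        _ => "Junk",            // the remaining 30 bit patterns
    }
}
```

The wildcard arm is what makes the match total; without it, 30 of the 32 possible bit patterns would be unhandled.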
|
||
## Compared to `enum` | ||
|
||
The `bitdata` type is similar to the existing `enum` type with the following | ||
differences: | ||
|
||
* The discriminator is not added automatically. | ||
* All bit-data constructors must have the exact same bit-size. | ||
|
||
## Notes | ||
|
||
`bitdata` may help reduce some unsafe operations such as transmute. For instance, | ||
we can analyse an IEEE-754 value: | ||
|
||
```rust | ||
bitdata IEEE754 { | ||
F { value : f32 }, | ||
I { sign : u1, exp: u8, mant: u23 } | ||
} | ||
|
||
fn float_rep(f : f32) { | ||
let x = F { value : f }; | ||
println!("s:{}, e:{}, m:{}", x.I.sign, x.I.exp, x.I.mant) | ||
} | ||
``` | ||
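For reference, a comparable decomposition can be written in current Rust with `f32::to_bits`, which also avoids `transmute`. The sketch below returns the fields instead of printing them, and assumes the usual IEEE-754 single-precision layout with the sign in the most significant bit, matching the order of the `I` constructor.

```rust
/// Splits an `f32` into (sign, exponent, mantissa) using shifts and masks.
pub fn float_rep(f: f32) -> (u32, u32, u32) {
    let bits = f.to_bits();
    let sign = bits >> 31;          // 1 bit
    let exp = (bits >> 23) & 0xff;  // 8 bits, biased by 127
    let mant = bits & 0x7f_ffff;    // 23 bits
    (sign, exp, mant)
}
```

With `bitdata`, the three shift-and-mask lines would be replaced by the field widths declared in `I`.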
|
||
# Alternatives | ||
|
||
It has been suggested that this be implemented as a syntax extension. This will | ||
not work, because: | ||
|
||
* We need significant error-checking, including bit-size calculations | ||
Review comment: Could you expand on why you don't think that a procedural macro would be able to provide this error checking?
Reply: If the language had support for bitdata signed and unsigned values (like e.g. |
||
and overlapping tag checks | ||
* `bitdata` definitions may make use of other `bitdata` definitions | ||
* Syntactic overhead would be large | ||
* It is unclear how cross-module usage and type-checking would occur | ||
Review comment: What exactly do you mean by this?
Reply: For proper integration in the language, you would also need to import bitdata representations from other crates (e.g. to build a bitdata A using components from another bitdata B). I have a hard time seeing this being implemented using macros. |
||
|
||
# Drawbacks | ||
|
||
# Unresolved questions | ||
|
||
This RFC does not discuss endianness issues. It is assumed that the bit-fields | ||
are defined in target endianness. | ||
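To make the open question concrete, the sketch below shows how one 16-bit carrier value serialises under the two byte orders in current Rust. `carrier_bytes` is a hypothetical helper; the RFC itself does not specify any byte-level encoding.

```rust
/// Returns the (big-endian, little-endian) byte encodings of a 16-bit carrier.
pub fn carrier_bytes(raw: u16) -> ([u8; 2], [u8; 2]) {
    (raw.to_be_bytes(), raw.to_le_bytes())
}
```

For a PCI-style value `0x0113`, the bus byte `0x01` comes first in the big-endian encoding and last in the little-endian one, so code that exchanges bitdata between targets would need to agree on an order.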
|
||
Also, we could support inline-arrays of bit fields, but that could be saved | ||
for a future implementation. For instance: | ||
```rust | ||
bitdata KdTree { | ||
// ... | ||
Leaf { tag = 3 : u2, _: u2, tri : [u20,..3] } | ||
} | ||
``` |
Review comment:
Wouldn't it be better to make it more similar to a struct with fields of various bit lengths?
Enums are usually one long value of a certain width, which would make them more prone to endianness issues (the whole byte order would be off).
A numerical value (enum) would also require more packing, unpacking, and masking to read and write, whereas a struct is at least byte-aligned, so the compiler would only have to pack and unpack fields smaller than a byte and fields that are not byte-aligned (that is, anything that is not a u8, u16, u32, u64, etc.).
Also, anything stored as an enum-like bitdata would have to use the largest integer width: with 20 bits of data you would write a 32-bit number, whereas in a struct 20 bits would round up to 24 bits (an array of 3 bytes). The smaller size helps with storage in extremely tight spaces, and more importantly in serial or microcontroller communication (unless you accommodate it on the non-PC end), since sending more bytes than necessary will throw off the receiving program. The best example I can think of is MIDI: MIDI commands are sent 3 bytes (24 bits) at a time, so a 4-bit value would mess it up.