RFC: Proposal for bit data #205

engstad · 2014-08-18T05:20:29Z

Trying to get feedback on bit-data. I realize that it may not be a 1.0 priority, but discussion on this is still helpful.

sfackler · 2014-08-18T05:26:04Z

active/0000-bitdata.md

+It has been suggested to implement this a syntax extension. This will not 
+work, because
+
+* We need significant error-checking, including bit-size calulations


Could you expand on why you don't think that a procedural macro would be able to provide this error checking?

If the language had support for bitdata signed and unsigned values (like e.g. u4 and i12), I believe it would be possible to support it through procedural macros. Having said that, I'm not an expert on procedural macros, and honestly - if anyone can write it using this feature - I would love to see it.
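As a rough illustration of the kind of bit-size check under discussion, here is a minimal sketch in present-day Rust (which postdates this thread). It uses a plain compile-time assertion, not a syntax extension and not the RFC's proposed `bitdata` syntax. The constant names and field widths are hypothetical.

```rust
// Hypothetical field widths, chosen only for illustration.
pub const AXIS_BITS: u32 = 2;
pub const LEFT_BITS: u32 = 15;
pub const RIGHT_BITS: u32 = 15;
pub const SPLIT_BITS: u32 = 32;

/// Sum of all field widths.
pub const TOTAL_BITS: u32 = AXIS_BITS + LEFT_BITS + RIGHT_BITS + SPLIT_BITS;

// Evaluated at compile time: the build fails if the widths do not
// exactly fill the 64-bit backing word.
const _: () = assert!(TOTAL_BITS == u64::BITS);
```

A check like this only covers total width; overlap and range checks on individual fields would need more machinery.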

bharrisau · 2014-08-18T08:47:31Z

We were discussing this for Zinc. I ended up suggesting a way to do it with a Rust plugin, but we ended up going with a custom DSL for ease of use. hackndev/zinc#129 (comment)

bgamari · 2014-08-18T13:18:33Z

A few points:

* Pull request RFC: bit fields and bit matching #29 makes a similar suggestion. The consensus there was that this was the territory of a syntax extension. It might be nice to elaborate on how this RFC differs from RFC: bit fields and bit matching #29 and address some of the concerns raised there.
* As mentioned by @bharrisau, bitflags! wasn't sufficient to cover our needs in Zinc (in part because it only supports flags, not arbitrary-width fields), which prompted us to develop ioregs!, a syntax extension implementing a domain-specific language for describing registers and their fields. The documentation for this extension can be found here.
* I'm having a bit of trouble seeing how to write platform-independent definitions if endianness is ignored. Could you elaborate on how you might write a definition for, e.g., the TCP header?
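To make the TCP header question concrete, here is a minimal sketch of how the data-offset and flag bits are typically decoded by hand in present-day Rust. It is not the RFC's proposed syntax, and the type, function, and constant names are hypothetical. The byte order is fixed explicitly with `u16::from_be_bytes`, which is exactly the detail a bit-level definition alone does not capture.

```rust
/// Fields packed into bytes 12 and 13 of a TCP header (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OffsetAndFlags {
    /// Header length in 32-bit words: the upper 4 bits of the 16-bit word.
    pub data_offset: u8,
    /// The low 8 flag bits: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN.
    pub flags: u8,
}

pub const FLAG_SYN: u8 = 0x02;
pub const FLAG_ACK: u8 = 0x10;

/// Decode the two bytes as they appear on the wire. `from_be_bytes`
/// interprets them as big-endian (network order), so the result is the
/// same on little-endian and big-endian hosts.
pub fn parse_offset_and_flags(bytes: [u8; 2]) -> OffsetAndFlags {
    let word = u16::from_be_bytes(bytes);
    OffsetAndFlags {
        data_offset: (word >> 12) as u8,
        flags: (word & 0x00ff) as u8,
    }
}
```

For example, the wire bytes `[0x50, 0x12]` decode to a data offset of 5 with the SYN and ACK flags set.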

bharrisau · 2014-08-18T21:14:36Z

Slightly different from strict 'endianness' is the bit numbering: whether
bit 0 is the MSB or the LSB.

But this is doable in a syntax extension, albeit as a struct with functions
(unable to use fields to refer to bits until there is some sort of
metatable trait).
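The two bit-numbering conventions can be sketched as follows. This is only an illustration in plain Rust; the function names are hypothetical and a 32-bit word is assumed.

```rust
/// Read bit `index` of `word`, where index 0 is the least significant bit
/// (the "LSB 0" convention).
pub fn bit_lsb0(word: u32, index: u32) -> bool {
    assert!(index < 32, "bit index out of range");
    (word >> index) & 1 == 1
}

/// Read bit `index` of `word`, where index 0 is the most significant bit
/// (the "MSB 0" convention). The same physical bit has index `31 - index`
/// under LSB 0 numbering.
pub fn bit_msb0(word: u32, index: u32) -> bool {
    assert!(index < 32, "bit index out of range");
    bit_lsb0(word, 31 - index)
}
```

With the word `1`, bit 0 is set under LSB 0 numbering, while under MSB 0 numbering the set bit is bit 31.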

engstad · 2014-08-18T22:50:47Z

Thanks all for some very useful comments. I will address them in an upcoming diff. As @bharrisau mentioned, this proposal ignores endianness since it is irrelevant (the underlying byte, half-word, word, or dword defines the byte order), but bit numbering (LSB first or MSB first) should probably be added.

sgulgas · 2014-08-19T03:53:39Z

active/0000-bitdata.md

+
+## Compared to `enum`
+
+The `bitdata` type is similar to the existing `enum` type with the following


Wouldn't it be better to make it more similar to a struct with fields of various bit lengths?

Enums are usually a single long value of a certain width, which would make them more prone to endian issues (the whole order would be off).

A numerical value (enum) would also require more packing, unpacking, and masking to read or write, whereas a struct is at least byte-aligned, so the compiler would only have to pack and unpack fields that are narrower than a byte or not byte-aligned (that is, anything that is not a u8/u16/u32/u64, etc.).

Also, anything saved as an enum-style bitdata would have to use the largest integer width: with 20 bits of data you'd be writing a 32-bit number, whereas a struct's 20 bits would round up to 24 bits (an array of 3 bytes). This helps with storage in extremely tight areas, but more importantly in serial/microcontroller communication (unless you accommodate it on the non-PC end), since sending more bytes than necessary will throw off the program. The best example I can think of is MIDI: MIDI commands are sent 3 bytes at a time (24 bits), so a 4-bit value would mess it up.
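The 20-bits-in-3-bytes point can be sketched like this. It is a hand-written illustration in present-day Rust, with hypothetical function names and a big-endian byte layout chosen arbitrarily.

```rust
/// Pack a value of at most 20 bits into 3 bytes, most significant byte first.
/// Returns `None` if the value needs more than 20 bits.
pub fn pack_20_bits(value: u32) -> Option<[u8; 3]> {
    if value >= (1 << 20) {
        return None;
    }
    // to_be_bytes yields 4 bytes; the first is always zero for a 20-bit value.
    let b = value.to_be_bytes();
    Some([b[1], b[2], b[3]])
}

/// Inverse of `pack_20_bits`: widen the 3 bytes back into a u32.
pub fn unpack_20_bits(bytes: [u8; 3]) -> u32 {
    u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]])
}
```

Writing the value as a plain `u32` would put 4 bytes on the wire; this version sends 3.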

bharrisau · 2014-08-19T04:11:26Z

As this will likely end up being "please try implementing this as a syntax extension", I'll copy in what the syntax extension may look like from my other post.

#[bitdata(u64)]
pub struct NodeX {
  #[bits(0, 1)]   axis:   u8,
  #[bits(2, 16)]  left:  u16,
  #[bits(17, 31)] right: u16,
  #[bits(32, 63)] split: f32
}

let left = foo.left();
let right = foo.right();

foo.set_left(5);

I don't think you can get virtual struct fields through any method yet (a metatable trait would be needed, or a significant upgrade to Index). So you are unable to de-structure the struct, or treat the fields as real fields.
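For comparison, here is a hand-written sketch of the accessors such an extension might generate for the `axis` and `left` fields. This is an assumption about the expansion, not the output of any existing plugin. It uses the bit ranges from the example above, read as inclusive and counted from the least significant bit.

```rust
/// Newtype over the 64-bit backing word (hypothetical expansion).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeX(pub u64);

// `left` occupies bits 2..=16, which is 15 bits wide.
const LEFT_SHIFT: u32 = 2;
const LEFT_MASK: u64 = (1 << 15) - 1;

impl NodeX {
    /// Read the 2-bit `axis` field (bits 0..=1).
    pub fn axis(self) -> u8 {
        (self.0 & 0b11) as u8
    }

    /// Read the 15-bit `left` field.
    pub fn left(self) -> u16 {
        ((self.0 >> LEFT_SHIFT) & LEFT_MASK) as u16
    }

    /// Overwrite the `left` field, leaving all other bits untouched.
    /// Panics if `value` does not fit in 15 bits.
    pub fn set_left(&mut self, value: u16) {
        assert!(u64::from(value) <= LEFT_MASK, "value does not fit in 15 bits");
        let cleared = self.0 & !(LEFT_MASK << LEFT_SHIFT);
        self.0 = cleared | (u64::from(value) << LEFT_SHIFT);
    }
}
```

Because the fields are reached through methods and not real struct fields, a value like this cannot be destructured in a pattern, which is the limitation described above.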

lilyball · 2014-08-20T20:10:25Z

This RFC does not explain what happens if you use a bitfield in a match when there are no suitable discriminators on the various identifiers. I think the simplest rule is that when using match on a bitfield, all cases must consider the same bits to be a discriminant, and each case must have a unique discriminant set. This could be modified slightly to allow for one case to define extra discriminants that aren't actually necessary to distinguish it from the other cases, but that may be an overcomplication.
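The simple rule described above can be sketched as a check over the cases of a match. The representation is hypothetical: each case is modelled as a mask selecting its discriminant bits plus the value expected in those bits. Nothing here comes from the RFC itself.

```rust
/// One match case: `mask` selects the discriminant bits and `value` is the
/// bit pattern expected in them (hypothetical representation).
pub struct Case {
    pub mask: u32,
    pub value: u32,
}

/// Returns true if every case tests the same bits and no two cases expect
/// the same value in those bits.
pub fn discriminants_are_consistent(cases: &[Case]) -> bool {
    for (i, case) in cases.iter().enumerate() {
        // All cases must agree on which bits form the discriminant.
        if case.mask != cases[0].mask {
            return false;
        }
        // The expected value must not set bits outside the mask.
        if case.value & !case.mask != 0 {
            return false;
        }
        // Each case needs a discriminant value not used by an earlier case.
        if cases[..i].iter().any(|earlier| earlier.value == case.value) {
            return false;
        }
    }
    true
}
```

The relaxed variant, where one case may test extra bits, would need a weaker comparison than the strict mask equality used here.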

nikomatsakis · 2014-08-23T04:22:31Z

This is a well-written and reasonably thorough RFC, which I greatly appreciate. However, while it would be convenient for certain use cases, I don't really see us adding a feature like this in the short term. This is basically because we are focused on building up the language in other areas (e.g., rounding out smart pointer design and type machinery). It is also somewhat unclear how widely applicable this feature would be.

There are some pieces of the design that seem surprising to me and for which I would like to see more justification:

* The two-level namespace (value.Variant.field) has no precedent elsewhere in Rust.
* It seems surprising that you do not need to match on the discriminant to extract out data. Is this simply because you think it will be too inconvenient/slow, or are there legitimate cases where you wish to purposefully ignore the discriminant? If the latter, it seems like that should be an explicit operation.
* The design seems to serve two use cases simultaneously. You can leave off discriminants and get something like C unions and also include discriminants. It's not clear to me that these things should be combined.

I guess that the last two points can be put another way: the presence of discriminants makes this feature "feel" typesafe, but the usage is very low-level and doesn't actually enforce any sort of invariant (i.e., that you access fields only when the discriminant has a suitable value). Maybe those sorts of invariants and type-level machinery are overkill when you're dealing in bits rather than larger values, but that is not immediately clear to me.
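To illustrate the invariant in question, here is a sketch of the checked alternative in ordinary Rust: the raw bits are decoded into an enum once, so a field can only be read after matching on the discriminant. The layout and names are hypothetical and are not taken from the RFC.

```rust
/// Decoded view of a 32-bit word whose lowest bit is the discriminant
/// (hypothetical layout).
#[derive(Debug, PartialEq)]
pub enum Node {
    /// Discriminant bit 0 is clear: bits 1..=15 hold a count.
    Leaf { count: u16 },
    /// Discriminant bit 0 is set: bits 1..=15 hold `left`, bits 16..=31 hold `right`.
    Inner { left: u16, right: u16 },
}

/// Inspect the discriminant first, then extract only the fields that are
/// valid for that variant.
pub fn decode(raw: u32) -> Node {
    if raw & 1 == 0 {
        Node::Leaf {
            count: ((raw >> 1) & 0x7fff) as u16,
        }
    } else {
        Node::Inner {
            left: ((raw >> 1) & 0x7fff) as u16,
            right: ((raw >> 16) & 0xffff) as u16,
        }
    }
}
```

The cost is an up-front decode step and a wider in-memory representation, which is the trade-off against accessing fields of the packed word directly.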

engstad · 2014-08-23T19:02:07Z

@kballard My thought was that if there are two arms with the same discriminant bits, then the compiler should warn (or error) on the second one, since it is a match arm that can't be reached.

@nikomatsakis As I mentioned above - no, I don't expect this in Rust any time soon. Having said that, I do believe quite a bit of code could be written in a much safer manner using this machinery, especially code that deals with hardware (e.g. microcontrollers, graphics chips, device drivers), but also code that relies on communicating data at the bit level. I think even rustc could benefit, for instance in terms of optimizing space usage in its internal data tables.

I will work more on the proposal, but in terms of your points: the two-level namespace is not strictly needed, since you can always introduce new variables and match or let-match on the value. However, that's quite a bit of boilerplate code. Imagine having a bitdata arm with 16 fields when you are only interested in that one "arity" bit in the end.

The reason that both use-cases (C unions and Rust-ish enums) are in the proposal is flexibility. Recall that we are just talking about describing the bits of (for instance) a u16/u32/u64/u128 etc. value. It then seems like overkill to restrict it too much. You can always just operate on those bits using shifts and masks, but that is error-prone and tedious and is exactly what we are trying to avoid with this RFC.

Proposal for bit data

a8c51d3

sfackler reviewed Aug 18, 2014

sgulgas reviewed Aug 19, 2014

engstad closed this Aug 23, 2014

bgamari mentioned this pull request Sep 28, 2014

RFC: Proposal for bit data (Ver 2) #327

Closed

kennytm mentioned this pull request Aug 30, 2021

Proposal: compressed layout of enum #3166

Open
