RFC: Proposal for bit data #205

Closed

@engstad engstad commented Aug 18, 2014

Trying to get feedback on bit-data. I realize that it may not be a 1.0 priority, but discussion on this is still helpful.

It has been suggested to implement this as a syntax extension. This will not
work, because

* We need significant error-checking, including bit-size calculations
Member

Could you expand on why you don't think that a procedural macro would be able to provide this error checking?

Author

If the language had support for signed and unsigned bitdata values (e.g. u4 and i12), I believe it would be possible to support it through procedural macros. Having said that, I'm not an expert on procedural macros, and honestly, if anyone can write it using this feature, I would love to see it.
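To make the `i12` idea concrete, here is a minimal sketch of how a 12-bit signed value can be emulated in today's Rust with a newtype. The `I12` type and its methods are hypothetical names invented for illustration; they are not part of the RFC or of any library.

```rust
/// Hypothetical stand-in for a 12-bit signed integer (Rust has no built-in `i12`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct I12(i16);

impl I12 {
    pub const MIN: i16 = -2048;
    pub const MAX: i16 = 2047;

    /// Checked constructor: rejects values that do not fit in 12 bits.
    pub fn new(value: i16) -> Option<I12> {
        if (Self::MIN..=Self::MAX).contains(&value) {
            Some(I12(value))
        } else {
            None
        }
    }

    /// Interpret the low 12 bits of `raw` as a two's-complement number.
    pub fn from_bits(raw: u16) -> I12 {
        // Move bit 11 into the sign position, then shift back arithmetically
        // so the sign bit is replicated into the upper four bits.
        I12((((raw & 0x0FFF) << 4) as i16) >> 4)
    }

    /// The 12-bit two's-complement encoding, stored in the low bits of a u16.
    pub fn to_bits(self) -> u16 {
        (self.0 as u16) & 0x0FFF
    }

    pub fn get(self) -> i16 {
        self.0
    }
}
```

With this sketch, `I12::from_bits(0xFFF).get()` is `-1` and `I12::new(2048)` is `None`. The range check happens at run time here, which is the kind of error-checking the proposal would like to see done at compile time.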

@bharrisau

We were discussing this for Zinc. I ended up suggesting a way to do it with a Rust plugin, but we ended up going with a custom DSL for ease of use. hackndev/zinc#129 (comment)

@bgamari

bgamari commented Aug 18, 2014

A few points,

  1. Pull request RFC: bit fields and bit matching #29 makes a similar suggestion. The consensus there was that this was the territory of a syntax extension. It might be nice to elaborate on how this RFC differs from #29 and address some of the concerns raised there.
  2. As mentioned by @bharrisau, bitflags! wasn't sufficient to cover our needs in Zinc (in part because it only supports flags, not arbitrary-width fields), which prompted us to develop ioregs!, a syntax extension implementing a domain-specific language for describing registers and their fields. The documentation for this extension can be found here.
  3. I'm having a bit of trouble seeing how to write platform-independent definitions if endianness is ignored. Could you elaborate on how you might write a definition for, e.g., the TCP header?
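One way to look at the TCP question, sketched below, is to fix the byte order at the I/O boundary and only then describe bits inside the resulting integer. This is an illustration and not the RFC's answer. The struct and function names are hypothetical; the field layout (4-bit data offset, 4 reserved bits, 8 control bits in header bytes 12 and 13) follows the TCP header format in RFC 9293.

```rust
/// Fields packed into bytes 12 and 13 of a TCP header.
#[derive(Debug, PartialEq, Eq)]
pub struct TcpOffsetFlags {
    /// Header length in 32-bit words (4 bits).
    pub data_offset: u8,
    /// Control bits: CWR ECE URG ACK PSH RST SYN FIN (8 bits).
    pub flags: u8,
}

pub const TCP_SYN: u8 = 0x02;
pub const TCP_ACK: u8 = 0x10;

/// Decode the two bytes as they appear on the wire (network byte order).
pub fn parse_offset_flags(bytes: [u8; 2]) -> TcpOffsetFlags {
    // The byte order is resolved here, once, independent of the host CPU.
    let word = u16::from_be_bytes(bytes);
    TcpOffsetFlags {
        data_offset: (word >> 12) as u8,
        flags: (word & 0x00FF) as u8,
    }
}

/// Encode back to wire order; the four reserved bits are written as zero.
pub fn encode_offset_flags(f: &TcpOffsetFlags) -> [u8; 2] {
    let word = (u16::from(f.data_offset & 0x0F) << 12) | u16::from(f.flags);
    word.to_be_bytes()
}
```

For example, the bytes `[0x50, 0x12]` decode to a data offset of 5 with `TCP_SYN | TCP_ACK` set, on both little-endian and big-endian hosts.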

@bharrisau

Slightly different from strict 'endianness' is the bit numbering: whether 0 is
MSB or LSB.

But this is doable in a syntax extension, albeit as a struct with functions
(unable to use fields to refer to bits until there is some sort of
metatable trait).
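The difference between the two bit-numbering conventions can be sketched with a small helper. `BitOrder` and `extract` are hypothetical names used only for this illustration.

```rust
/// Which end of the word bit 0 refers to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BitOrder {
    /// Bit 0 is the least significant bit.
    Lsb0,
    /// Bit 0 is the most significant bit.
    Msb0,
}

/// Extract bits `first..=last` (inclusive) of a 32-bit word under the given numbering.
pub fn extract(word: u32, first: u32, last: u32, order: BitOrder) -> u32 {
    assert!(first <= last && last < 32, "bit range out of bounds");
    let width = last - first + 1;
    // Position of the field's lowest bit, counted from the LSB.
    let shift = match order {
        BitOrder::Lsb0 => first,
        BitOrder::Msb0 => 31 - last,
    };
    // Avoid `1 << 32`, which would overflow for a full-width field.
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    (word >> shift) & mask
}
```

For the word `0xA000_0005`, "bits 0 to 3" is `0x5` under `Lsb0` and `0xA` under `Msb0`, so a field description is ambiguous until the numbering is stated.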

@engstad
Author

engstad commented Aug 18, 2014

Thanks all for some very useful comments. I will address them in an upcoming diff. As @bharrisau mentioned, this proposal ignores endianness since it is irrelevant (the underlying byte, half-word, word, or dword defines the byte order), but bit numbering (LSB first or MSB first) should probably be added.


## Compared to `enum`

The `bitdata` type is similar to the existing `enum` type with the following

Wouldn't it be better to make it more similar to a struct of various bit lengths?

Enums are usually one long value of a certain width, which would make them more prone to endian issues (the whole order would be off).

A numerical value (enum) would also require more packing and unpacking to read/write, as well as masking, whereas a struct is at least byte-aligned, so the compiler would only have to pack/unpack fields smaller than a byte and fields that are not byte-aligned (that is, not a u8/u16/u32/u64, etc.).

Also, anything saved as bitdata in an enum would have to use the biggest number type: with 20 bits of data you'd be writing a 32-bit number, whereas in a struct 20 bits would round up to 24 bits (an array of 3 bytes). This helps with storage in extremely tight areas, but more importantly in serial/microcontroller communication (unless you accommodate it on the non-PC end), as sending more bytes than necessary will throw off the program. The best example I can think of is MIDI: MIDI commands are sent 3 bytes at a time (24 bits), so a 4-byte (32-bit) value would mess it up.
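The 20-bit case can be sketched as follows: the value lives in a `u32` in memory, but only three bytes are produced at the serialization boundary. The function names are hypothetical, and most-significant-byte-first order is an arbitrary choice made for this example.

```rust
/// Pack a 20-bit value into three bytes, most significant byte first.
/// Returns `None` if `value` needs more than 20 bits.
pub fn pack_u20(value: u32) -> Option<[u8; 3]> {
    if value >= (1 << 20) {
        return None;
    }
    // The top byte is always zero for a 20-bit value, so it can be dropped.
    let b = value.to_be_bytes();
    Some([b[1], b[2], b[3]])
}

/// Rebuild the 20-bit value from three bytes; the upper four bits are ignored.
pub fn unpack_u20(bytes: [u8; 3]) -> u32 {
    u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]) & 0x000F_FFFF
}
```

Here `pack_u20(0xABCDE)` gives `[0x0A, 0xBC, 0xDE]`, so only three bytes go over the wire even though the in-memory type is four bytes wide.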

@bharrisau

As this will likely end up being "please try implementing this as a syntax extension", I'll copy in what the syntax extension may look like from my other post.

#[bitdata(u64)]
pub struct NodeX {
  #[bits(0, 1)]   axis:   u8,
  #[bits(2, 16)]  left:  u16,
  #[bits(17, 31)] right: u16,
  #[bits(32, 63)] split: f32
}

let left = foo.left();
let right = foo.right();

foo.set_left(5);

I don't think you can get virtual struct fields through any method yet (a metatable trait would be needed, or a significant upgrade to Index). So you are unable to destructure the struct or treat the fields as real fields.
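For readers wondering what such an extension would have to generate, below is a hand-written sketch of accessors for the `NodeX` layout above. It is an assumption about what an expansion could look like, not the output of any existing macro; the `mask`, `get`, `set`, `new`, and `raw` helpers are hypothetical.

```rust
/// Hand-written stand-in for the accessors a `#[bitdata(u64)]` expansion might generate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeX(u64);

/// Mask with the low `width` bits set (only valid for `width < 64`).
const fn mask(width: u32) -> u64 {
    (1u64 << width) - 1
}

impl NodeX {
    pub fn new() -> NodeX {
        NodeX(0)
    }

    pub fn raw(self) -> u64 {
        self.0
    }

    fn get(self, lo: u32, width: u32) -> u64 {
        (self.0 >> lo) & mask(width)
    }

    fn set(&mut self, lo: u32, width: u32, value: u64) {
        let field = mask(width) << lo;
        // Clear the old field, then OR in the new value truncated to the field.
        self.0 = (self.0 & !field) | ((value << lo) & field);
    }

    // Bit ranges from the attributes: axis 0..=1, left 2..=16, right 17..=31, split 32..=63.
    pub fn axis(self) -> u8 { self.get(0, 2) as u8 }
    pub fn left(self) -> u16 { self.get(2, 15) as u16 }
    pub fn right(self) -> u16 { self.get(17, 15) as u16 }
    pub fn split(self) -> f32 { f32::from_bits(self.get(32, 32) as u32) }

    pub fn set_axis(&mut self, v: u8) { self.set(0, 2, u64::from(v)) }
    pub fn set_left(&mut self, v: u16) { self.set(2, 15, u64::from(v)) }
    pub fn set_right(&mut self, v: u16) { self.set(17, 15, u64::from(v)) }
    pub fn set_split(&mut self, v: f32) { self.set(32, 32, u64::from(v.to_bits())) }
}
```

Note that `left` and `right` are 15 bits wide but exposed as `u16`, so in this sketch `set_left(0xFFFF)` silently truncates to `0x7FFF`. Catching that kind of mismatch is one reason the thread keeps returning to types like `u15`.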

@lilyball
Contributor

This RFC does not explain what happens if you use a bitfield in a match when there are no suitable discriminants on the various identifiers. I think the simplest rule is that when using match on a bitfield, all cases must consider the same bits to be a discriminant, and each case must have a unique discriminant set. This could be modified slightly to allow for one case to define extra discriminants that aren't actually necessary to distinguish it from the other cases, but that may be an overcomplication.
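The proposed rule can be approximated in today's Rust by masking out one fixed set of discriminant bits and matching on the result. The 16-bit encoding, `Kind`, and `classify` below are hypothetical and exist only to illustrate the rule.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Kind {
    Load,
    Store,
    Branch,
    Reserved,
}

/// Every arm inspects exactly these bits (the top two bits of the word).
const DISCR_MASK: u16 = 0b1100_0000_0000_0000;

pub fn classify(word: u16) -> Kind {
    match word & DISCR_MASK {
        // Each arm uses a distinct pattern over the same discriminant bits.
        0x0000 => Kind::Load,
        0x4000 => Kind::Store,
        0x8000 => Kind::Branch,
        // 0xC000: no variant claims this pattern in the hypothetical encoding.
        _ => Kind::Reserved,
    }
}
```

Because every arm is a literal over the same masked value, a second arm repeating an earlier pattern could never be reached, and rustc reports such a duplicate arm as an unreachable pattern.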

@nikomatsakis
Contributor

This is a well-written and reasonably thorough RFC, which I greatly appreciate. However, while it would be convenient for certain use cases, I don't really see us adding a feature like this in the short term. This is basically because we are focused on building up the language in other areas (e.g., rounding out smart pointer design and type machinery). It is also somewhat unclear how widely applicable this feature would be.

There are some pieces of the design that seem surprising to me and for which I would like to see more justification:

  1. The two-level namespace (value.Variant.field) has no precedent elsewhere in Rust.
  2. It seems surprising that you do not need to match on the discriminant to extract out data. Is this simply because you think it will be too inconvenient/slow, or are there legitimate cases where you wish to purposefully ignore the discriminant? If the latter, it seems like that should be an explicit operation.
  3. The design seems to serve two use cases simultaneously. You can leave off discriminants and get something like C unions, or include discriminants and get something closer to enums. It's not clear to me that these things should be combined.

I guess that the last two points can be put another way: the presence of discriminants makes this feature "feel" typesafe, but the usage is very low-level and doesn't actually enforce any sort of invariant (i.e., that you access fields only when the discriminant has a suitable value). Maybe those sorts of invariants and type-level machinery are overkill when you're dealing in bits rather than larger values, but that is not immediately clear to me.
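The contrast drawn here can be sketched in plain Rust. In the hypothetical 16-bit encoding below, `decode` only exposes fields after the discriminant has been checked, while `channel_unchecked` reads the same bits regardless of what the discriminant says. All names and the bit layout are invented for illustration.

```rust
/// Hypothetical 16-bit encoding in which bit 15 is the discriminant.
#[derive(Debug, PartialEq, Eq)]
pub enum Packet {
    /// Bit 15 = 0: 7-bit channel in bits 8..=14, 8-bit value in bits 0..=7.
    Data { channel: u8, value: u8 },
    /// Bit 15 = 1: 15-bit code in bits 0..=14.
    Control { code: u16 },
}

/// Checked access: fields can only be reached by matching on the variant.
pub fn decode(word: u16) -> Packet {
    if word & 0x8000 == 0 {
        Packet::Data {
            channel: ((word >> 8) & 0x7F) as u8,
            value: (word & 0xFF) as u8,
        }
    } else {
        Packet::Control { code: word & 0x7FFF }
    }
}

/// Unchecked access: reads the channel bits without looking at the discriminant.
pub fn channel_unchecked(word: u16) -> u8 {
    ((word >> 8) & 0x7F) as u8
}
```

For the word `0x8305`, `decode` returns `Control { code: 0x0305 }`, while `channel_unchecked` happily returns `3` even though the word is not a `Data` packet. The first style enforces the invariant; the second is the union-like access that the comment suggests should at least be explicit.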

@engstad
Author

engstad commented Aug 23, 2014

@kballard My thought was that if there are two arms with the same discriminant bits, then the compiler should warn (or error) on the second one, since it is a match arm that can't be reached.

@nikomatsakis As I mentioned above, no, I don't expect this in Rust any time soon. Having said that, I do believe quite a bit of code could be written in a much safer manner using this machinery, especially code that deals with hardware (e.g. micro-controllers, graphics chips, device drivers), but also code that relies on communicating data at the bit level. I think even rustc could benefit, for instance in terms of optimizing space usage in its internal data tables.

I will work more on the proposal, but as for your points: the two-level namespace is not strictly needed, since you can always introduce new variables and match or let-match them. However, that's quite a bit of boilerplate code. Imagine having a bitdata arm with 16 fields when you are only interested in that "arity" bit in the end.

The reason that both use-cases (C unions and Rust-ish enums) are in the proposal is flexibility. Recall that we are just talking about describing the bits of (for instance) a u16/u32/u64/u128 etc. value. It then seems like overkill to restrict it too much. You can always just operate on those bits using shifts and masks, but that is error-prone and tedious and is exactly what we are trying to avoid with this RFC.
