Range types for integers (or refinement types?) #671
Ada has this. It would be great in Rust, as we're going for the same niche of low-level + high-assurance. I wonder though if we should go for a more general system, such as refinement types or pre/post-conditions. There have been a number of successes bolting these onto languages like Haskell, F#, and C#. AIUI, they manage this without heavy changes to the core language. The condition checker is a mostly-separate tool that gathers proof obligations from the source code and passes them to an SMT solver. Basically I think this is an area where we should give researchers and other interested parties some time to experiment before we standardize something into Rust itself. |
Does Ada check the ranges at runtime or at compile time? I don't think we need a special syntax for this. Here is a toy implementation of an integer that checks the range at runtime:

```rust
use std::num::Int; // pre-1.0 Rust; this trait has since been removed

#[derive(Copy, Debug)]
struct RangedInt<T: Int> {
    value: T,
    min: Option<T>,
    max: Option<T>,
}

impl<T: Int> RangedInt<T> {
    fn checked_add(self, other: Self) -> Option<Self> {
        let sum = match self.value.checked_add(other.value) {
            None => return None,
            Some(i) => i,
        };
        match self.min {
            Some(i) if sum < i => return None,
            _ => (),
        }
        match self.max {
            Some(i) if sum > i => return None,
            _ => (),
        }
        Some(RangedInt { value: sum, min: self.min, max: self.max })
    }
}

fn main() {
    let a = RangedInt::<i64> { value: 5, min: Some(0), max: Some(10) };
    let b = RangedInt::<i64> { value: 9, min: Some(0), max: Some(10) };
    let ok = a.checked_add(a);
    let fail = a.checked_add(b);
    println!("ok: {:?}", ok);
    println!("fail: {:?}", fail);
}
```

yielding `Some(...)` for `ok` and `None` for `fail`, since 5 + 9 exceeds the maximum of 10.
|
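Since `std::num::Int` was removed before Rust 1.0, the snippet above no longer compiles. The same idea can be sketched on modern Rust by introducing a small local helper trait in place of `Int` (`CheckedAdd` below is an illustrative name, not a standard library item):

```rust
// A runtime-range-checked integer for modern Rust. `CheckedAdd` is a
// local helper trait standing in for the removed `std::num::Int`.
trait CheckedAdd: Sized + Copy + PartialOrd {
    fn checked_add_(self, other: Self) -> Option<Self>;
}

impl CheckedAdd for i64 {
    fn checked_add_(self, other: Self) -> Option<Self> {
        self.checked_add(other) // defer to the inherent method
    }
}

#[derive(Copy, Clone, Debug, PartialEq)]
struct RangedInt<T: CheckedAdd> {
    value: T,
    min: Option<T>,
    max: Option<T>,
}

impl<T: CheckedAdd> RangedInt<T> {
    fn checked_add(self, other: Self) -> Option<Self> {
        // Overflow of the underlying type also yields None.
        let sum = self.value.checked_add_(other.value)?;
        if self.min.map_or(false, |lo| sum < lo) {
            return None;
        }
        if self.max.map_or(false, |hi| sum > hi) {
            return None;
        }
        Some(RangedInt { value: sum, ..self })
    }
}

fn main() {
    let a = RangedInt::<i64> { value: 5, min: Some(0), max: Some(10) };
    let b = RangedInt::<i64> { value: 9, min: Some(0), max: Some(10) };
    println!("ok: {:?}", a.checked_add(a));   // value 10, still in range
    println!("fail: {:?}", a.checked_add(b)); // 14 > 10, out of range
}
```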
@vks: I believe Ada's semantics are those of run-time checking, but the compiler can elide checks when it proves they can't fail. I think this is a pragmatic approach to range checking. You can use the feature today, without understanding a complicated system for getting through the compiler. The code may be slow, but it will get faster over time as the compiler gets smarter, with zero effort on your part. It sounds nice, anyway. Another approach is to insert dynamic range checks in debug builds only. This fits nicely with the new integer overflow semantics. Release builds would just prepare a report of which ranges were not verified statically. This lets you target unchecked ranges as a code metric to reduce over time. At any rate, I agree that we don't need special syntax for range types. All we need is to allow macro invocations in type expressions. Then `fn foo(x: range!(0..10))` will work whether the checks happen at compile time or at run time. |
@kmcallister I would much rather have generics over values. Something like |
The issue with generic types or macros is that Rust cannot automatically modify the range when it learns something new about a variable:

```rust
// Hypothetical syntax: `mut(0, 1000)` declares an integer restricted to 0..=1000.
let mut(0, 1000) x = 0;
if x < 100 { return; }
// range of x is now (100..1000)
```

```rust
let (0, 1000) x = 0;
let (0, 1000) y = ...;
let z = x / y;
// range of y is now (1..1000), as otherwise the division would have panicked
```
|
You can if you define the operators in terms of
|
Not if you implement it with [not yet available] generic values. Sidenote: This is actually similar to typestate. |
I think you can implement it as of now, with checks at runtime though.
|
Definitely check out Sage and the associated papers. Their system uses a hybrid of traditional type checking/inference, a theorem-proving black box, and dynamic checking as a fallback. In 1,200 lines of (ML-syntax) test code, they have 5,087 subtyping judgements, of which 99.7% are discharged at compile time. One of the Sage examples is a binary search tree whose type includes the range of valid keys, which is used to guarantee that the tree stays in search order. In Rust-like syntax:

```rust
fn Range(lo: isize, hi: isize) -> type {
    those x: isize where lo <= x && x < hi
}

enum BiTree(lo: isize, hi: isize) {
    Empty,
    Node {
        mid: Range(lo, hi),
        left: BiTree(lo, mid),
        right: BiTree(mid, hi),
    },
}

impl BiTree(lo: isize, hi: isize) {
    fn search(&self, v: Range(lo, hi)) -> bool {
        match *self {
            BiTree::Empty => false,
            BiTree::Node { ref mid, ref left, ref right } => {
                match v.cmp(mid) {
                    Ordering::Less => left.search(v),
                    Ordering::Equal => true,
                    Ordering::Greater => right.search(v),
                }
            }
        }
    }

    fn insert(&self, v: Range(lo, hi)) -> BiTree(lo, hi) {
        match *self {
            BiTree::Empty => BiTree::Node {
                mid: v,
                left: BiTree::Empty,
                right: BiTree::Empty,
            },
            BiTree::Node { ref mid, ref left, ref right } if v < *mid => {
                BiTree::Node {
                    mid: mid,
                    left: left.insert(v),
                    right: right,
                }
            }
            BiTree::Node { ref mid, ref left, ref right } => {
                BiTree::Node {
                    mid: mid,
                    left: left,
                    right: right.insert(v),
                }
            }
        }
    }
}
```

Note that in the dependently-typed setting, there is no need to separate type parameters and value parameters. I have taken various liberties with the syntax, particularly the treatment of implicit function arguments. The syntax `those x: isize where lo <= x && x < hi` represents a refinement type, equivalent to the more traditional `{ x : isize | lo <= x && x < hi }`.

Refinements on

Making Rust dependently typed would be a huge step. But it's a huge step that provides integer generics, variadic generics, refinement types, compile-time function evaluation, and lots of other goodies, in a single unified framework. Potentially it could subsume a lot of the existing compile-time machinery as well. cc @bstrie |
The "const generics" RFC will allow types that are generic over an integer parameter. This will probably be useful for defining ranged types in a library. |
Instead of this syntax:
I like:
Or even:
|
Once const generics is completely implemented, const parameters can refer to type parameters, and bounds can be made based on const parameters, we will ideally be able to do this as a library like so:

```rust
struct Ranged<T: Ord, const LOW: T, const HIGH: T>(T) where LOW <= HIGH;
```

Right now on nightly we can do

```rust
struct RangedU32<const LOW: u32, const HIGH: u32>(u32);
```

But this will not allow the compiler to take advantage of the unused space like |
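For what it's worth, the nightly-only form above now compiles on stable Rust (const generics over integers stabilized in 1.51). A minimal checked-constructor sketch, where `new` and `get` are illustrative names rather than anything proposed in this thread:

```rust
/// A u32 guaranteed at construction time to lie in LOW..=HIGH.
#[derive(Copy, Clone, Debug, PartialEq)]
struct RangedU32<const LOW: u32, const HIGH: u32>(u32);

impl<const LOW: u32, const HIGH: u32> RangedU32<LOW, HIGH> {
    /// Returns None if `value` is out of range.
    fn new(value: u32) -> Option<Self> {
        if LOW <= value && value <= HIGH {
            Some(Self(value))
        } else {
            None
        }
    }

    fn get(self) -> u32 {
        self.0
    }
}

fn main() {
    type Percent = RangedU32<0, 100>;
    assert_eq!(Percent::new(42).map(Percent::get), Some(42));
    assert_eq!(Percent::new(101), None);
}
```

Note that nothing here lets the compiler exploit the unused bit patterns for niche optimization; that part still requires compiler integration.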
There's a list of various important features that library-defined intervals can't support. So I think they need to be built-ins. |
Is there an RFC for something along the lines of what @KrishnaSannasi said? This could absolutely allow the compiler to optimize space, and could halve the size of certain structs. If not, I'll look into what's actually necessary for a proper RFC. |
@jhpratt I am not aware of any RFC for this, but I think it would probably be best to hold off on it until const generics is complete on nightly. |
@KrishnaSannasi why do you think pipelining (working in parallel) on such things is a bad idea? |
@leonardo-m I think that bounded range types should start off as a crate so we can try out different APIs, and we can't really do that right now (no expression handling in const generics). |
@KrishnaSannasi One of the key advantages of range bounded integers is the ability for smaller representation, which isn't possible without rustc integration. |
@jhpratt but the API we use to work with the bounded integer types doesn't depend on the compiler, so it can be worked on as a separate crate. Also, with complete expressions in const generics, you could implement the space optimizations without compiler integration just by using Rust's powerful type system. Finally, bounded integers are useful even without that optimisation, so it would be fine to just have a crate. |
Yes, most of the API would be possible to implement as a crate, though treating it as a primitive would not be. Ideally, I'd be able to do

How would space optimization be enabled? It's certainly useful, and something I may look into implementing to an extent myself at some point. It's just that it would be far more useful to have it built in. |
Yes, this is a valid reason to merge an external crate into std. But this is not a sufficient reason to start it in std.
If there were runtime checks for
We could round up to the nearest byte (which would already be a large space saving) and use traits like those in
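The round-up-to-the-nearest-byte idea amounts to computing ceil(log2(number of values)) bits and then rounding up to whole bytes. A sketch (the function name is illustrative):

```rust
/// Bytes needed to store the offset `value - low` for an inclusive
/// range [low, high], rounded up to whole bytes.
const fn bytes_for_range(low: u64, high: u64) -> u32 {
    let span = high - low; // largest stored offset
    if span == 0 {
        return 0; // a single-value range needs no storage at all
    }
    // Bits needed to represent `span`, i.e. ceil(log2(span + 1)).
    let bits = 64 - span.leading_zeros();
    (bits + 7) / 8
}

fn main() {
    assert_eq!(bytes_for_range(0, 255), 1);     // 256 values fit in one byte
    assert_eq!(bytes_for_range(0, 256), 2);     // 257 values need 9 bits
    assert_eq!(bytes_for_range(1000, 1255), 1); // offsets 0..=255
}
```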
You can't specialize layouts like this without compiler integration. One thing about specializing layouts that I didn't see brought up: it would make multiplication and division much more costly. Assuming that you are only storing the offset from the lower bound, every multiplication and division would have an additional addition folded in to offset back to the correct value. Is that worth the space savings? Is there some clever way to not do this? Shifting back may not be possible, because the multiplication may overflow. I bring this up because it doesn't look like anyone has discussed this here. Also, how do other languages implement this? Do any other languages implement this as a library? |
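To make that cost concrete, here is a sketch of multiplication on an offset-stored range (the type and method names are illustrative, not from any proposal): with values stored as `value - LOW`, each multiplication must first add `LOW` back to both operands and re-check the result, so extra additions and a subtraction ride along with every operation.

```rust
/// A value in LOW..=HIGH stored as the byte-sized offset `value - LOW`.
#[derive(Copy, Clone, Debug, PartialEq)]
struct OffsetStored<const LOW: u16, const HIGH: u16>(u8);

impl<const LOW: u16, const HIGH: u16> OffsetStored<LOW, HIGH> {
    fn new(value: u16) -> Option<Self> {
        if LOW <= value && value <= HIGH && (HIGH - LOW) <= u8::MAX as u16 {
            Some(Self((value - LOW) as u8)) // one subtraction on every store
        } else {
            None
        }
    }

    fn get(self) -> u16 {
        LOW + self.0 as u16 // one extra addition on every read
    }

    /// The offsets themselves cannot be multiplied; both operands
    /// must be shifted back to their real values first.
    fn checked_mul(self, rhs: Self) -> Option<Self> {
        let product = self.get().checked_mul(rhs.get())?;
        Self::new(product)
    }
}

fn main() {
    type R = OffsetStored<10, 200>;
    let a = R::new(12).unwrap(); // stored as offset 2
    let b = R::new(15).unwrap(); // stored as offset 5
    assert_eq!(a.checked_mul(b).map(R::get), Some(180));
    assert!(R::new(250).is_none()); // out of range
}
```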
Note that this is also true for other things like |
If Rust adopted refinement types, for example via `where` constraints, would it guarantee a decidable fragment? |
Is anything like this in the language yet? |
It's probably best to experiment at the library level first, and that in turn requires const generics if you're not in the mood for fighting typenum. |
We should make a list of all the things a library-defined ranged value can't do compared to a well-designed built-in feature. I think the number of sub-features that need to be built in is limited, but they are important enough to make them worth using often. |
We'll eventually want compiler plugins (forks?) for various "slow" static analysis tasks, including refinement types, but presumably only after chalk. It'd also help prevent them from interacting with soundness. Refinements must not touch dispatch, obviously. If refinements tried influencing dispatch, then they'd wind up in some much longer purgatory than the mere 5 years required by specialization. ;) An optional compile phase that rejects some programs when invoked, but cannot modify behavior, sounds cleanest. And developers would prefer such phases being invoked only by CI. |
only by CI? |
Just for anyone that may be thinking of doing the same thing — I intend to begin work on a basic implementation of ranged integers on nightly. With min_const_generics looking likely to stabilize in February 2021, it's probably best to get out ahead of it. |
@jhpratt Great to hear it! I look forward to seeing how it goes. The piece I keep looking forward to, though -- |
Yeah, there will be some explicit |
For the curious, I recently started working on that. Right now all that exists is definitions of various ranged types and their respective

Out of curiosity, I tried using

If you'd like to follow along and/or contribute, I'll be periodically pushing code up to jhpratt/deranged. I haven't yet published anything on crates.io, though I have reserved the name. |
I'd love if we could find some solution for this that allows literals to be checked against the range of arbitrary types. |
The upcoming version of ux uses a method to achieve something like ranged integers on stable Rust: it exposes numeric types (e.g. the 28-bit unsigned integer

This has a few downsides: |
Back when this was started, we didn't have many of the tools we have now. Maybe it's time to revisit. To give a straw-man proposal, the core library could expose (for each numeric type, using u16 as a stand-in):

with the appropriate TryFrom, Into, and a |
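The proposed list itself was lost in the thread, but a straw-man type matching the description could look roughly like this (a guess for illustration only; `Ranged` and its bounds are assumed names, not the actual proposal):

```rust
use std::convert::TryFrom;

/// A u16 statically declared to lie in MIN..=MAX.
/// `Ranged` is an illustrative name.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Ranged<const MIN: u16, const MAX: u16>(u16);

// Fallible conversion in: reject out-of-range values.
impl<const MIN: u16, const MAX: u16> TryFrom<u16> for Ranged<MIN, MAX> {
    type Error = ();

    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if MIN <= value && value <= MAX {
            Ok(Self(value))
        } else {
            Err(())
        }
    }
}

// Infallible conversion out: widening to the base type always succeeds.
impl<const MIN: u16, const MAX: u16> From<Ranged<MIN, MAX>> for u16 {
    fn from(r: Ranged<MIN, MAX>) -> u16 {
        r.0
    }
}

fn main() {
    type Month = Ranged<1, 12>;
    let m = Month::try_from(7).unwrap();
    assert_eq!(u16::from(m), 7);
    assert!(Month::try_from(13).is_err());
}
```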
If we get variable-width numbers, they should be compiler built-ins. Otherwise we're inviting the C++ debug performance and compile time calamity. |
I agree; ux is not a suggestion for how to do this, but was mentioned to illustrate that there is a need and that workarounds are being used. But they don't have to look like compiler built-ins; they can behave like any other type (just with the difference that the compiler itself may have better tools to implement them). The original post's proposal looked very built-in, but back then we had no const generics to do any better. |
It seems #76560 could enable this. As @scottmcm said:

And we really only need basic arithmetic in const generics, which is exactly what the goal of that initiative is.

On the topic of ergonomics: if we go for maximum correctness, I would assume that any constant and literal of a given value would get a single-value range type:

```rust
const a = 6;
// Equivalent to
const a: RangedI32<6, 6> = 6;
const b: RangedU64<6, 6> = 6;
const c: RangedU8<6, 6> = 6;
```

Now, if you use constants or literals in any sort of arithmetic, the same thing would apply, as we talked about before:

```rust
const a = 6;
const b = 12;
// All the type annotations could be omitted as well
const sum: RangedI32<18, 18> = a + b;
const product: RangedI32<72, 72> = a * b;
const div: RangedI32<2, 2> = b / a;
```

And so on. Extending to actual ranged values:

```rust
fn do_safe_arithmetic(a: RangedU8<30, 100>, b: RangedU8<4, 20>, factor: RangedU8<0, 2>) -> RangedU8<0, 192> {
    let z = a - b; // Type would be `RangedU8<10, 96>`
    z * factor
}
```

The compiler can now prove that some operations are always valid and can never overflow or underflow. Additionally, it could also give a warning or an error if something overflows:

```rust
fn do_dangerous_arithmetic(a: RangedU8<30, 100>, b: RangedU8<4, 20>, divisor: RangedU8<10, 120>) -> RangedU8<1, 200> {
    let z = a * b; // Type would be `RangedU8<120, 2000>`, which is of course not possible
    //      ^^^^^
    //      |
    //      Maximum result of calculation would be 2000, which is more than 255, the highest representable value
    //      help: Try casting to RangedU16: [...]
    z / divisor
}
```

But here you can start to see the problem. If I were to follow the logical suggestion and cast the operands of the multiplication, the resulting function is awfully verbose:

```rust
fn do_arithmetic_proper(a: RangedU8<30, 100>, b: RangedU8<4, 20>, divisor: RangedU8<10, 120>) -> RangedU8<1, 200> {
    let z = (a as RangedU16<30, 100>) * (b as RangedU16<4, 20>); // Type would now be `RangedU16<120, 2000>`, which is fine
    (z / divisor) as RangedU8<1, 200>
}
```

Maybe this is just the price we have to pay for panic-safe arithmetic, but it feels icky. I'd rather have

```rust
fn do_arithmetic_proper(a: u8<30, 100>, b: u8<4, 20>, divisor: u8<10, 120>) -> u8<1, 200> {
    let z = (a as u16) * (b as u16); // Implicit types would be `z: u16<120, 2000>`, `a: u16<30, 100>`, `b: u16<4, 20>`
    (z / divisor) as u8
}
```

(when using

I concede that this is kind of magical, but I feel like this is so much more readable, and it would make the whole feature so much more useful and nice to adopt that it would be worth the squeeze. Basically you'd get on-demand, zero-cost (except compile time) overflow safety. |
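The interval arithmetic those examples rely on can be checked by hand. A sketch of the bound propagation as plain `const fn`s over unsigned ranges (function names are illustrative; for unsigned operands the min/max land at the corners shown, while full interval arithmetic over signed ranges would have to compare all four corner products):

```rust
/// Bounds of `a - b` for a in alo..=ahi and b in blo..=bhi,
/// assuming the subtraction cannot underflow (alo >= bhi).
const fn sub_bounds(alo: u16, ahi: u16, blo: u16, bhi: u16) -> (u16, u16) {
    (alo - bhi, ahi - blo)
}

/// Bounds of `a * b` for unsigned, non-negative ranges.
const fn mul_bounds(alo: u16, ahi: u16, blo: u16, bhi: u16) -> (u16, u16) {
    (alo * blo, ahi * bhi)
}

fn main() {
    // a: [30, 100], b: [4, 20] as in the examples above
    assert_eq!(sub_bounds(30, 100, 4, 20), (10, 96));    // matches RangedU8<10, 96>
    assert_eq!(mul_bounds(30, 100, 4, 20), (120, 2000)); // needs 16 bits, as diagnosed
}
```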
> Basically you'd get on-demand, zero-cost (except compile time) overflow safety.

That's a good goal, but I don't think we should wait for that to have ranged integers. A ranged type would be valuable for storage already if it has no arithmetic functions, and all it can do is `.try_from()` and `.into()` its base type. Autochecked arithmetic can still be added to those types when const generics mature.
|
I've played around with |
The work on range/pattern types is happening here: rust-lang/rust#107606 (gigantic PR +1,210 −237 is written) 🎉 |
Issue by BryanQuigley
Saturday Dec 13, 2014 at 03:46 GMT
For earlier discussion, see rust-lang/rust#19801
This issue was labelled with: A-an-interesting-project, E-hard in the Rust repository
It seems like a natural extension of how variables are defined (immutable by default, mutable if specified) to allow the programmer to dictate a specific range of allowed values for an integer. If I know a value is only valid between 0 and 1000, the sooner I declare that, the better it is for catching bugs, off-by-one errors, and more...
I'm not sure what exact syntax would work, maybe:
x is only valid from 0-1000 inclusive.
(Apologies if this is already possible, I've been parsing the docs trying to learn Rust.)