Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: Enums as State Machines #1906

Closed
mark-i-m opened this issue Feb 17, 2017 · 23 comments
Closed

Idea: Enums as State Machines #1906

mark-i-m opened this issue Feb 17, 2017 · 23 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@mark-i-m
Copy link
Member

The problem of encoding some sort of state machine is common for many programs, such as building parsers, schedulers, protocols, or coroutines. This proposal extends enums to allow them to express state machines.

Enums would be extended with extra syntax as follows. The design is intended not to break any existing code. Enums would retain all of the features they currently have, but they would also have extra abilities. For example

pub enum MyStates {
    // The `=> State2` denotes that an enum with value `State1` can only
    // have its value changed to `State2`.
    State1(u32) => State2,              

    // A variant can have multiple successor states
    State2 => {State3, State4},
    State3(IpAddr) => {State4, Error},

    // A variant can decide to transition to itself, which means that once
    // the state machine gets into state `Error`, it is stuck there.
    Error => Error,

    // A variant without explicit next states implicitly can transition to any
    // other variant/state. This makes sure that current code does not break.
    State4,
}

I think it should be relatively easy to enforce these constraints in safe code statically with little or no non-local reasoning. (Not 100% sure about this, though... Can someone find a counterexample?). For example,

pub fn advance_fsm(fsm: &mut MyState) {
    match fsm {
        &mut MyState::State1(i) => {
            send_msg(i);
            
            // This assignment can ONLY be to `State2`
            *fsm = MyState::State2;           
        }
        &mut MyState::State2 => {
            // COMPILE ERROR: Illegal state transition: State2 -> State1
            *fsm =  MyState::State1;      
        }
        &mut MyState::State3(ip) => if connect(ip) {
            *fsm = MyState::State4;
        } else {
            *fsm = MyState::Error;
        },
        &mut MyState::State4 => *fsm = MyState::State1,
        &mut MyState::Error => {}
    }
}
@Manishearth
Copy link
Member

It makes more sense to use session types here

https://github.com/Munksgaard/session-types

@mgattozzi
Copy link
Contributor

Some previous work based on state machines that might be worth a look
https://hoverbear.org/2016/10/12/rust-state-machine-pattern/

@mark-i-m
Copy link
Member Author

Thanks! I will take a look those links :)

@mark-i-m
Copy link
Member Author

@mgattozzi The "State Machine Patterns" article you posted is interesting. In general, I like the idea of using the type system to encode program constraints and enforce them statically (after all, what else are type systems for :P).

Interestingly, at the end of the article, the author says

In an ideal situation we’d be able to make enums with restricted transitions between variants, but that’s not the case.

which happens to be this proposal... I would also argue that my proposal is more ergonomic and more easily readable, too.

Also, I am still looking at the session types reading...

@glaebhoerl
Copy link
Contributor

I wonder how this relates to #1450. They are similar in that both of them have this aspect of certain operations only being available in contexts where you (or rather, the typechecker) know which variant the enum currently "contains". That is, the typechecker knows more facts about the types of things when it's inside of a match arm than out. (Kind of like GADTs.)

@TedDriggs
Copy link

Rather than building a special-purpose syntax for this, have you looked at the proposals around enum refinement? That would let you define From implementations or custom methods that moved between states in a typesafe fashion and would be usable for other things besides state machines.

@mark-i-m
Copy link
Member Author

@glaebhoerl

the typechecker knows more facts about the types of things when it's inside of a match arm than out.

Yes, I think that is the more general observation that makes my proposal feasible. I would be very interested in knowing what other ways these sorts of situations can be useful... Actually, if we could find a way of exposing this sort of extra knowledge to the user, I think we could enable all sorts of really cool static guarantees about code. In other words, we would be allowing the user to specify constraints that the compiler can already prove, so there would be little extra overhead in the compiler but the user would get extra guarantees.

@TedDriggs I believe you are referring to something similar to this post mentioned above. While I think the design is nice, I think that ultimately, it is a way of getting around existing limitations. This proposal would make state machines more compact and easier to read...

@mark-i-m
Copy link
Member Author

@Manishearth I finally got around to reading a bit about session types. It seems like session types are more geared at communication protocols. In particular, it seems like the main offering of session types that is not present in this proposal is that session types allow you to define the relationship between to two state machines more clearly (i.e. state machine A can only make certain transitions after a specific set of transitions by state machine B). I would be down to augment this proposal with such a feature, if people feel it would be really useful. It does seem like something that could be added afterwards, though.

@Manishearth
Copy link
Member

You misunderstand me, session types are already expressible in Rust, I'm suggesting that if you need to express complex state machines then session types may help, instead of shoehorning stuff onto the enum system.

@mark-i-m
Copy link
Member Author

Ah, I see.

I disagree, though. Most things can be expressed with Rust's type system if you put in enough effort, since it is Turing complete, IIRC. That doesn't make it convenient or elegant to do so, though. For example, the session types library you pointed out is expressive, but it often forces you to use types like this:

// Offers: Add, Negate, Sqrt, Eval
type Srv =
    Offer<Eps,
    Offer<Recv<i64, Recv<i64, Send<i64, Var<Z>>>>,
    Offer<Recv<i64, Send<i64, Var<Z>>>,
    Offer<Recv<f64, Choose<Send<f64, Var<Z>>, Var<Z>>>,
    Recv<fn(i64) -> bool, Recv<i64, Send<bool, Var<Z>>>>>>>>;

This sort of thing makes me wince a bit. Also, type-check can be exponential time for some of these more unusual uses of the type system.

Also, I would argue that the enum representation still seems a bit more compact than the session types library.

Finally, I didn't think that I had really "shoehorned" that much. Enums are a very natural and intuitive way to express different states, and adding transitions relations onto them seems like a logical addition to me, but maybe that's just me 😛

@Manishearth
Copy link
Member

Most things can be expressed with Rust's type system if you put in enough effort, since it is Turing complete, IIRC.

That's incorrect. Turing completeness gets you the ability to run arbitrary computations in the type system, nothing more. Turing completeness is but one small facet in the world of expressiveness. The goal of type systems is not to do computation, it is to express and enforce constraints, and expressiveness is viewed through that lens.

It's a common fallacy, but basically Turing completeness doesn't actually mean much in many contexts where it gets invoked.

For example, the session types library you pointed out is expressive, but it often forces you to use types like this:

Right, because it's a pretty generic library. For a regular state machine you can just use some phantom types and express these constraints with a limited set of conversion methods. Hyper does this (or used to), and it's pretty easy overall.

Finally, I didn't think that I had really "shoehorned" that much. Enums are a very natural and intuitive way to express different states, and adding transitions relations onto them seems like a logical addition to me, but maybe that's just me

Your proposal reintroduces typestate; the dependence of the type and its behavior on the contained value. Typestate is an interesting concept (which Rust had many years ago) but probably not one that fits well into Rust now -- most use cases of typestate can be adequately handled with phantom parameters. It also introduces typestate for a very specific situation, enums-as-state-machines, and thus feels very bolted-on/shoehorned to me.

If this were to happen I think the more natural way would be through #1450, where you can then hang conversion methods off the type. But mind you, you can do that with phantom (or nonphantom) types now anyway, kind of.

@mark-i-m
Copy link
Member Author

mark-i-m commented Apr 7, 2017

That's incorrect. Turing completeness gets you the ability to run arbitrary computations in the type system, nothing more. The goal of type systems is not to do computation, it is to express and enforce constraints, and expressiveness is viewed through that lens.

Ah, I see my mistake. Thanks for correcting me!

But the spirit of my comment still holds. My point was just that we often want to introduce sugar for something extremely useful, even if it is technically already possible. For example, technically, rust is able to express closures without the |x| {...} syntax, but we include that syntax because closures are common and useful enough to warrant some convenient sugar. I argue that state machines are another such case (though admittedly not as common as closures), but ...

It also introduces typestate for a very specific situation, enums-as-state-machines, and thus feels very bolted-on/shoehorned to me.

I think we simply disagree on how generic/useful state machines are. I also don't have a real grasp of the implications of typestate, and I also don't want to bloat/complicate the language. So my inclination is to close the issue if nobody else is in favor of it...

most use cases of typestate can be adequately handled with phantom parameters.

I am not sure I understand this point. Is the session-types library an example of this?

@Manishearth
Copy link
Member

I think we simply disagree on how generic/useful state machines are.

No, this has nothing to do with that -- I do feel that state machines are pretty useful. I'm saying that the solution proposed here feels bolted-on. Enum-types feel like a more holistic solution, and they are useful for many other things.

FWIW there are plans for a JS generator-esque thing with closure-like syntax and yield compiling to state machines. Doesn't exactly address this use case -- the state machine there is under the hood and can't be manipulated directly -- but it may help fix some issues.

I am not sure I understand this point. Is the session-types library an example of this?

Not exactly.

@mark-i-m
Copy link
Member Author

mark-i-m commented Apr 8, 2017

FWIW there are plans for a JS generator-esque thing with closure-like syntax and yield compiling to state machines.

Yes, it was #1823 that first got me thinking about this...

@burdges
Copy link

burdges commented Apr 14, 2017

I believe restricting transitions is too inflexible a restriction. In particular enums tend towards immutable usage far more heavily than any other data structure.

We've previously discussed subtyping relationships among enums, which sounds vastly more useful, and subsumes the transitions proposal here with some boilerplate. Just one syntax might be :

pub enum MyStates {
    State1(u32),
    State2,
    State3(IpAddr),
    Error,
}

pub enum MyStartStates = MyStates { State2 | State3(..) };
pub enum MyEndStates = MyStates { State2 | Error };

These statically subtyped enums may encode aspects of the state machines' transitions even when used immutably.

impl MyStates {
    pub fn new() -> MyStartStates { ... }
    pub fn start(self: MyStartStates) -> MyStates { ... }
    pub fn step(self: MyStates, cmd: &str) -> MyStates { ... }
    pub fn end(self: MyStates) -> MyEndStates { ... }
}

I think those discussions considered extending enum as well, like maybe

pub enum MyExtraStates<'a> = MyExtraStates { * } + MyNewStates {
    State4, 
    State5(Cow<'a, str>)
};

@mark-i-m
Copy link
Member Author

Hmmm... your comment reminds me of the Union types in Pony. It seems that that was the direction #1450 was taking, too (that's what untagged unions are, right?). And indeed, that does strike me as a more general solution than either my proposal or session types.

@burdges
Copy link

burdges commented Apr 15, 2017

I've run into a situation where subtyping or extending enums would help clarify (the input to) a state machine just this week, btw. ;)

@burdges
Copy link

burdges commented Apr 17, 2017

There was also an idea that linear types could provide types for which you must manually call a destructor. I suppose those ideas (a) passed through #1180 before becoming the much less radical ManuallyDrop trait, and also (b) became the current interest in owned references. In principle Drop could be implemented differently for different subtypes of an enum. And a DoNotDrop trait could statically force the state machine into certain ending states. Anything that deep likely depends upon owned references though.

@mark-i-m
Copy link
Member Author

Hmmm... that's like a bit of a round-about way of defining a set of accepting states in an NFA, though. I would still rather that there was some sugar...

But on the whole, the discussion here has convinced me that it would probably be good to just wait and see what new type-system features come about...

@burdges
Copy link

burdges commented Apr 18, 2017

There was a less ergonomic but more powerful approach suggested in trait field's RFC in which a trait could act like an enum :

trait MyStartStates enum {
    State2,
    State3(IpAddr),
}
impl MyStartStates for MyStates {
    State2 = MyStates::State2,
    State3(IpAddr) = MyStates::State3(IpAddr),
}

@mark-i-m
Copy link
Member Author

mark-i-m commented Sep 2, 2017

I think that #2033 solves this mostly...

@Yanpas
Copy link

Yanpas commented Oct 5, 2017

The idea is nice, but I don't think it's usage would be popular. So in purpose of not overcomplicating the language I suggest adding mandatory #[derive(EnumFSM)] or something like this in enum declaration.

enums may have some other useful compiler extensions like:

#[derive(CEnum)]
enum Color(i32) {
  red, // == 0
  green, // == 1
  blue => 5
}
...
let a : i32 = Color::red::native;

or

#[derive(EnumBitMask)]
enum Severity(u8) {
  trace, // == 1
  debug, // == 2
  _, // unused, skip 4
  info, // == 8
  _,
  _,
  _,
  _,
  critical // == 256, raise an Error (u8 overflow)
}
// Severity::_ == 0 e.g.

@mark-i-m
Copy link
Member Author

I'm going to go ahead and close this now to reduce clutter...

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

8 participants