-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grammY conversations (Why Scenes Are Bad) #136
Comments
I kind of implemented "nested handlers" approach in https://github.com/IlyaSemenov/grammy-scenes I've extracted it to a npm package from one of my actual bot projects. I am not 100% happy with the API and architecture I ended up with, but it allows to write concise code, while still being quite flexible (it extends the built-in One of the things I'm not sure in is handling the scenes context. I'm not convinced whether scenes should have separate first-class contexts, or if we should simply direct user to use "normal" sessions for their context data. In my initial prototypes I went with the former, which (back then) resulted in very complex Typescript generics, and it also wasn't straightforward when to clear the context and when to keep it. So I ditched all that in favour of "normal" sessions with some simple typings (see example in README) — but I'm not exactly happy about that. First class scene contexts still seem to be a better fit to me, they just need to be carefully implemented. |
Approach C: “Wait and Resume”On first sight, this suggestion may look similar to Approach A. However, it actually behaves very differently and it makes more sense to start explaining it from scratch. Remember from https://grammy.dev/advanced/middleware that middleware is a tree in grammY. This tree will be traversed in depth-first order. We regard a conversation also as a hierarchy of different flows. Always start with question Q0, if the user says YES, continue with Q1, else continue with Q2. We suggest to use the former hierarchy to express the latter. Here is the example from the top again. Handling invalid input is omitted intentionally for brevity. const conversation = new Conversation('age-at-birth')
conversation.command('start', async ctx => {
await ctx.reply('How old are you'))
ctx.conversation.resume()
})
conversation.wait()
conversation.on('message:text', async ctx => {
ctx.session.age = parseInt(ctx.msg.text, 10)
await ctx.reply('Cool, how old is your mother?')
ctx.conversation.resume()
})
conversation.wait()
conversation.on('message:text', async ctx => {
const age = parseInt(ctx.msg.text, 10)
await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
ctx.conversation.leave()
}) This provides a simple linear flow that could be illustrated by
We can interrupt the middleware execution by inserting a The Next, let us see how we can branch out, and have an alternative way of continuing the conversation. const conversation = new Conversation('age-at-birth')
conversation.command('start', async ctx => {
await ctx.reply('How old are you'))
ctx.conversation.resume()
})
conversation.wait()
// start a new sub-conversation
const invalidConversation = conversation.filter(ctx => isNaN(parseInt(ctx.msg?.text ?? '')))
invalidConversation.on('message', ctx => ctx.reply('That is not a number, so I will assume you sent me the name of your pet. What animal is it?'))
invalidConversation.wait()
// TODO: continue conversation about pets here
// Go on with regular conversation about age:
conversation.on('message:text', async ctx => {
ctx.session.age = parseInt(ctx.msg.text, 10)
await ctx.reply('Cool, how old is your mother?')
ctx.conversation.resume()
})
conversation.wait()
conversation.on('message:text', async ctx => {
const age = parseInt(ctx.msg.text, 10)
await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
ctx.conversation.leave()
}) We have now defined a conversation that goes like this:
That way, we can define conversation flows. Note that the conversation will remain between two const conversation = new Conversation('word-of-the-year', { autoResume: true })
conversation.on('message', async ctx => {
await ctx.reply('What is your favourite English word?))
})
conversation.wait()
conversation.on('message', async ctx => {
ctx.session.word = ctx.msg.text
await ctx.reply('Why do you want this word to become the word of the year?'))
})
conversation.wait()
conversation.on('message', async ctx => {
ctx.session.reason = ctx.msg.text
await ctx.reply('Is there anything else you would like to say?'), {
reply_markup: new InlineKeyboard().text('Nope', 'no')
})
})
conversation.wait()
conversation.on('message', async ctx => {
ctx.session.comment = ctx.msg.text
await ctx.reply('Thank you for you submission'))
})
conversation.callbackQuery('no', ctx => ctx.reply('Skipped. Thanks for the submission!')) Personally, I think this is so far the best option we have. I prefer it over the other two approaches. What do you think? |
How Approach C Resembles Imperative CodeIt struck me that we are getting full flexibility with statements, branching, loops, functions, and recursion using Approach C. In other words, we have the flexibility of code when designing how conversations work. This is a little bit revolutionary and maybe not too obvious on first sight. I would like to illustrate how it will work. StatementsA statement in conversations is simply all middleware between two // First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ }) These individual blocks of handlers act like the individual instructions of a conversation. BranchingThis has already been described in the post above. In short, you can use LoopsIteration is possible by recursion (preferred, see below), or by jumping back to a fixed step. Admittedly, this is rather a jump statement than an actual loop. (If you have a use case for Example of jumping back: conversation.on('message', ctx => { /* ... */ })
conversation.wait('begin-loop') // loop starts here
conversation.on('message', ctx => { /* ... */ }) // first loop statement
conversation.wait()
conversation.on('message', ctx => { /* ... */ }) // second loop statement
conversation.wait()
conversation.filter(condition, ctx => ctx.conversation.resume('begin-loop')) // loop condition
conversation.on('message', ctx => { /* ... */ }) // broke out of loop FunctionsYou can define reusable parts of conversations. This lets you define sub-conversations. They can be included in different places of the main conversation, and nested as deep as you like. This is the same concept as functions in code: reusable named pieces of control flow. In order to illustrate this, lets define a small part of the conversation that verifies the identity of the user by asking for their birthday. const captcha = new Conversation('captcha', { autoResume: true })
captcha.use(async ctx => {
await ctx.reply('Please enter your birthday to proceed. Day of month?', {
reply_markup: numberKeyboard(1, 31) // builds a keyboard 31 buttons with numbers, omitted here
})
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
ctx.session.captcha.day = ctx.callbackQuery.data
await ctx.reply('Month?', { reply_markup: numberKeyboard(1, 12) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
ctx.session.captcha.month = ctx.callbackQuery.data
await ctx.reply('Year?', { reply_markup: numberKeyboard(1900, new Date().getFullYear()) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
ctx.session.captcha.year = ctx.callbackQuery.data
await verifyBirthday(ctx, ctx.captcha) // checks birthday against a database, calls `ctx.conversation.leave` if incorrect
}) We now have defined a “function” for checking the birthday. We can “call” the function by passing it as conversational middleware to another conversation. const mainConversation = new Conversation('rocket-launcher')
mainConversation.command('launch_rocket', captcha, async ctx => {
await ctx.reply('You are verified, launching rocket 🚀')
})
mainConversation.command('purge', captcha, async ctx => {
await ctx.reply('You are verified, purging all data')
}) Note how we can use the same captcha in different places. It behaves just like a middleware tree does, and it is closely integrated with grammY's middleware tree. RecursionIt should be possible to use the middleware recursively. This implies some sort of call stack, but it isn't quite clear yet if we actually need to store it or not. In order to use recursion, we could for example restart the captcha with another attempt if the verification fails. In the last step, we could do: captcha.on('callback_query:data', async (ctx, next) => {
ctx.session.captcha.year = ctx.callbackQuery.data
await next()
})
// `verify` checks the birthday in `ctx.session.captcha` against a database, and returns `true` if it is correct
captcha.filter(verifyBirthday, ctx => ctx.conversation.resume()
// else: restart captcha
captcha.use(async (ctx, next) => {
await ctx.reply('Wrong birthday, try again!')
await next()
})
captcha.use(captcha) This will re-enter the captcha until it is correct. In reality, you probably also want to allow the user to exit the captcha manually, this was omitted here for brevity. |
The only way I see is that one always uses a single top-level conversation with
mainConversation.command('launch_rocket', captcha, async ctx => {
await ctx.reply('You are verified, launching rocket 🚀')
}) surviving server restart? What if a user takes a 10 minutes break before selecting a month in the captcha sub-conversation, and CI/CD restarts the server during that time? The
// First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait() // <--- what do we "wait" for here? a message (above), or text/photo (below)?
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ })
// <--- why don't we have wait here? I'd rather suggest something like: conversation.step(step => {
step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
step.on('message:text', ctx => { /* ... */ })
step.on('message:photo', ctx => { /* ... */ })
step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
step.on('message', ctx => { /* ... */ })
}) This is immediately obvious what are the conversation steps and how the handlers are grouped. There is also no discrepancy in the number of wait calls. (Essentially, that is what I did in grammy-scenes). |
Thanks for the feedback! :) Those are some good questions, here are the explanations: 1We want to be able to start conversations very flexibly from anywhere in any handler. This means that we cannot simply pass a conversation as middleware to start it. As an example, this does not work: const conversation = new Conversation('id')
// ...
// This is NOT a good idea:
bot.filter(ctx => ctx.chat?.type === 'private').command('start', conversation) The conversation must be able to receive all future updates in order to continue where it left off. In the above example, it would only be able to handle const conversation = new Conversation('id')
// ...
// Install it:
const bot = new Bot('token')
bot.use(session({ initial() { return {} } }))
bot.use(conversation)
// Enter a conversation from anywhere:
bot.filter(ctx => ctx.chat?.type === 'private').command('start', ctx => ctx.conversation.enter('id')) Once you entered a conversation, the flow will follow the natural flow of the middleware, and sub-conversations are entered by reaching them in the middleware tree execution order. 2The conversations plugin requires users to install the sessions plugin. As a result, we have access to persistent storage. We can use this to store the current state of the conversation across server restarts. The plugin will be able to handle arbitrary restarts (potentially after every single incoming update). As long as you do not experience any disk corruption or other data loss, the user will not notice any intermediate downtime between the messages. 3Maybe my formatting was a little off in the example above. Here are some definitions, I hope they help to shape a better intuition. Statement. Also called step. A part of the middleware tree between two
That's why we also don't need a closing What are we waiting for? That cannot be answered in a straightforward manner. We obviously don't literally wait, because JS does not allow us to block the execution or something. It's not a sleep statement. The A better way to phrase this question would be to ask what happens when an update reaches the |
Regarding your suggestion, that's actually the old scenes approach we know, something that exists in many shapes and forms. You are basically defining a state automaton. Branching and reusability are hard. State automata inevitably lead to spaghetti code. How to conditionally skip a step? How to repeat several steps? How would something like using the captcha look? The answers (if possible at all) include wild jumping from one place in the code to the next, almost always using a number as index, or some kind of string identifiers. It's what I labelled conversational GOTO code in the original post of this issue. For example, in your plugin I understand that many people are looking for exactly scenes, because that's all we've known in the past years. It is so to say the best abstraction we have had until now, and other libs like Telegraf have done it already. However, I am fairly confident that once we give people the power of defining readable conversations that they can structure the same way they structure their code, it will become more obvious why I consider scenes harmful. |
That's not really what I was asking. I was referring to the particular piece of code: mainConversation.command('launch_rocket', captcha, async ctx => {
await ctx.reply('You are verified, launching rocket 🚀')
}) Let me alter it a little bit: mainConversation.command('launch_rocket', captcha1, captcha2, async (ctx) => {
await ctx.reply('You are verified, launching rocket 🚀')
}) In order to persist the state in the workflow one must uniquely identify the workflow position. In the specific code above, one must store not only the position inside a (internally named) captcha conversation, they must also somehow record that the conversation is a part of the outer workflow (unnamed list of It slipped me at first that it's not By the way, what if a user setups two command handlers in a conversation? I suppose you'll need to "stack" them internally? mainConversation.command('launch_rocket', captcha1, captcha2, async (ctx, next) => {
if (random() < 0.5) {
await ctx.reply('You are lucky, proceeding...')
await next()
} else {
await ctx.reply('Bad luck today!')
}
})
mainConversation.command('launch_rocket', captcha3, async ctx => {
await ctx.reply('You are verified, launching rocket 🚀')
})
So that means the proposed library API leads to unreadable/indigestible code, which is not good. I wasn't referring to myself when I was asking that question. (I personally understand the whole picture with the middleware trees. I published a few open source libraries of my own, including now 3 grammy plugins, and contributed to much more in the past years.) I was trying to illustrate how does this code look in a stranger's eyes. Everyone's going to ask "what do we wait for here?" and there must be a clear answer. But I think we actually can answer that question: the conversation waits for the "proceed" instruction coming from the previously registered handlers. |
Using the normal main_scene.scene("step1", (scene) => {
scene.on("message:text", async (ctx) => {
if (ctx.message.text === "secret") {
await ctx.reply(`Greetings, mylord`)
ctx.scenes.move("step3")
} else {
await ctx.reply(`Hello`)
ctx.scenes.move("step2")
}
})
})
You just don't move off the current step: const main_scene = new Scene()
main_scene.scene<
MainBotContext &
SessionFlavor<{
main_step1?: {
names: string[]
}
}>
>("step1", (scene) => {
scene.enter(async (ctx) => {
await ctx.reply(`Send me 3 names.`)
})
scene.on("message:text", async (ctx) => {
if (!ctx.session.main_step1) {
ctx.session.main_step1 = { names: [] }
}
ctx.session.main_step1.names.push(ctx.message.text)
if (ctx.session.main_step1.names.length >= 3) {
await ctx.scenes.move("step2")
}
})
})
const captcha_scene = new Scene<
Context &
SessionFlavor<{
captcha?: {
answer: string
next_scene: string
}
}>
>()
captcha_scene.enter(async (ctx, next_scene) => {
const { answer, image } = await generateCaptcha()
ctx.session.captcha = { answer, next_scene }
await ctx.replyWithPhoto(image)
await ctx.reply(`Enter letter you see on image above`)
})
captcha_scene.on("message:text", async (ctx) => {
if (ctx.message.text === ctx.session.captcha?.answer) {
await ctx.scenes.move(ctx.session.captcha.next_scene)
} else {
await ctx.reply(`Please try again.`)
}
})
const rocket_scene = new Scene()
rocket_scene.enter((ctx) => ctx.scenes.inner("captcha", "launch"))
rocket_scene.scene("captcha", captcha_scene)
rocket_scene.scene("launch", (scene) => {
scene.enter(async (ctx) => {
await ctx.reply(`Launching rocket!`)
})
}) This indeed requires passing the ID of the next (sub)step, but is that really a big deal? The code is still concise and reusable.
Most of the time you read the next (sub)scene block. Since that is a 95% use case, this could indeed be improved, with |
That's a smart way to think about it, I had not even considered that yet. Regarding the proceed instructions as events that “uncork” the wait calls is a cool analogy, it'll make it prominently into the docs once they're written. Good perspective! @all-contributors add @IlyaSemenov for idea
Yes, that's correct. (Sorry, for misunderstanding the question at first, I didn't really get you were asking about the internal implementation.) As we are giving the users the full capabilities of imperative code, we will need to store a stack of function calls each with its respective “return address” as well as a program counter per stack frame. This is more complex to implement than having a map object from string identifiers to steps (like scenes do), but I am happy to deal with complexity if that allows for readable code in user land. The single remaining challenge I see with the implementation is how to handle
In my opinion, yes! It's a huge deal. Let's see how the three things would look with grammY conversations: const conv = new Conversation('illustration')
// We can use block statements just like in if-statements:
const isLord = bot.hears('secret'); {
isLord.use(async ctx => {
await ctx.reply(`Greetings, mylord`)
ctx.conversation.resume()
})
}
conv.use(async ctx => {
await ctx.reply(`Hello`)
ctx.conversation.resume()
})
conv.wait()
// ...
conv.on('message:text', ctx => {
ctx.session.main_step1 ??= { names: [] }
ctx.session.main_step1.names.push(ctx.message.text)
if (ctx.session.main_step1.names.length >= 3) {
ctx.conversation.resume()
}
})
Illustrated above. If I may elaborate on the analogy between assembly and scenes: The concept of Note that the above code examples still have a lot of potential for being made more readable. So far, this is kind of the “raw” version that directly operates on the conversations abstraction. It helps if we introduce a little bit of syntactic sugar. Let's add We cannot do const conv = new Conversation('illustration', { autoResume: true })
// Using the normal `if` statement:
const text = conv.on('message:text'); {
const isLord = text.hears('secret'); {
isLord.do(ctx => ctx.reply(`Greetings, mylord`))
}
const notLord = text.else(); {
notLord.do(ctx => ctx.reply(`Hello`))
notLord.wait()
// do more step2 things here using `notLord`
}
text.wait()
// do step3 things here
}
conv.wait()
// ... No more jumps! This is much, much closer to the actual control flow that you meant by the example you wrote. It makes perfectly clear where the code waits, and where it does not. Compare it to the pseudo-version of the example:
This is almost surprisingly identical with how the conversations code looks. Admittedly, there a still a few details to figure out around grammY conversations. I think it makes sense to iterate further on this idea once we have a working implementation. For example, we may consider to set However, those are merely coding challenges. They have been solved before, and we can solve them again. There will be beta releases, we will have to collect feedback, odd edge cases are going to be found, and we will have to reiterate. Either way, the really hard stuff is how to design the API surface, and how conversations can be used, i.e. how the bot code will look. This seems to be pretty clear now. Thank you again for the questions :) |
I could not determine your intention. Basic usage: @all-contributors please add @Someone for code, doc and infra For other usages see the documentation |
@all-contributors add @IlyaSemenov for ideas |
I've put up a pull request to add @IlyaSemenov! 🎉 |
Looking at both codes of you two, the one @IlyaSemenov postet looks really out of place. It's for me as a noob, much harder to understand, because it's departure from normal code written in grammY. I in the mean time fully understand @KnorpelSenf flow. Though the Else and hear are fishy. They are not replicates of bot.hears. and else isn't a thing previously. That would at least need a word in documentation. I also don't really like, that the Else exists at all.
noMatch includes the success return of previous call text.hears. i suspect .else would be the same, just that it calls the following code. I hope this makes sense. Feel free to explain .else() 😁 |
Else works this way: // Triggers on updates from private chats
bot.filter(ctx => ctx.chat?.type === "private", ctx => { /* ... */ })
// Triggers on updates not from private chats
bot.else(ctx => { /* ... */ })
// Triggers on forwards
bot.on(":forward_date", ctx => { /* ... */ }))
// Triggers on everything that's not a forward
bot.else(ctx => { /* ... */ })
// Same as else
bot.drop(matchFilter(":forward_date"), ctx => { /* ... */ })
// Triggers on "hi"
bot.hears("hi", ctx => { /* ... */ }))
// Triggers on all updates that don't match the hears trigger
bot.else(ctx => { /* ... */ }) |
I have controversial feelings about this one. On the one hand, this introduces more "state" to the API, and typically the more stateless an app is, the more reliable and scalable it is. On the other hand, in this particular case everything is already heavily coupled, so it's not making it really worse in that perspective, yet it indeed brings certain useful "sugar" to the user land code. So I guess considering this proposed pseudo If it were me, I'd also try to come up with something like (rephrasing your functional example): const [isLord, notLord] = text.branch(text.hears('secret')) // or text.branch(text => text.hears('secret'))
isLord.do(...)
notLord.do(...) |
Returning two composers from I don't think we should confuse the composer instances returned by |
Consider this: bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`)) or even: bot.filter(ctx => random() < 0.1, (ctx, next) => next())
bot.use(async (ctx, next) => { await ctx.reply(`Wait for it...`); await next(); } )
// can we run `else` not directly after `filter`? or do we disallow this during config time?
bot.else(ctx => ctx.reply(`Bad luck!`))
bot.use(ctx => ctx.reply(`Lucky!`)) Somehow It's not a big deal though (these implementation details are not visible in user land anyway), so I'd rather for it, it could be handy indeed for defining simple 'flows'. |
I don't think I made clear enough what I meant. Let me give you a working solution |
I tested both code snippets and they work flawlessly with the linked PR. First snippet:
Second snippet:
Also, regarding your question
My PR does not interdict this, so the above code snippet actually works. But if we deem it counterintuitive that one can put statements between the if and the else part, and we need to protect people from having such code, then we can maybe add a line that will prevent people from doing that. |
For completeness: there were four other lines in my file. import { Bot } from "https://raw.githubusercontent.com/grammyjs/grammY/fe8f4ff16d78790c25d6cd62e6a929f93a43a781/src/mod.ts";
const random = () => Math.random()
const bot = new Bot("my-bot-token-was-here");
// each snippet came here
bot.start(); |
I see, this is quite elegant indeed. |
Well fair enough, we do have to add additional state to the composer (at least we add no extra state to the session). I am still hesitant towards merging #152 because bot.filter().else() does not work, only bot.filter()
bot.else() does. It makes a lot of sense to us but it could be really confusing to newcomers. In contrast, it also does not work to do Either way, next up for me is to start implementing the actual conversations plugin. I have a few more ideas about some tweaks that should make the internals quite elegant, let's see if those work out the way I hope. We can wait for other people to provide more feedback on the |
Obviously, this should not rely on in-memory state. I kind of added a duct tape solution in my (admittedly) duct tape scenes library, but that's using an ugly monkey patch generating a pseudo Telegram update... definitely not a proper library grade solution. |
You are right, this is a common use case. I will need it myself. Conversations are going to be storing their state in the session. Session data is loaded depending on the context object. The context object sources its information from a Telegram update. Hence, we can only operate the conversations when have an update. We could break this chain at any point:
None of the options are too pretty. If I understood you correctly, you went with option 1. That should work here, too, but I agree that it's not the best thing to do. Option 2 is really hard because the context object is tightly integrated with the update. Option 4 is even harder because one would have to know and replicate the internal data representation of the conversations. This leaves us with option 3. However, it requires that the user manually loads the session data, and somehow passes it to the conversations plugin (without middleware, because middleware can only be run if we have a context object, which we don't). Again, that's not pretty. I therefore suggest to avoid a direct integration of conversations with external events. Instead, we could add another composer method
The authorisation flow would then need to conclude with a deep linking redirect. This will open the bot, and send a In summary, the usage would be like this:
Analogously, we could add a Alternatively, the bot could proactively send a message to the user when the authorisation is complete. This would happen outside of the conversation. Implementation-wise, these proposed control structures are trivial to add. I have a working solution locally, it is more elegant than the |
That is correct, I went with option 1 — but that is because I didn't see a direct way for option 2 with monkey patches only. However, I think grammy itself could implement option 2 natively, but not "faking" a context, rather using a separate context class for handlers of these externally generated updates. This could be well typed and not really be a hack or fake. What I suggest is something like (just a sketch): export declare class Composer<C extends Context> implements MiddlewareObj<C> {
// ...
onExternal(...middleware: Array<Middleware<ExternalUpdateContext<C>>>): void; // void for proof of concept
}
/*
Just a sketch to show the intention.
Probably it'll be needed to extract a BaseContext interface and extend it here and in Context,
and use it Middleware<> generics.
*/
interface ExternalUpdateContext<C extends Context> extends Omit<C, "update"> {
payload?: any
} and then: conv.onExternal(async ctx => {
// here we are well aware that ctx is not a Telegram-update context,
// which, however, properly implements `ctx.chat` and `ctx.reply`
await ctx.reply(`External event: ${ctx.payload}`)
})
httpServer.post("/callback", async ctx => {
await bot.handleExternalEvent(ctx.query.chat_id, ctx.request.data)
}) This will actually allow streamlined user experience without having them to poll their status, or click |
Ah no, that won't work well. If it's going to bypass the other middlewares (non- If we do pass it to other middlewares (as we want session etc. to work - that's what I do), they could be expecting the So yes it goes to faking an
After all, most of the time I understand that you're probably not going to approve either of these for generic use. It's a tough balance between feature richness and simplicity. Ideally, the types should be extensible enough to make this possible with an external plugin. |
Messing with the update is not going to end well, at least not on the type level. The library is built with the reasonable assumption that Telegram updates are following the specs. Every single field that we might remove or make optional will inevitably remove some guarantees for some user. Note that adding a union member to the context type in middleware is a breaking change for every handler ever written with grammY. Also, you couldn't extend the context class anymore because you cannot extend union types. This also doesn't make a lot of sense semantically. We write middleware to handle updates. Faking requests to call these handlers from other places is a big hack. If we want to send a message to a chat upon other events, why don't we just do it? You can use transformative context flavours to mess around with the context object as much as TypeScript allows. I don't think it's smart, but you're right that it's flexible enough to support this. |
We surely can, but we also want to move the conversation state forward in a natural way. Forcing users to click START once again, or having them click some inline button to proceed is a big hack as well (this time from the UX standpoint). Thinking more, "moving the conversation" essentially means jumping over the current Note that in this case, it should be possible to distinguish whether the conversation is still "the same". For instance, imagine there are two commands I've implemented |
Fair enough, that's true … Maybe one of the other solutions is better then.
I think that's already possible with what I suggested. Let's pick this point up again after I published some code. Regarding the point about nested conversations, in my current code draft, I disallowed having several conversations in parallel. It's leads to non-obvious behaviour in user land code. Also, I don't think this would be a good user experience. The described function calls are already possible, and they allow nesting. |
I wasn't describing nested conversations. In the scenario above, user started a first conversation with The first conversation started a non-bot job that finished after the new conversation has been started. The proposed global Does that make sense now? |
I think so, yes. Thanks for the elaborations. I'll drop a message here as soon as the plugin code is ready. I think having something concrete to talk about will make it easier to properly address this use case. Then we can look into how exactly this should look. |
@IlyaSemenov you can checkout the branch type MyContext = Context & ConversationFlavor;
const c = new Conversation<MyContext>("id");
c.do((ctx) => ctx.reply("Hi there, send me a message"));
c.wait();
c.do((ctx) => ctx.reply("Cool"));
c.do((ctx) => ctx.reply("I want another message"));
c.wait();
c.do((ctx) => ctx.reply("Awesome, now I'm done listening to you!"));
const bot = new Bot<MyContext>("");
bot.use(session({ initial: () => ({}) }));
bot.use(c);
bot.command(["start", "help"], (ctx) => ctx.reply("Send /enter"));
bot.command("enter", (ctx) => ctx.conversation.enter("id"));
bot.start(); This will enter the conversation, keeping track of the middleware execution order. When the wait statement is reached, the conversation will stop running, and the call stack is stored in the session. Upon the next message, the stack is read and replayed, hence continuing the conversation from where it left off. I must warn you! This code is very much not working. This simple example runs, but the actual cool thing of nested statements etc still has at least one bug so it does not even work. That means no branching, no loops, etc. Also, everything is still full of debug statements and the code needs to be refactored more than twice to become more readable, not to mention better commented. Now that I have expressed how uncomfortable I am with how early I upload this, it may still give you an idea how this will be implemented and what to expect. Please share whatever feedback you have, I know it's almost not working. |
Please don't apologise - I understand what early prototypes are :) So with the example above, the conversation intercepts everything:
Normally, I would expect Moving So I suppose you will want to separate the conversations control middleware and the current conversation. |
Also, I'm yet to see how the actual handlers will work along with wait's, user input validation and nested conversations. Right now, if I insert c.do((ctx) => ctx.reply("Hi there, send me a message"))
c.wait()
c.do((ctx) => ctx.reply("Cool"))
c.on("message:text", (ctx) => ctx.reply(`You sent: ${ctx.message.text}`))
c.do((ctx) => ctx.reply("I want another message"))
c.wait()
c.do((ctx) => ctx.reply("Awesome, now I'm done listening to you!")) result:
|
I see that point. You are right, this needs some redesign … I'll have to think about that, I'd like to avoid that people must register multiple things, but I'm not sure it's possible.
That's true, we need to encourage people to always use c.on("message:text").do((ctx) => ctx.reply(`You sent: ${ctx.message.text}`)) I have come to accept this, even though I wasn't too happy about it at the start. Any other solutions I have investigated must somehow set more symbol properties on the context object to track if Right now, it's at least consistent: if you don't call That being said, I'm still open for suggestions. |
Another thing I'm pretty unhappy with is the narrow context types after filtering. const texts = conversation.on("message:text")
{
texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text))
texts.wait()
texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text)) // compiles, but may crash
}
Currently, the way to mitigate this is to continue on the composer returned by const texts = conversation.on("message:text")
{
texts.do(ctx => ctx.reply("Echo: " + ctx.msg.text))
const waitedAfterText = texts.wait()
waitedAfterText.do(ctx => ctx.reply("Echo: " + ctx.msg.text)) // compiles, but may crash
} This obviously is not ideal, because it still permits crashing code to compile. It makes sense that this is the case, because when resuming the execution, we would have to skip the checks installed by I still love the possibilities of being able to express conversational flows as code, and I am convinced that this is a very promising solution. I hope you agree that it's certainly is a better approach than scenes. However, the longer we work on this, the harder are the problems that need to be solved, and that's not a good sign. I'll first try to solve some of the above problems with the current approach, but at the same time I'd like to explore alternative solutions to this implementation. The three main ideas I have are still very abstract, but they start like this:
|
@KnorpelSenf Please take a look at the new version of https://github.com/IlyaSemenov/grammy-scenes (README includes reasonably feature-full examples). The primary change is that a scene (conversation) is now a middleware like in your examples (without separate I've architected things in the way that The implementation supports unconditional and conditional nested scenes, including recursion. I also added support for context-local session data (kind of "local variables" for the scenes, as opposed to "global" session data). UPDATE: So far I've been using grammy-scenes with 2 bots, ~15 scenes, and ~100 "steps", and here are my observations. |
@IlyaSemenov sorry for the long delay. I did not have much time for this in past few months, and also I started having more and more doubts about this. I have concluded that my idea above is bad. You are right. We have already talked about some of the disadvantages here, and I thought of several more, which I will skip now. Thanks for your work on scenes, I appreciate it! This is great, it is probably the best implementation of scenes that I've seen so far. I found your observations especially valuable. It's cool to see that composition and nesting based around a concept of waiting and resuming is well-suited. However, I did not change my mind about scenes in general. I still can't read the code in a linear fashion (because every step has a different callback), and I have to know a lot about the plugin to make sense of what all the GOTO jumping does. This still interrupts the reading flow all the time. This is why I created a new package with a completely different strategy than discussed at any point before. It is very much not usable yet, but there is an example bot in the README which outlines the concept. It can do the wait/resume thing, supports nesting, loops, functions, branching, recursion, error-handling with try-catch-finally, composition across modules, and so on. (We basically get all of this for free, it does not need to be supported explicitly.) It is trivial to implement the system you talked about which lets you resume based on external events (but I wanted to publish before adding that, so it's not supported yet). It has type safe local sessions, even across functions, and the code is more concise than any of the ideas discussed above. Also, it is very easy to understand the code even if you have never seen the package before. Naturally, there are a few pitfalls. The main thing is that side-effects outside of grammY must be wrapped in function calls. Basically, you can communicate with the user and do everything with the Bot API, but external things which perform side-effects or are non-deterministic must be wrapped in another function call. Some people may know this pattern from React, there is is called I published the (unfinished) code at https://github.com/grammyjs/conversations. Thank you for brainstorming here! I will close this issue now. There will be more things to talk about, but they are going to be specific to conversations, so they can be discussed in new issues in the new repository. I hope you give this thing a try, I'd be looking forward to your feedback and criticism! |
Admittedly, that title was clickbait. It seems like you are interested in scenes or wizards, and other programming patters that allow you to define conversational interfaces like they were a finite-state machine (FSM, wiki).
What Is This Issue
One of the features that get requested most often are scenes. This issue shall:
1. What are scenes?
A chat is a conversational interface. This means that the chat between the user and the bot evolves over time. Old messages stay relevant when processing current ones, as they provide the context of the conversation that determines how to interpret messages.
Note how the user sends two messages, and both are numbers. We only know that those two numbers mean two different things because we can follow the flow of the conversation. The two age numbers are following up two different questions. Hence, in order to provide a natural conversational flow, we must store the history of the chat, and take it into account when interpreting messages.
In fact, we often don't need to know the entire chat history. The few most recent messages are often enough to remember, as we likely don't have to care about what the user sent back in 2018. It is therefore common to construct state, i.e. a small bit of data that stores where in the conversation where are. In our example, we would only need to store if the last question was about the age of the user, or about the age of their mother.
Scenes are a way to express this conversational style by allowing you to define a finite-state machine. Please google what this is, it is essential for the following discussion. The state is usually stored in the session data. They achieve this by isolating a part of the middleware into a block that can be entered and left.
Different bot frameworks have different syntax for this, but it typically works roughly like this (explanatory code, do not try to run):
This could result in the following conversation.
In a way, every scene defines one step of the conversation. As you can define arbitrarily many of these scenes, you can define a conversational interface by creating a new instance of
Scene
for every step, and hence define the message handling for it.Scenes are a good idea. The are a huge step forward from only defining dozens of handlers on the same middleware tree. Bots that do not use scenes (or a similar form of state management) are effectively forgetting everything that happened in the chat immediately after they're done handling a message. (If they seem like they remember their context, then this is more or less a workaround which relies on a message that you reply to, inline menus, or other information in order to avoid state management.)
2. Cool! So what is the problem?
Scenes effectively reduce the flow of a conversation to being in a state, and then transitioning into another state (
ctx.scene.enter('goto')
). This can be illustrated by translating scenes into routers:Instead of creating new
Scene
objects, we simply create new routes, and obtain the same behaviour with minimally more code.This may work if you have two states. It may also work for three. However, the more often you instantiate
Scene
, the more states you add to your global pool of states, between which you're jumping around arbitrarily. This quickly becomes messy. It takes you back to the old days of defining a huge file of code without indentation, and then using GOTO to move around. This, too, works at a small scale, but considering GOTO harmful led to a paradigm shift that substantially advanced programming as a discipline.In Telegraf, there are some ways to mitigate the problem. For example, once could add a way to group some scenes together into a namespace. As an example, Telegraf calls the
Scene
from above aStage
, and uses the word scene to group together several stages. It also allows you to force certain stages into a linear history, and calls this a wizard, in analogy to the multi-step UI forms.With grammY, we try to rethink the state of the art, and to come up with original solutions to long standing problems. Admitting that
Update
objects are actually pretty complex objects led us to giving powerful tools to bot developers: filter queries and the middleware tree were born, and they are widely used in almost all bots. Admitting that sending requests is more than just a plain HTTP call (at least when you're working with Telegram) led us to developing API transformer functions: a core primitive that drastically changes how we think about plugins and what they can do. Admitting that long polling at scale is quite hard led us to grammY runner: the fastest long polling implementation that exists, outperforming all other JS frameworks by far.Regarding conversational interfaces, the best we could come up with so far is GOTO. That was an okay first step a few years ago. Now, it is time to admit that this is harmful, and that we can do better.
3. So what have we done about this so far?
Not too much. Which is why this issue exists. So far, we've been recommending people to combine routers and sessions, rather than using scenes, as it does not use much more code, and providing the same plain old scenes for grammY is not ambitious enough.
There is a branch in this repository that contains some experiments with the future syntax that could be used, however, the feedback for it was mixed. It does bring some improvements to the situation as it provides a structure between the different steps in the conversation. Unfortunately, the resulting code is not too readable, and it makes things that belong together end up in different places of the code. It is always cool if the things that are semantically linked can be written close to each other.
As a consequence of this lack of progress, we need to have a proper discussion with everyone in the community in order to develop a more mature approach. The next section will suggest two ideas, one of them is the aforementioned one. Your feedback and ideas will impact the next step in developing conversational interfaces. Please speak up.
4. Some suggestions
Approach A: “Conversation Nodes”
This suggestion is the one the we've mentioned above. Its main contribution is to introduce a more implicit way of defining scenes. Instead of creating a new instance of a class for every step, you can just call
conversation.wait()
. This will internally create the class for you. As a result, you can have a more natural way of expressing the conversation. Thewait
calls make it clear where a message from the user is expected.Here is the example from the top again. Handling invalid input is omitted intentionally for brevity.
This provides a simple linear flow that could be illustrated by
We can jump back and forth using
ctx.conversation.forward(3)
orctx.conversation.backward(5)
.The
wait
calls optionally take string identifiers if you want to jump to a specific point, rather than giving a relative number of steps.Next, let us see how we can branch out, and have an alternative way of continuing the conversation.
We have now defined a conversation that goes like this:
That way, we can define conversation flows.
There are a number of improvements that could be done to this. If you have any concrete suggestions, please leave them below.
Approach B: “Nested Handlers”
Newcomers commonly try out something like this.
grammY has a protection against this because it would lead to a memory leak, and eventually OOM the server. Every received
/start
command would add a handler that is installed globally and persistently. All but the first are unreachable code, given thatnext
isn't called inside the nested handler.It would be worth investigating if we can write a different middleware system that allows this.
This would probably lead to deeply nested callback functions, i.e. bring us back to callback hell, something that could be called the GOTO statement of asynchronous programming.
What could we do to mitigate this?
Either way, this concept is still tempting. It is very intuitive to use. It obviously cannot be implemented with exactly the above syntax (because we are unable to reconstruct the current listeners on the next update, and we obviously cannot store the listeners in a database), but could try to figure out if small adjustments could make this possible. Internally, we would still have to convert this into something like an FSM, but maybe one that is generated on the fly. The dynamic ranges of the menu plugin could be used as inspiration here.
5. We need your feedback
Do you have a third idea? Can we combine the approaches A and B? How would you change them? Do you think the examples are completely missing the point? Any constructive feedback is welcome, and so are questions and concerns.
It would be amazing if we could find the right abstraction for this. It exists somewhere out there, we just have to find it.
Thank you!
The text was updated successfully, but these errors were encountered: