Add support for session checkpoints #1266
Comments
I'm a bit confused...
Very late response, but this is how I understood the idea, @Andre601: if you have a big bot, reconnecting sessions takes a long time (even with x16 login it still takes a while) and uses precious logins (logins are limited, and if you use up all of them... well, your token is reset, and that's bad). Here's an example of what happens when you need to update your bot.
Without session checkpoints:
With session checkpoints:
This is a great idea for big bots, since you could resume sessions without logging in all the shards again. However, I may be completely wrong in my interpretation of the feature, so sorry if I made a mistake!
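To put rough numbers on that startup cost: Discord's gateway documentation describes `max_concurrency` (part of the `session_start_limit` object) as the number of IDENTIFY requests allowed per 5 seconds, with shards grouped into buckets by `shard_id % max_concurrency`. The sketch below is a back-of-the-envelope estimate only; it ignores connection latency and READY processing, and the class and method names are hypothetical.

```java
import java.time.Duration;

public final class IdentifyEstimate {
    // Per Discord's docs, up to max_concurrency IDENTIFYs are allowed per 5-second window.
    private static final long WINDOW_SECONDS = 5;

    /**
     * Rough time to IDENTIFY every shard from scratch: shards are split into
     * maxConcurrency buckets (shardId % maxConcurrency), each bucket identifies
     * one shard per window, so the largest bucket sets the pace.
     */
    static Duration estimate(int shardCount, int maxConcurrency) {
        if (shardCount <= 0 || maxConcurrency <= 0) {
            throw new IllegalArgumentException("shardCount and maxConcurrency must be positive");
        }
        long rounds = (shardCount + (long) maxConcurrency - 1) / maxConcurrency; // ceiling division
        return Duration.ofSeconds(rounds * WINDOW_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(estimate(512, 1));  // PT42M40S
        System.out.println(estimate(512, 16)); // PT2M40S
    }
}
```

By this rough estimate, a 512-shard bot with a `max_concurrency` of 1 spends over 40 minutes just identifying, which is the cost a checkpoint-and-resume restart would avoid.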
I don't think that is the case, as there seems to be only one login at the start, and the shards just connect to the WebSocket.
@Andre601 every shard has a different WebSocket connection. If you have 512 shards, you need to log in to the WebSocket 512 times and send an IDENTIFY each time (yes, each of them needs to IDENTIFY before it can receive events; I may be wrong, but I'm 99% sure that's how it works). This uses valuable logins. According to the example in the original issue, this would be useful if you need some downtime on the bot (updates and the like): you save the bot state into a file, then, on reboot, load the checkpoint file, which lets you resume the session without re-identifying. This is very useful if you have a bot that uses a lot of shards but doesn't have x16 login support yet (which causes shards to take up to 30 minutes just to log in again!). In theory, you can resume the session without any issues by storing the session ID + sequence ID + all loaded guilds to a file and then reloading them when starting the bot again. (Of course, you can resume a session with only the session ID + sequence ID, and that's very easy to do with a bit of reflection magic, but JDA will not trigger any events because the guilds are missing.)
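A minimal sketch of such a per-shard checkpoint file, assuming each shard saves its session ID, last sequence number, resume URL, and the raw JSON of every guild it has loaded. The `ShardCheckpoint` record and its length-prefixed binary layout are hypothetical and not part of JDA; rebuilding JDA's cache from the stored guild payloads would still need internal support. The `resumeGatewayUrl` field stands for the `resume_gateway_url` that Discord's READY event provides for resuming.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical per-shard checkpoint; field names are illustrative, not JDA API. */
public record ShardCheckpoint(int shardId, String sessionId, String resumeGatewayUrl,
                              long sequence, Map<Long, String> guildJson) {

    public ShardCheckpoint {
        guildJson = Map.copyOf(guildJson); // defensive, immutable copy
    }

    /** Writes a length-prefixed binary file (the layout is an assumption of this sketch). */
    public void save(Path file) throws IOException {
        try (var out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(shardId);
            out.writeUTF(sessionId);
            out.writeUTF(resumeGatewayUrl);
            out.writeLong(sequence);
            out.writeInt(guildJson.size());
            for (Map.Entry<Long, String> e : guildJson.entrySet()) {
                byte[] json = e.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeLong(e.getKey());
                out.writeInt(json.length); // guild payloads can exceed writeUTF's 64 KiB limit
                out.write(json);
            }
        }
    }

    /** Returns empty if no checkpoint exists; a truncated file throws EOFException. */
    public static Optional<ShardCheckpoint> load(Path file) throws IOException {
        if (!Files.exists(file)) return Optional.empty();
        try (var in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            int shardId = in.readInt();
            String sessionId = in.readUTF();
            String resumeUrl = in.readUTF();
            long sequence = in.readLong();
            int count = in.readInt();
            Map<Long, String> guilds = new HashMap<>();
            for (int i = 0; i < count; i++) {
                long guildId = in.readLong();
                byte[] json = new byte[in.readInt()];
                in.readFully(json);
                guilds.put(guildId, new String(json, StandardCharsets.UTF_8));
            }
            return Optional.of(new ShardCheckpoint(shardId, sessionId, resumeUrl, sequence, guilds));
        }
    }
}
```

A checkpoint only helps if it is loaded soon enough: Discord invalidates sessions that are not resumed in time, and a rejected RESUME is answered with INVALID_SESSION (op 9), after which the shard has to IDENTIFY again.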
From my experience with my own bot, and from checking the logs, JDA sends messages like these:
As you can see, "Login successful" is only logged once and not for each shard separately, so we can safely assume that an actual login only happens once. I think we should differentiate between "resuming" and "reconnecting" a session/shard. To my knowledge, a resume does not take another login, as the connection was just (intentionally) lost temporarily, while on a reconnect the connection was essentially closed and a new one needs to be established. My point was mostly about resuming connections here, which don't really take any additional logins, while the topic (now that I've looked closer at the PR itself) seems to be more about a complete bot restart/shutdown, which would cause a complete reconnect. But my tl;dr here is that, from what I gathered and saw in the logs, the bot only logs in once using the identify payload and the number of shards it should have, and afterwards just starts the shards one by one. To close this off, I believe this should be moved to the Discord server, as I don't want to keep flooding this PR with (possibly) unrelated stuff.
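At the gateway protocol level, the resume/reconnect distinction maps to two opcodes: a shard with a still-valid session sends RESUME (op 6) with its token, session ID, and last sequence number, while a new session sends IDENTIFY (op 2), and each IDENTIFY is what draws from the session start limit. A minimal sketch of that choice, assuming a hypothetical `Checkpoint` record; only the RESUME payload is built here, since a real IDENTIFY also needs intents, connection properties, and shard info.

```java
import java.util.Optional;

public final class GatewayStart {
    // Gateway opcodes as listed in Discord's documentation.
    static final int OP_IDENTIFY = 2;
    static final int OP_RESUME = 6;

    /** Hypothetical saved state needed for a RESUME. */
    record Checkpoint(String sessionId, long sequence) {}

    /** Picks RESUME when a checkpoint exists, otherwise a fresh IDENTIFY. */
    static int chooseOpcode(Optional<Checkpoint> checkpoint) {
        return checkpoint.isPresent() ? OP_RESUME : OP_IDENTIFY;
    }

    /** Builds the RESUME payload: {"op":6,"d":{"token":...,"session_id":...,"seq":...}}. */
    static String resumePayload(String token, Checkpoint cp) {
        return "{\"op\":" + OP_RESUME + ",\"d\":{\"token\":\"" + escape(token)
                + "\",\"session_id\":\"" + escape(cp.sessionId())
                + "\",\"seq\":" + cp.sequence() + "}}";
    }

    /** Minimal JSON string escaping for quotes, backslashes, and control characters. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }
}
```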
I made a super stupid, bad, and hacky implementation of this, implementing my own idea from two years ago. It works by persisting all guilds to a file when JDA shuts down; the stored format is the same one used by the … Sadly, it requires a JDA fork, since I needed to make some internal changes to support it, but it does work, and maybe in the future I will clean it up and submit a PR. :3 Anyhow, here's my implementation of it! https://github.com/LorittaBot/DeviousJDA/blob/master/src/examples/java/SessionCheckpointAndGatewayResumeExample.kt#L32 If it were properly implemented, ofc I would not rely on that super crazy hacky hack. I think the best way of handling it would be by creating a …
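The "persist on shutdown" step can be tied to the JVM's own shutdown sequence. A minimal sketch using the standard `Runtime.addShutdownHook`, with the actual save logic left as a hypothetical `Runnable`; the `saveOnce` guard keeps an explicit save and the hook from writing twice. Shutdown hooks run on a normal exit or on SIGTERM/Ctrl+C, but not on a hard kill. Also, per Discord's documentation, closing the gateway connection with code 1000 or 1001 invalidates the session, so a shard meant to be resumed later must not close its connection with either code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Runs a (hypothetical) checkpoint-saving action at most once, including on JVM shutdown. */
public final class CheckpointOnShutdown {
    private final AtomicBoolean saved = new AtomicBoolean(false);
    private final Runnable saveAction;

    public CheckpointOnShutdown(Runnable saveAction) {
        this.saveAction = saveAction;
    }

    /** Saves at most once, whether called explicitly or from the shutdown hook. */
    public void saveOnce() {
        if (saved.compareAndSet(false, true)) {
            saveAction.run();
        }
    }

    /** Registers saveOnce() to run when the JVM begins an orderly shutdown. */
    public void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::saveOnce, "checkpoint-writer"));
    }
}
```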
Feature Request
The cache is currently thrown away the moment the program restarts. We could provide a way to dump a checkpoint file of the current cache to allow resuming the session after restarting. This could be extremely useful for bigger bots that would otherwise exhaust their session start rate limit too quickly.
Exactly how this would be implemented is up for debate and might have to wait for the new auto-sharding to be implemented first.
Example Use-Case