API: Events stream #491
Comments
/cc @stevvooe I'm putting this back into the Icebox. We should do this, not sure the time horizon yet (probably not 1.13). WDYT? |
I still think it's a very useful feature to have. Quite a few projects use events to (re)configure services and to hook into Docker. Not being able to use events related to Swarm mode sounds like quite a limitation. I'm a bit worried that those projects will now implement all kinds of workarounds, and that by the time Swarm events are implemented, it will be too late. |
For my use case, I would like to know when an operator initiates a change to the scale of a particular service (and the change is confirmed within the swarm cluster). The use case is a replicated database, which needs to know how many instances of the database process should be reachable in the swarm so that it can determine how many replicas constitute a majority. For example, if the service is set to scale=3, then a minimum of 2 replicas would need to agree on a distributed transaction for that tx to be applied. If the operator then changes the scale to 5, then 2 replicas are no longer a sufficient majority; it is now 3. I'd like a documented way to know when such changes occur. |
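The majority arithmetic in the comment above can be written down as a small sketch. The function name `majority` is illustrative and not part of any Docker or SwarmKit API; it only assumes that a majority means strictly more than half of the configured replicas.

```go
package main

import "fmt"

// majority returns the smallest number of replicas that is strictly more
// than half of n, the configured scale of the service.
// Hypothetical helper for illustration only.
func majority(n int) int {
	return n/2 + 1
}

func main() {
	// The two scales mentioned in the comment: 3 replicas, then 5.
	for _, scale := range []int{3, 5} {
		fmt.Printf("scale=%d majority=%d\n", scale, majority(scale))
	}
}
```

A consumer of scale-change events would recompute this value each time the confirmed replica count changes.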
I'm not sure how I can justify putting an orchestration abstraction into production without proper monitoring. Event streams are critical for operational insight. These events power everything from workflow automation to operator incident response. To punt on a complete event stream implementation for at least two releases feels like a serious blow. People are going to fill the gaps (which is difficult given the new hits to integration points) or they're going to use a different orchestration system. Events we need:
These events would augment the existing container/network/volume events we have today and hint at orchestration intent rather than contextless change reporting. |
Also swarm node events. In several current deployment projects, we're trying to come up with reliable ways to handle node activities as well, especially when mixed with AWS ASGs:
|
I would also like to see this and I think it may be important for jobs, i.e. imperative one-shot tasks. |
@alexellis also see moby/moby#23880 for batch/jobs |
Thanks |
One of the most important things for me is to know node health - in particular to know when a node returns after being down. |
Folks - @dongluochen is currently working on this, we are hoping to ship this in the next release |
@aluzzardi Is there a protobuf proposal? |
@stevvooe I'll create a protobuf proposal. |
I don't think there'll be changes in the protos. Events can be implemented in Docker as long as swarmkit exposes a Watch gRPC interface which @aaronlehmann has a prototype for |
Here is the current design for Swarm events. Feedback is welcome.
The Swarm leader subscribes to a gRPC stream from SwarmKit that carries raft memory store changes. @aaronl is working on the Watch interface. The leader compares each raft change with the recorded state and emits Swarm events when necessary. Events are cached on the leader up to a count limit or a time limit (e.g., events are stored for the past 24 hours).
In the first iteration, a leader does not preserve events from before it became leader, i.e., Swarm event history is not kept across a leader switch. Keeping it would require either replaying raft events to regenerate the Swarm event history, or persistent storage for Swarm events. When a client detects a connection failure, it should re-establish the event stream.
Event category
The first iteration supports the following events. Task events are not supported, as they are transient steps of service orchestration; I think they should be reported at the service level. How to report service convergence or failure is a topic to explore.
CLI
Docker Swarm events follows
The filtering flag
The event stream is for program consumption as well as human inspection. It follows
API
Example response, to be refined.
|
@dongluochen Looks reasonable. Could we please avoid the cruft in the swarm events API? This means things like using proper RFC3339 timestamps and not sending deprecated fields ( |
I'm strongly against using a different API endpoint or CLI command. Painful efforts are under way to converge the Docker and Swarm APIs, and we should avoid anything that adds yet another API difference. Regarding the events per se, I think we should do something as simple as possible:
I think those fit CRUD+state. Since For The only one truly aside are
I think those should just be CRUD as well. |
Would this show updates to the spec, or any update to the service object?
Sounds reasonable, but I wonder if these will be hard to deal with. Will the events contain the old and new spec? If not, it seems very hard to detect things like a node becoming drained. I think that's the kind of thing people want events for. |
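To illustrate the concern above: if an update event carries (or lets the consumer keep) both the old and the new copy of an object, detecting something like a node becoming drained is a simple comparison. The sketch below is an assumption-laden illustration, not the actual event payload. `Node` and `describeNodeUpdate` are hypothetical; only the availability and role values mirror what Swarm nodes report.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down view of a swarm node.
type Node struct {
	ID           string
	Role         string // "worker" or "manager"
	Availability string // "active", "pause" or "drain"
}

// describeNodeUpdate compares the previous and current copies of a node and
// returns one line per field that changed. With only a bare "update"
// notification and no old copy, this comparison is not possible.
func describeNodeUpdate(old, cur Node) []string {
	var changes []string
	if old.Availability != cur.Availability {
		changes = append(changes,
			fmt.Sprintf("availability: %s -> %s", old.Availability, cur.Availability))
	}
	if old.Role != cur.Role {
		changes = append(changes,
			fmt.Sprintf("role: %s -> %s", old.Role, cur.Role))
	}
	return changes
}

func main() {
	before := Node{ID: "n1", Role: "worker", Availability: "active"}
	after := Node{ID: "n1", Role: "worker", Availability: "drain"}
	for _, change := range describeNodeUpdate(before, after) {
		fmt.Println(after.ID, change)
	}
}
```

A consumer that receives only the new object would have to keep its own copy of the last known state to do the same comparison.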
Looking at the Docker events API, I think it tries to describe what happened, not just send a simple notification. For example, Docker containers report the following events: Swarm events can reuse the current Docker events CLI like |
Slightly orthogonal, but we should have a proper look at the docker events API, because there are some issues in the way event / object attributes are returned (for example, labels and actual properties are combined in a single map; not sure what led to that implementation, but it's, erm, 💩 ). |
@thaJeztah @dongluochen We should also look to other examples of event API. Typically, it is bad practice to embed event data within the event itself, as you may operate on out of date information by the time you handle it. It would be best to include only the information about the event (who, what, where, when). |
@stevvooe Right. While operating directly on data embedded within an event is an anti-pattern, omitting that data presumes the use case. Many event stream consumers simply want to log and move on. |
Sure. In that case, you log "service foo with id abc created". If you need more, you look up the current state of the object. Consumers that don't require the whole state shouldn't pay the cost of including the extra data. Perhaps carrying state should be an option. |
@dongluochen Which Swarm version will have this feature? Being able to watch node change events is essential when managing many Swarm clusters. |
@dongluochen Is there any ETA for this feature? |
The current plan is to release the event API with Docker 17.05 or 17.06. It has an upstream dependency on the store Watch API. |
(Note the store watch API is #2034) |
@dongluochen you said that task events will not be supported for the first iteration, instead this will be reported at the service level. Does that mean that I will be able to know during a rolling update when a task was successfully started? I.e, when I do a rolling update with some delay, could I execute some script - by watching the event stream - right after one task was updated? |
@kaikuchn Event support has been merged in moby/moby#32421. We don't support task monitoring in this first iteration. Instead you can get events on service
Users might still want updates on each task; we will investigate. Task events would be numerous and could overwhelm other events, so I'm thinking of making them a daemon option that users can enable if they want to. |
@dongluochen I was hoping to use the events to update a load balancer's configuration of available servers. E.g. if a container is rescheduled for any reason I'd like to get an event for it so I can update the LB with the new IP/domain name for the container. Will that be possible after this update is shipped? |
I would say task-level events are important for any kind of dynamic or serverless platform too. +1 for task-level events. |
The current plan is to add an option to enable task level events. There are several issues related to functionality and scalability.
|
For my use case there are two task states I'm interested in:
A little background on my use case: |
@kaikuchn Thanks for providing your use case. Task level events should meet your requirement. It just carries a lot of unrelated events.
A task status beyond |
Implemented in #2034. |
Any docs on how to use this now? |
@datacarl this is just the changes in SwarmKit; for docker, these events will show up in the events-stream in |
The API should provide a way to watch for changes.
There are various solutions to do that:
- Events() RPC
- WatchTasks(), WatchServices(), ...
- GetService() then GetService(IfModified: lastServiceVersion)
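The last option, polling with an "if modified" version, can be sketched as below. This is only an illustration of the idea using an in-memory store. `Store`, `Service`, `Update`, and the two-argument `GetService` are hypothetical and do not reflect the actual SwarmKit API. Version 0 is assumed to mean "no version seen yet".

```go
package main

import "fmt"

// Service is a hypothetical versioned object.
type Service struct {
	Name     string
	Replicas int
	Version  uint64
}

// Store is a hypothetical in-memory stand-in for the manager's state.
type Store struct {
	services map[string]Service
}

func NewStore() *Store {
	return &Store{services: make(map[string]Service)}
}

// Update writes the service and bumps its version.
func (s *Store) Update(name string, replicas int) {
	cur := s.services[name]
	s.services[name] = Service{Name: name, Replicas: replicas, Version: cur.Version + 1}
}

// GetService returns the service and true only when it exists and its
// version differs from ifModified; otherwise it reports "not modified".
func (s *Store) GetService(name string, ifModified uint64) (Service, bool) {
	svc, ok := s.services[name]
	if !ok || svc.Version == ifModified {
		return Service{}, false
	}
	return svc, true
}

func main() {
	store := NewStore()
	store.Update("db", 3)

	var last uint64 // last version the client has seen
	for _, replicas := range []int{3, 5} {
		store.Update("db", replicas)
		if svc, changed := store.GetService("db", last); changed {
			fmt.Printf("%s changed: replicas=%d version=%d\n", svc.Name, svc.Replicas, svc.Version)
			last = svc.Version
		}
		// A second poll with the latest version reports no change.
		if _, changed := store.GetService("db", last); !changed {
			fmt.Println("not modified")
		}
	}
}
```

Compared with a watch or event stream, this approach needs no long-lived connection, but the client only learns about a change at its next poll and sees only the latest version, not each intermediate update.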
/cc @stevvooe @vieux