This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Proposal: Docker Swarm backed by SwarmKit #2774

Closed
nishanttotla opened this issue Jul 20, 2017 · 0 comments


Proposal

Docker Swarm (aka Swarm Classic) and SwarmKit are tools for container orchestration. Swarm requires an external discovery mechanism to function. Commonly used discovery backends include etcd, consul, and Docker Hub discovery tokens (not recommended in production). SwarmKit, on the other hand, has discovery built in and does not require a separate discovery backend.

Some projects might want to use both Swarm and SwarmKit, which means that in addition to SwarmKit’s inbuilt discovery, an external discovery mechanism (such as etcd) must still be set up, adding complexity and extra points of failure. Moreover, Swarm and SwarmKit might report node states differently, leading to inconsistent views of the cluster.

This issue describes a proposal to make SwarmKit the default discovery backend for Swarm.

High-level UX

Swarm managers are Go processes, and they read cluster membership information off a discovery backend. Starting a Swarm manager currently looks like

./swarm manage -H 192.168.99.103:2376 --discovery-opt kv.path=docker/nodes consul://192.168.99.103:8500

where consul://192.168.99.103:8500 specifies the IP address and port for the discovery backend. Swarm agents (which are backed by a Docker Engine each) can then be run to advertise the corresponding Engine’s address to the discovery backend.

docker run -d swarm join --advertise=172.30.0.69:2375 consul://192.168.99.103:8500

This may not be required if the Engine itself advertises to the backend. The key requirement is that the discovery backend can access the list of agent Engines.

Since SwarmKit (and its inbuilt discovery) is integrated into the Docker Engine, we require that each Swarm manager be backed by a corresponding Docker Engine that is also a SwarmKit manager in the cluster. We call this the manager’s “local” Engine. With that requirement, the Swarm manager would be passed a Docker socket so that it can directly query the Docker API on that Engine. The workflow may look something like

./swarm manage -H 192.168.99.103:2376 unix:///var/run/docker.sock

This means the Swarm manager is listening for requests at 192.168.99.103:2376, and using the Docker socket of its “local” Engine for discovery. This may become the default way to run Swarm, as token-based discovery is deprecated (#2743) and not considered safe for production.

Design Details

Here are some high-level design details for how this would be implemented.

  • When the Swarm manager starts up, it would query the Docker API to acquire the list of nodes along with their roles. This would allow it to get an idea of other managers and agents in the cluster.
  • This list should be stored in memory on the manager.
  • It would then start watching for SwarmKit events. The list of events is described here: API: Events stream moby/swarmkit#491 (comment). In particular, the relevant ones are:
    - node join/remove
    - node promote/demote
    - node status change (Ready, Down)
    - node availability change (Active, Pause, Drain)
    - node leadership change
  • Based on events coming in, Swarm should update its view of the cluster in memory.
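The steps above can be sketched roughly as follows. The event kinds mirror the list above, but all type and field names here are illustrative stand-ins, not SwarmKit's actual API:

```go
package main

import "fmt"

// NodeEventKind enumerates the SwarmKit event types listed above.
// The names are illustrative, not SwarmKit's actual API.
type NodeEventKind int

const (
	NodeJoin NodeEventKind = iota
	NodeRemove
	NodePromote
	NodeDemote
	NodeStatusChange
)

// NodeEvent is one incoming SwarmKit event (simplified).
type NodeEvent struct {
	Kind   NodeEventKind
	ID     string
	Status string // used by NodeStatusChange, e.g. "ready" or "down"
}

// Node is the manager's in-memory record of one cluster member.
type Node struct {
	ID     string
	Role   string // "manager" or "worker"
	Status string
}

// ClusterView is the Swarm manager's in-memory picture of the cluster,
// seeded from the Docker API and kept current by applying events.
type ClusterView struct {
	nodes map[string]*Node
}

func NewClusterView(seed []Node) *ClusterView {
	v := &ClusterView{nodes: make(map[string]*Node)}
	for i := range seed {
		n := seed[i] // copy so each entry gets its own address
		v.nodes[n.ID] = &n
	}
	return v
}

// Apply folds one event into the view.
func (v *ClusterView) Apply(ev NodeEvent) {
	switch ev.Kind {
	case NodeJoin:
		v.nodes[ev.ID] = &Node{ID: ev.ID, Role: "worker", Status: "ready"}
	case NodeRemove:
		delete(v.nodes, ev.ID)
	case NodePromote:
		if n, ok := v.nodes[ev.ID]; ok {
			n.Role = "manager"
		}
	case NodeDemote:
		if n, ok := v.nodes[ev.ID]; ok {
			n.Role = "worker"
		}
	case NodeStatusChange:
		if n, ok := v.nodes[ev.ID]; ok {
			n.Status = ev.Status
		}
	}
}

func main() {
	v := NewClusterView([]Node{{ID: "n1", Role: "manager", Status: "ready"}})
	v.Apply(NodeEvent{Kind: NodeJoin, ID: "n2"})
	v.Apply(NodeEvent{Kind: NodePromote, ID: "n2"})
	fmt.Println(len(v.nodes), v.nodes["n2"].Role) // 2 manager
}
```

A periodic full refresh from the Docker API (discussed under leader election below) would reconcile this view if any events were missed.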

The new discovery backend will be based on the following Backend interface (https://github.com/moby/moby/blob/master/pkg/discovery/discovery.go):

// Backend is implemented by discovery backends which manage cluster entries.
type Backend interface {
        // Watcher must be provided by every backend.
        Watcher

        // Initialize the discovery with URIs, a heartbeat, a ttl and optional settings.
        Initialize(string, time.Duration, time.Duration, map[string]string) error

        // Register to the discovery.
        Register(string) error
}

Watcher is defined as

// Watcher provides watching over a cluster for nodes joining and leaving.
type Watcher interface {
        // Watch the discovery for entry changes.
        // Returns a channel that will receive changes or an error.
        // Providing a non-nil stopCh can be used to stop watching.
        Watch(stopCh <-chan struct{}) (<-chan Entries, <-chan error)
}

In this case, we need to implement Watcher on top of the Docker socket; Register is not particularly relevant.

Leader Election

It is important to ensure that leader election can take place in a way that does not cause inconsistencies or race conditions in the cluster. SwarmKit is the source of truth here, and should also determine which manager node is the leader. The key requirement is that there should be a 1:1 correspondence between Swarm managers and SwarmKit managers, including their roles (Leader or not).

There are two competing ideas for leader election:

  1. Swarm managers are subscribed to SwarmKit events, so they update their Leadership status based on received events. The upside is that this is low-overhead, but the downside is that cluster views could become inconsistent if events to one or more managers are delayed. With this option, it would still be advisable to refresh membership information periodically (say every 30 seconds) by querying the Docker API.
  2. A Swarm manager that is about to make an API call should first check if its “local” Engine is the leader. If not, it should delegate the request to a different manager (which will also do the same check). This ensures that a manager that isn’t a Leader will not make API calls to the Swarm cluster. The obvious downside is that there will be an extra API call to make each time, but as fewer operations depend on Swarm, this may not matter so much.

TLS

It looks like the discovery backend should be able to support TLS certs (moby/moby#16644). Any application that has access to SwarmKit certs should be able to provide those when starting up Swarm managers.

SwarmKit changes

As of now, it does not seem like any SwarmKit changes are necessary to make this proposal possible. The only requirement from SwarmKit is the ability to report full cluster information, both from the API and via Events, which exists currently.

Impact on Swarm Users

The key implication for the Swarm project and its users is that SwarmKit may become the default discovery backend, which requires a 17.06 or newer “local” Engine backing each Swarm manager. Users of older Docker Engines can continue to set up a consul or etcd cluster as the discovery backend, so the old workflow remains available and those users see no impact or breakage.

Timeline

This feature is targeted for the next major release: 1.3.0
