API-1825: describe multi-operator-manager #1721

Open · wants to merge 1 commit into `master`
`dev-guide/multi-operator-manager.md` (186 additions, 0 deletions)
# MultiOperatorManager

## Summary

HCP and standalone topologies suffer from inconsistency and duplicated effort when introducing features
because they use two entirely distinct mechanisms for managing the same operands.
While some differences are absolutely required (deployments vs static pods), many are simply capricious.
This is about finding a way to safely refactor ourselves into a single path for managing these operands.

## Motivation

The duplicated effort, the delays it causes, and the fragmentation of the ecosystem as
OCP supports different features in the same version across different form factors are painful for customers,
platform extension authors, and developers.
Additionally, it is impractical to expect HCP to be an expert in managing (installing, configuring, and updating)
every operand on the management cluster.
Finding a way to keep operand teams aligned on how their operand is managed in every form factor is beneficial,
finding a way to do this without simply requiring more effort is even better, and providing a mechanism to
improve testability at the same time is better still.
This enhancement aims to improve all of those.

### Goals

* Keep teams aligned to units of function across all form factors, so a single team manages its operand the same way in every form factor (desired by everyone, I think)
* Dramatically reduce the overhead of running operators, such as peak memory usage and kube-apiserver load (single-node and HCP constraints)
* Dramatically reduce the delta between standalone and hypershift (all of us feel this pain)
* Dramatically improve testability of operators (all of us feel this pain)

### Non-Goals

* Reimplement operators in some meaningful way. We're shooting for a successive refactor.
* Change any customer facing API

## Proposal

### Definition of Terms
Before we get into the changes, we need to define some terms.

1. configuration cluster - this is a theoretical (it doesn't really exist) Kubernetes cluster that contains
the configuration that the user provides and the operators interact with.
This is mostly where config.openshift.io is, but can contain other things.
Recall that on standalone this is the one and only cluster, on HCP it will be the management cluster, and if
we ever create a separate configuration server for standalone in the future, it will live there.
2. management cluster - this is the cluster where any non-config resource that must exist in the HCP management
cluster is listed.
Standalone doesn't have this concept, but we need a way to indicate which deployments (and the config leading to them)
need to exist in the HCP management cluster.
3. user workload cluster - this is the cluster where users run their workloads.
Many operators run things like daemonsets in this category.
For instance, CSI driver daemonsets, CNI daemonsets, DNS daemonsets, the image registry deployment, etc.
(A sketch after this list shows one way these three categories might be expressed in code.)
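
To make the three categories concrete, here is a minimal sketch of how an operator's output resources might be tagged with a destination cluster. The type and constant names are hypothetical and not part of any existing API.

```go
package example

// ClusterType is a hypothetical tag describing which logical cluster a
// resource belongs to; the real types used by the MOM libraries may differ.
type ClusterType string

const (
	// ConfigurationCluster holds user-provided configuration
	// (for example, config.openshift.io resources).
	ConfigurationCluster ClusterType = "Configuration"
	// ManagementCluster holds non-config resources that must exist in the
	// HCP management cluster (for example, operand Deployments).
	ManagementCluster ClusterType = "Management"
	// UserWorkloadCluster holds resources that run next to user workloads
	// (for example, CSI, CNI, and DNS DaemonSets).
	UserWorkloadCluster ClusterType = "UserWorkload"
)
```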

### MultiOperatorManager
Currently, every operator establishes its own client connections to the kube-apiserver, fills its own caches,
issues its own updates, and is permanently running to be responsive.
Instead of running each operator individually, we will run one binary (the MOM) that communicates with the kube-apiserver.
It will have a single set of caches, and it will decide which operators need to react to changes.
To make this decision, it will know which resources those operators depend upon, but it can also rate limit by:
1. debouncing - waiting a few seconds after the first input change to collect more.
2. maximum concurrency - only X operators may be working at the same time
3. cool down - an operator must wait at least M seconds before being called again
4. operator qps - a single operator can only make N changes per minute
5. any other metric we like.

This allows us to limit peak memory usage and server QPS.
It also allows us to detect conflicts between operators and eliminate hot looping.
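
Purely as an illustration of the limits above, the MOM's per-operator throttling could be described by a small policy like the following. All names and defaults here are hypothetical; the real configuration surface is not yet defined.

```go
package example

import "time"

// OperatorPolicy is a hypothetical description of the throttling the MOM
// could apply before invoking an operator.
type OperatorPolicy struct {
	// Debounce is how long to wait after the first input change so that
	// related changes can be batched into a single invocation.
	Debounce time.Duration
	// MaxConcurrent caps how many operators may be reconciling at once
	// (enforced globally across all policies).
	MaxConcurrent int
	// CoolDown is the minimum time between two invocations of the same
	// operator.
	CoolDown time.Duration
	// MutationsPerMinute caps how many changes a single operator may make.
	MutationsPerMinute int
}

// DefaultPolicy is an example of conservative defaults.
var DefaultPolicy = OperatorPolicy{
	Debounce:           5 * time.Second,
	MaxConcurrent:      3,
	CoolDown:           30 * time.Second,
	MutationsPerMinute: 60,
}
```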

The refactor that enables this also makes it easy to expand our test coverage and to reuse the output
of standalone operators on HCP.

### The refactor
We will refactor existing operators so that they continue running as they do today while also fulfilling the
contract of the new commands below (a rough sketch of the command wiring follows the list).
openshift/cluster-authentication-operator already has working input-resources, output-resources, and
apply-configuration commands, though they are not yet guaranteed compatible.

1. `operator input-resources` - outputs a list of kube API input resources that this operator needs
2. `operator output-resources` - outputs a mapping of kube API output resources from this operator
and whether these resources are for configuration, management, or user workload clusters.
3. `operator apply-configuration --input-dir=<must-gather-like-dir> --output-dir=<dir which-will-list-content-to-write>` -
takes a must-gather like directory of input and produces an output directory containing content to apply to the cluster.
Logically, the output is something you would `kubectl create -f <output-dir>/create`, `kubectl apply -f <output-dir>/apply`,
and so on for each mutation verb.
This command is not provided any kubeconfig and has all network access cut off.
Doing this makes for something extremely easy to integration test.
4. `operator apply-configuration-live --input-dir=<must-gather-like-dir> --output-dir=<dir which-will-list-content-to-write>` -
similar to apply-configuration, but this one is given a kubeconfig for the user-workload cluster and limited
(determined by TBD proxy configuration) ability to reach other endpoints.
This is useful for cases where external access is necessary, such as a cloud provider or an external identity provider.
These commands are encouraged to write enough back to the configuration cluster that testing flows remain very
easy for each command.
We do not yet know how we'll get a NO_PROXY configuration for each operator.
5. TBD - some way to determine the service account to use for mutation for each operator.
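
As a rough sketch of the wiring only (the real commands are built with the MOM libraries and may differ), an operator binary could expose these subcommands like this; the function bodies are placeholders.

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newOperatorCommand sketches the subcommand contract described above.
// The flag names mirror the contract; the implementations are placeholders.
func newOperatorCommand() *cobra.Command {
	root := &cobra.Command{Use: "operator"}

	root.AddCommand(&cobra.Command{
		Use:   "input-resources",
		Short: "List the kube API resources this operator consumes",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Placeholder: print the operator's declared input resources.
			return nil
		},
	})

	root.AddCommand(&cobra.Command{
		Use:   "output-resources",
		Short: "Map output resources to configuration, management, or user workload clusters",
		RunE: func(cmd *cobra.Command, args []string) error {
			return nil
		},
	})

	applyCmd := &cobra.Command{
		Use:   "apply-configuration",
		Short: "Read a must-gather-like --input-dir and write desired mutations to --output-dir",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Placeholder: run every control loop once against the input
			// directory and serialize the resulting mutations.
			return nil
		},
	}
	applyCmd.Flags().String("input-dir", "", "must-gather-like directory of inputs")
	applyCmd.Flags().String("output-dir", "", "directory that will list content to write")
	root.AddCommand(applyCmd)

	return root
}

func main() {
	if err := newOperatorCommand().Execute(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```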

### Prereqs to such a refactor
There are several pieces that are needed prior to investing in a refactor like this across multiple teams.

1. Ability to read a must-gather and feed a functional fake client that can support informers (sketched after this list).
2. Ability to track writes using a client and serialize them for later.
3. Types to define input and output resources.
4. Ability to prune a must-gather to just the content needed for feeding an operator.
5. Ability to declaratively define an operator test for verification.
6. Ability to avoid updating the same resource multiple times (the perma-conflict problem).
7. Sufficient refactoring to avoid the obvious cases of perma-conflict.
8. Ability to know which service account to use for each operator (out of band?).
9. Easy Makefile inclusion of integration testing.
10. Refactor at least one operator to demonstrate this actually works.
11. MOM to enforce HTTP_PROXY rules on operator.
12. IndividualMOM created that is able to handle each request type.
13. Stretch: refactor cluster-authentication-operator to use IndividualMOM
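
For prerequisite 1, here is a minimal sketch of the idea, assuming single-document YAML manifests; it does not reflect the actual MOM library code. It reads a must-gather-like directory into unstructured objects and backs a dynamic informer factory with a fake client.

```go
package example

import (
	"os"
	"path/filepath"
	"strings"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/dynamic/dynamicinformer"
	dynamicfake "k8s.io/client-go/dynamic/fake"
	"sigs.k8s.io/yaml"
)

// informersFromMustGather walks a must-gather-like directory, decodes every
// YAML manifest into an unstructured object, and returns an informer factory
// backed by a fake dynamic client. This is a sketch; the real library would
// need multi-document files, error tolerance, and pruning of unwanted content.
func informersFromMustGather(dir string) (dynamicinformer.DynamicSharedInformerFactory, error) {
	var objs []runtime.Object
	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".yaml") {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		obj := &unstructured.Unstructured{}
		if err := yaml.Unmarshal(data, &obj.Object); err != nil {
			return err
		}
		objs = append(objs, obj)
		return nil
	})
	if err != nil {
		return nil, err
	}

	// The fake dynamic client satisfies dynamic.Interface, so informers can
	// list and watch against it exactly as they would against a real server.
	client := dynamicfake.NewSimpleDynamicClient(runtime.NewScheme(), objs...)
	return dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute), nil
}
```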

### Suggested flow in managing the refactor
In order to avoid disrupting our existing OCP standalone product, it is important to refactor for equivalence
instead of simply writing something entirely new.
To support this, we have a pattern that looks roughly like this (a minimal sketch follows the two lists below).
1. Construct all the clients the operator will require in one function that returns a struct containing all those clients.
2. From the clients, create all the informers that are required.
3. From the informers, initialize all the control loops for the operator.
4. For each control loop, add a `RunOnce` function.
5. Create an alternative constructor for those initial clients, built from `manifestclient.MutationTrackingClient`.
6. Wire `libraryapplyconfiguration.NewApplyConfigurationCommand` using the example in MOM.
7. Add `targets/openshift/operator/mom.mk` to the `Makefile` to get `test-operator-integration` and `update-test-operator-integration` targets.

Doing this:
1. Leaves the existing standalone operator more or less untouched.
2. Ensures we don't miss control loops in the `RunOnce`.
3. Provides a testing mechanism even for the standalone control loops that is useful for establishing behavior.
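
A minimal sketch of that shape, using hypothetical names for the clients struct and control loops; only `RunOnce` and the idea of a swappable client constructor come from the steps above, and the real `manifestclient` and `libraryapplyconfiguration` wiring is not reproduced here.

```go
package example

import (
	"context"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// operatorClients gathers every client the operator needs in one place
// (step 1). A MOM-mode constructor would build the same struct from a
// mutation-tracking client instead of a live kubeconfig (step 5).
type operatorClients struct {
	kubeClient kubernetes.Interface
	// ... other clients (config, operator, route, ...) would live here.
}

func newLiveClients(cfg *rest.Config) (*operatorClients, error) {
	kubeClient, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	return &operatorClients{kubeClient: kubeClient}, nil
}

// controlLoop is whatever interface the operator's controllers share once a
// RunOnce method has been added (step 4).
type controlLoop interface {
	RunOnce(ctx context.Context) error
}

// runAllOnce drives every control loop exactly once, which is what the
// apply-configuration command needs instead of the usual long-running Run.
func runAllOnce(ctx context.Context, loops []controlLoop) error {
	for _, loop := range loops {
		if err := loop.RunOnce(ctx); err != nil {
			return err
		}
	}
	return nil
}
```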

### Workflow Description

### API Extensions

### Topology Considerations

#### Hypershift / Hosted Control Planes

##### HCP namespaces
Operators think in terms of standalone and this will remain for the foreseeable future.
HCP will need to adjust the input-resources and output-resources from standalone into something suitable for HCP.
This is a risk calculation.
Because every namespace, name, configmap, and secret is fully exposed in a standalone cluster,
we have no way of safely making sweeping changes without breaking the end users, cluster-admins, and platform extensions
that rely on those names being stable.
HCP does not suffer from a comparable problem and can thus migrate to closer alignment with standalone
comparatively easily.

A key part of this is simulating the namespaces operators expect, since operators assume they exist in a single
cluster with many well-known namespaces.
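
Purely as an illustration (no such API exists today), an HCP adaptation layer could translate the well-known standalone namespaces in an operator's output into a hosted control plane namespace along these lines.

```go
package example

// hcpNamespaceFor sketches how an HCP adaptation layer might rewrite the
// well-known standalone namespaces an operator emits into the per-cluster
// hosted control plane namespace. The mapping below is illustrative only.
func hcpNamespaceFor(standaloneNamespace, hostedControlPlaneNamespace string) string {
	switch standaloneNamespace {
	case "openshift-authentication-operator", "openshift-authentication":
		// Control-plane components move into the HCP namespace on the
		// management cluster.
		return hostedControlPlaneNamespace
	default:
		// Resources destined for the user workload cluster keep the
		// namespace the operator expects.
		return standaloneNamespace
	}
}
```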

#### Standalone Clusters

#### Single-node Deployments or MicroShift

### Implementation Details/Notes/Constraints

### Risks and Mitigations

1. Mrunal is concerned that spawning processes is too expensive.
This will be tested by comparing the cost of running operators in standard mode versus MOM mode, amortized over 24h.
The cost will take into account a weighting of kube-apiserver load, peak memory usage, memory GB-seconds, and potentially more.
Conclusions will consider the amortized cost of combining ~10 resident operators into one MOM.

### Drawbacks

## Open Questions [optional]

## Test Plan

## Graduation Criteria

### Dev Preview -> Tech Preview

### Tech Preview -> GA

### Removing a deprecated feature

## Upgrade / Downgrade Strategy

## Version Skew Strategy

## Operational Aspects of API Extensions

## Support Procedures

## Alternatives

## Infrastructure Needed [optional]