[metrics 2/x] Configure Prometheus Operator #687

Merged

zeeke
Member

@zeeke zeeke commented Apr 23, 2024

Deploy the configuration needed for the Prometheus Operator
to find and scrape the sriov-network-metrics-exporter
endpoints, including the ServiceMonitor, Role and RoleBinding.

depends on:

Thanks for your PR.
To run the vendor CIs, use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@zeeke zeeke changed the title from "[metrics 2/3] Configure Prometheus Operator" to "[metrics 2/x] Configure Prometheus Operator" Apr 23, 2024
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 06e880c to 75e8305 Compare April 24, 2024 15:51
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 75e8305 to 06c86e1 Compare April 24, 2024 16:51
@coveralls

coveralls commented Apr 24, 2024

Pull Request Test Coverage Report for Build 9903758109

Details

  • 12 of 16 (75.0%) changed or added relevant lines in 1 file are covered.
  • 98 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.02%) to 41.0%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
controllers/sriovoperatorconfig_controller.go | 12 | 16 | 75.0%

Files with Coverage Reduction | New Missed Lines | %
controllers/generic_network_controller.go | 2 | 76.4%
controllers/sriovoperatorconfig_controller.go | 34 | 64.19%
controllers/helper.go | 62 | 68.75%

Totals (Coverage Status)
Change from base Build 9894164129: 0.02%
Covered Lines: 5599
Relevant Lines: 13656

💛 - Coveralls

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 06c86e1 to 37fbdf4 Compare April 24, 2024 17:06
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 37fbdf4 to 7999f7e Compare May 17, 2024 16:31
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 7999f7e to 58b9fb3 Compare May 17, 2024 16:53

@@ -18,6 +18,7 @@ spec:
name: sriov-network-metrics
port: {{ .MetricsExporterPort }}
targetPort: {{ .MetricsExporterPort }}
{{ if .IsOpenshift }}
Collaborator

I think we can also support plain k8s here.

Let's check whether the ServiceMonitor CRD exists in the cluster and deploy the ServiceMonitor based on that, instead of checking only for OpenShift. WDYT?

Member Author

good idea, working on that

Member Author

Done. I used the unstructured client to check whether the ServiceMonitor resource definition is available in the cluster. Does that sound good?
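
The check described in this comment can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: `kindLister`, `errNoKindMatch` and `fakeLister` are hypothetical stand-ins for a real Kubernetes client and for the "no matches for kind" error an API server reports when a kind is not registered. The group and version `monitoring.coreos.com/v1` are the ones the Prometheus Operator uses for ServiceMonitor.

```go
package main

import "errors"

// errNoKindMatch is a hypothetical sentinel standing in for the error a real
// client returns when the API server does not serve the requested kind.
var errNoKindMatch = errors.New("no matches for kind")

// kindLister is a hypothetical, minimal stand-in for a Kubernetes client that
// can list objects of a given group/version/kind.
type kindLister interface {
	ListKind(group, version, kind string) error
}

// serviceMonitorAvailable reports whether the cluster serves the ServiceMonitor
// kind. A successful list (even an empty one) means the kind is registered;
// a "no match" error means it is not; any other error is returned to the caller.
func serviceMonitorAvailable(c kindLister) (bool, error) {
	err := c.ListKind("monitoring.coreos.com", "v1", "ServiceMonitor")
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNoKindMatch):
		return false, nil
	default:
		return false, err
	}
}

// fakeLister is a test double that always returns the configured error.
type fakeLister struct{ err error }

func (f fakeLister) ListKind(group, version, kind string) error { return f.err }
```

Note that a list call only proves that the kind is served. It does not depend on any ServiceMonitor object existing, so an empty result still counts as available.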

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 58b9fb3 to 9b5e4cf Compare May 28, 2024 07:22
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 9b5e4cf to 2bddf42 Compare May 28, 2024 08:10
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 2bddf42 to cec70bd Compare May 28, 2024 08:12
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from cec70bd to 6da499a Compare May 28, 2024 08:16
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 6da499a to e87a82d Compare May 28, 2024 09:29

deploy/role.yaml Outdated
@@ -32,6 +32,7 @@ rules:
verbs:
- get
- create
- list
Collaborator

I think we need this one also under the config folder so it will be generated for the bundle

Member Author

The contents of the config/ folder are currently out of sync with openshift/sriov-network-operator (e.g. see config/rbac/role.yaml here and here).

I would align that in a separate PR. Does that sound good?


func isPrometheusOperatorInstalled(ctx context.Context, client k8sclient.Reader) bool {
	u := &uns.UnstructuredList{}
	u.SetGroupVersionKind(schema.GroupVersionKind{
Collaborator

I was thinking (maybe that is not the right way) of doing the equivalent of kubectl get crd servicemonitor, rather than searching for any ServiceMonitor object in the cluster :)

Member Author

I think we can try getting the CRD (e.g. see this).

The drawback is that we have to add permissions (ClusterRole, ClusterRoleBinding, ...) so that the operator can read that CustomResourceDefinition resource, but it might end up cleaner.

Collaborator

I think it should be OK to add get for CRDs; that should not expose the operator to any security issues :)

Member Author

I had to add the permission to the ClusterRole (instead of the Role), as CustomResourceDefinition is not a namespaced resource.
For the same reason, I had to add a non-namespaced client to the SriovOperatorConfigReconciler.
Please take a look.

}

if r.PlatformHelper.IsOpenshiftCluster() {
	err = utils.AddLabelToNamespace(ctx, vars.Namespace, "openshift.io/cluster-monitoring", "true", r.Client)
Collaborator

I think we need to put this in the namespace creation template and not give the operator permission to update namespaces; that sounds like a security risk.

Member Author

Sounds good. Maybe we can leverage the operatorframework.io/cluster-monitoring annotation in the CSV, in the openshift fork

https://github.com/openshift/enhancements/blob/master/enhancements/olm/olm-managed-operator-metrics.md#fulfilling-namespace-and-rbac-requirements

@adrianchiris
Collaborator

@zeeke can you rebase this one ?

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from e87a82d to 17dabd0 Compare June 27, 2024 09:33
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 7aacaae to 2faab1d Compare July 3, 2024 15:34
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 2faab1d to 19e5be7 Compare July 3, 2024 16:04
@zeeke zeeke removed the hold label Jul 5, 2024
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 19e5be7 to 41106a5 Compare July 5, 2024 08:47

@zeeke
Member Author

zeeke commented Jul 5, 2024

@adrianchiris , @SchSeba please take another look

Collaborator

@SchSeba SchSeba left a comment

just some small comments :)

subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
Collaborator

This is good for OpenShift. For vanilla k8s, please add a variable to the Helm chart, something like:
https://github.com/metallb/metallb/blob/21dd75560f3b8614c14b1bb55a79dbcc231e36a7/charts/metallb/templates/servicemonitor.yaml#L192

Member Author

Added environment variables and Helm stuff to make the subject configurable.
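
One way to picture the configurable subject: the manifests touched by this PR are Go templates (an earlier hunk uses `{{ .MetricsExporterPort }}`), so the service account can be injected at render time. The sketch below uses only the standard `text/template` package. The field names `PrometheusOperatorServiceAccount` and `PrometheusOperatorNamespace` and the `renderSubjects` helper are assumptions made for illustration, not names confirmed by this PR.

```go
package main

import (
	"strings"
	"text/template"
)

// subjectsTmpl mirrors the RoleBinding subjects block from the review, with
// the hard-coded values replaced by (hypothetical) template fields.
const subjectsTmpl = `subjects:
- kind: ServiceAccount
  name: {{ .PrometheusOperatorServiceAccount }}
  namespace: {{ .PrometheusOperatorNamespace }}
`

// renderData carries the values substituted into the template.
// Both field names are illustrative assumptions.
type renderData struct {
	PrometheusOperatorServiceAccount string
	PrometheusOperatorNamespace      string
}

// renderSubjects parses the template and renders it with the given data.
func renderSubjects(d renderData) (string, error) {
	t, err := template.New("subjects").Parse(subjectsTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

Rendering with `prometheus-k8s` and `openshift-monitoring` reproduces the OpenShift subject shown in the diff, while a vanilla k8s deployment could pass whatever service account its Prometheus instance runs as.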

deploy/clusterrole.yaml Outdated
@@ -161,3 +161,28 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

	return AnnotateObject(ctx, node, key, value, c)
}

func AddLabelToNamespace(ctx context.Context, namespaceName, key, value string, c client.Client) error {
	ns := &corev1.Namespace{}
Collaborator

I was not able to find where we use this one.

In general, I think we should document the need to add the monitoring label when the namespace is created, rather than adding RBAC that allows the operator to update the namespace object.

Member Author

I removed the function. I will re-add it in the e2e test PR.

About permissions, the operator already had the RBAC to write on namespaces (see deploy/clusterrole.yaml and openshift CSV). No permission has been added in this PR for namespaces.

I'll take care of documenting the namespace configuration in OpenShift

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from a39a440 to 108ef15 Compare July 12, 2024 06:55

@zeeke
Member Author

zeeke commented Jul 16, 2024

@adrianchiris , @SchSeba please take another look

hack/env.sh Outdated
@@ -41,3 +41,5 @@ export DEV_MODE=${DEV_MODE:-"FALSE"}
export OPERATOR_LEADER_ELECTION_ENABLE=${OPERATOR_LEADER_ELECTION_ENABLE:-"false"}
export METRICS_EXPORTER_SECRET_NAME=${METRICS_EXPORTER_SECRET_NAME:-"metrics-exporter-cert"}
export METRICS_EXPORTER_PORT=${METRICS_EXPORTER_PORT:-"9110"}
export METRICS_EXPORTER_PROMETHEUS_OPERATOR_SERVICE_ACCOUNT=${METRICS_EXPORTER_PROMETHEUS_OPERATOR_SERVICE_ACCOUNT:-"prometheus-k8s"}
Collaborator

I think we need to override this one in the openshift CI file no?

Member Author

ok, moving there

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch 2 times, most recently from 19cc04e to 83801a5 Compare July 17, 2024 17:19
@zeeke
Member Author

zeeke commented Jul 18, 2024

I got rid of the CustomResourceDefinition ClusterRole access. The operator now infers that the Prometheus Operator is installed from the presence of the METRICS_EXPORTER_PROMETHEUS_OPERATOR_NAMESPACE environment variable.

@adrianchiris @SchSeba please take another look
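
The environment-variable approach described in this comment could look roughly like the sketch below. Only the variable name comes from the discussion; the `lookupFunc` type, the helper names, and the choice to treat an empty value the same as an unset one are assumptions for illustration and may differ from the merged code.

```go
package main

import "os"

// Variable name taken from the PR discussion.
const promOperatorNamespaceEnv = "METRICS_EXPORTER_PROMETHEUS_OPERATOR_NAMESPACE"

// lookupFunc has the same signature as os.LookupEnv, so a fake can be
// injected in tests (hypothetical type, not part of the operator).
type lookupFunc func(key string) (string, bool)

// prometheusOperatorConfigured reports whether the monitoring resources
// (ServiceMonitor, Role, RoleBinding) should be deployed. Treating an empty
// value as "not configured" is an assumption of this sketch.
func prometheusOperatorConfigured(lookup lookupFunc) bool {
	ns, ok := lookup(promOperatorNamespaceEnv)
	return ok && ns != ""
}

// fromProcessEnv wires the check to the real process environment.
func fromProcessEnv() bool {
	return prometheusOperatorConfigured(os.LookupEnv)
}
```

Compared with reading the CustomResourceDefinition, this needs no cluster-scoped RBAC: the decision is made by whoever deploys the operator rather than discovered at runtime.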

Collaborator

@SchSeba SchSeba left a comment

Great work! Much cleaner now, thanks for working on this!

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 83801a5 to 4af0a6a Compare July 29, 2024 07:31
zeeke added 2 commits July 29, 2024 09:33
Package `github.com/prometheus-operator/prometheus-operator/pkg/client`
can be used for testing purpose.

Signed-off-by: Andrea Panattoni <[email protected]>
Deploy the configuration needed for the Prometheus Operator
to find and scrape the sriov-network-metrics-exporter
endpoints, including the ServiceMonitor, Role and RoleBinding.

Resources are installed only if the Prometheus operator is installed.

When using `ServiceMonitors`, Prometheus Operator needs permissions
to read Services, Endpoints and Pods in the monitored namespace (i.e. the SR-IOV
operator namespace).

Make the ServiceAccount subject configurable via environment variables.

Signed-off-by: Andrea Panattoni <[email protected]>
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 4af0a6a to 3dff029 Compare July 29, 2024 07:34
@zeeke zeeke requested review from adrianchiris and SchSeba July 30, 2024 16:34
Collaborator

@SchSeba SchSeba left a comment

just a small nit

@@ -137,6 +138,10 @@ var _ = BeforeSuite(func() {
Expect(err).NotTo(HaveOccurred())
err = os.Setenv("METRICS_EXPORTER_KUBE_RBAC_PROXY_IMAGE", "mock-image")
Expect(err).NotTo(HaveOccurred())
err = os.Setenv("METRICS_EXPORTER_PROMETHEUS_OPERATOR_SERVICE_ACCOUNT", "k8s-prometheus")
Collaborator

do you need the new variable here?

Member Author

hack/run-e2e-conformance-virtual-ocp.sh
Collaborator

@adrianchiris adrianchiris left a comment

LGTM!

@adrianchiris adrianchiris merged commit 57e1e90 into k8snetworkplumbingwg:master Jul 31, 2024
13 checks passed