Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHOAIENG-7610: fix(nbc): deduplicate culling-related configmap keys #480

Merged

Conversation

jiridanek
Copy link
Member

@jiridanek jiridanek commented Dec 3, 2024

https://issues.redhat.com/browse/RHOAIENG-7610

Description

As described in product documentation [1], culling config is given in

The notebook-controller-culler-config ConfigMap, located in the redhat-ods-applications project

However, fresh RHOAI installation also defines the same keys in the notebook-controller-config ConfigMap in the same namespace. Only keys from notebook-controller-culler-config are mounted into the deployment, the culling keys from the other configmap are ignored.

Therefore, we need to remove these unused duplicated keys, to prevent confusion

ENABLE_CULLING=false
CULL_IDLE_TIME=1440
IDLENESS_CHECK_PERIOD=1

How Has This Been Tested?

      devFlags:
        manifests:
          - contextDir: components/notebook-controller/config
            uri: 'https://github.com/jiridanek/kubeflow/tarball/jd_duplicate_parames'

with that, after I deleted the configmap that was already present,

% oc get configmap/notebook-controller-config -n redhat-ods-applications -o yaml
apiVersion: v1
data:
  ADD_FSGROUP: "false"
  CLUSTER_DOMAIN: cluster.local
  ISTIO_GATEWAY: kubeflow/kubeflow-gateway
  ISTIO_HOST: '*'
  USE_ISTIO: "false"
kind: ConfigMap
metadata:
  annotations:
    internal.config.kubernetes.io/previousKinds: ConfigMap
    internal.config.kubernetes.io/previousNames: notebook-controller-config
    internal.config.kubernetes.io/previousNamespaces: opendatahub
  creationTimestamp: "2024-12-03T07:14:32Z"
  labels:
    app: notebook-controller
    app.kubernetes.io/part-of: workbenches
    app.opendatahub.io/workbenches: "true"
    component.opendatahub.io/name: kf-notebook-controller
    kustomize.component: notebook-controller
    opendatahub.io/component: "true"
  name: notebook-controller-config
  namespace: redhat-ods-applications
  ownerReferences:
  - apiVersion: datasciencecluster.opendatahub.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: DataScienceCluster
    name: default-dsc
    uid: d30ef936-c4cf-456e-8521-8e96e0a1dca5
  resourceVersion: "67258696"
  uid: 1f3155f6-d385-4344-bd2d-42acc0dff6ad

Considerations

Do we want to force the updated configmap on RHOAI upgrade, if yes, then we either require users to delete the old configmap, or we need operator code that will ensure overwrite.

My opinion is that we don't want to do this and we'll only set the modified configmap for fresh installs.

Operator restart ensures overwrite, all right. It's just that changing devFlags alone is not enough to affect change in the configmap.

Notes

Saw this repeating error message in notebook-controller logs from code-server workbenches. This kept happening even after I removed the devFlag and restarted the rhods operator (!) to restore the configmap to its original RHOAI 2.16 content.

2024-12-03T07:12:46Z	ERROR	controllers.Culler	Error talking to http://vsc-old.jdanek2.svc.cluster.local/notebook/jdanek2/vsc-old/api/terminals	{"error": "Get \"http://vsc-old.jdanek2.svc.cluster.local:8888/codeserver/healthz/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kubeflow/kubeflow/components/notebook-controller/controllers.getNotebookResourceResponse
	/workspace/notebook-controller/controllers/culling_controller.go:226
github.com/kubeflow/kubeflow/components/notebook-controller/controllers.getNotebookApiTerminals
	/workspace/notebook-controller/controllers/culling_controller.go:264
github.com/kubeflow/kubeflow/components/notebook-controller/controllers.updateNotebookLastActivityAnnotation
	/workspace/notebook-controller/controllers/culling_controller.go:325
github.com/kubeflow/kubeflow/components/notebook-controller/controllers.(*CullingReconciler).Reconcile
	/workspace/notebook-controller/controllers/culling_controller.go:147
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/opt/app-root/src/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

Setting notebooks.kubeflow.org/last-activity: '2024-12-03T07:35:25Z' for Jupyterlab notebooks works just fine.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@atheo89
Copy link
Member

atheo89 commented Dec 5, 2024

/lgtm

@jiridanek
Copy link
Member Author

/approve

Copy link

openshift-ci bot commented Dec 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiridanek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 5, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 8dc9aed into opendatahub-io:main Dec 5, 2024
14 checks passed
@jiridanek jiridanek deleted the jd_duplicate_parames branch December 5, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants