[RHOAIENG-15141] Fix failing E2E tests #473

Merged
1 commit merged into opendatahub-io:main from the fixE2eTests branch on Nov 27, 2024

jstourac
Member

Two E2E tests were failing:
--- FAIL: TestE2ENotebookController/update/thoth-minimal-oauth-notebook/Notebook_Statefulset_Validation_After_Update (60.28s)
--- FAIL: TestE2ENotebookController/update/thoth-minimal-oauth-notebook/Verify_Notebook_Traffic_After_Update (10.75s)

The cause: after the culling test, the culling configuration is reverted, but the Notebook CR isn't updated, so the kubeflow-resource-stopped annotation added by the notebook controller stays on the CR and the workbench instance isn't started again. This change deletes the annotation at the end of the culler test, so the workbench is up, running, and ready for the follow-up tests.
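
The annotation cleanup described above could be sketched roughly as follows. This is a standard-library-only illustration, not the code from this PR: the helper name `clearStopAnnotation`, the second sample annotation, and the timestamp value are assumptions made for the example. Only the annotation key `kubeflow-resource-stopped` comes from the description.

```go
package main

import "fmt"

// stopAnnotation is the annotation key named in the PR description.
const stopAnnotation = "kubeflow-resource-stopped"

// clearStopAnnotation returns a copy of annotations without the stop
// annotation and reports whether the annotation had been set.
// Hypothetical helper for illustration only; the real test would update
// the Notebook CR through a Kubernetes client.
func clearStopAnnotation(annotations map[string]string) (map[string]string, bool) {
	cleaned := make(map[string]string, len(annotations))
	for key, value := range annotations {
		if key != stopAnnotation {
			cleaned[key] = value
		}
	}
	_, wasStopped := annotations[stopAnnotation]
	return cleaned, wasStopped
}

func main() {
	// Sample annotations; both values are made up for the example.
	annotations := map[string]string{
		stopAnnotation:      "2024-11-27T14:00:00Z",
		"example.com/owner": "e2e",
	}
	cleaned, wasStopped := clearStopAnnotation(annotations)
	fmt.Println(wasStopped, len(cleaned))
}
```

Returning a copy rather than mutating the input keeps the sketch easy to reason about; an actual test would more likely modify the fetched CR in place and send the update to the cluster.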

A second issue was that the update test switched the workbench to a non-existent image, which ended in an ImagePullBackOff error. This change points the test at an existing image instead.
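
As a rough illustration of the image update, the sketch below sets the first container's image with two basic guards. The `notebook` and `container` types are simplified stand-ins invented for this example (the real test works on the Notebook CR's `Spec.Template.Spec.Containers`), and `setNotebookImage` is a hypothetical helper. It does not check that the image exists in the registry, which is the failure this PR fixes by choosing an existing image.

```go
package main

import (
	"errors"
	"fmt"
)

// container and notebook are simplified stand-ins for the real CR types.
type container struct {
	Name  string
	Image string
}

type notebook struct {
	Containers []container
}

// setNotebookImage points the first container at the given image.
// Hypothetical helper: it only guards against an empty image string and
// a notebook without containers.
func setNotebookImage(nb *notebook, image string) error {
	if image == "" {
		return errors.New("image must not be empty")
	}
	if len(nb.Containers) == 0 {
		return errors.New("notebook has no containers")
	}
	nb.Containers[0].Image = image
	return nil
}

func main() {
	nb := &notebook{Containers: []container{{Name: "notebook", Image: "new-image:latest"}}}
	// Image reference taken from the diff in this PR.
	newImage := "quay.io/opendatahub/workbench-images:jupyter-minimal-ubi9-python-3.11-20241119-3ceb400"
	if err := setNotebookImage(nb, newImage); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(nb.Containers[0].Image)
}
```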


https://issues.redhat.com/browse/RHOAIENG-15141

How Has This Been Tested?

You can try to run locally e.g. by this:

~/workspace/rhosai/odh/kubeflow/components/odh-notebook-controller (branch fixE2eTests)
$ make -e ODH_NOTEBOOK_CONTROLLER_IMAGE=quay.io/opendatahub/odh-notebook-controller:main-f1be2a4 -e KF_NOTEBOOK_CONTROLLER=quay.io/opendatahub/kubeflow-notebook-controller:main-f1be2a4 run-ci-e2e-tests

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@jstourac jstourac self-assigned this Nov 27, 2024
@openshift-ci openshift-ci bot requested review from dibryant and paulovmr November 27, 2024 14:01
@@ -65,7 +65,9 @@ func (tc *testContext) testNotebookUpdate(nbContext notebookContext) error {
 }

 // Example update: Change the Notebook image
-updatedNotebook.Spec.Template.Spec.Containers[0].Image = "new-image:latest"
+// newImage := "new-image:latest" quay.io/thoth-station/s2i-minimal-notebook:v0.2.2
+newImage := "quay.io/opendatahub/workbench-images:jupyter-minimal-ubi9-python-3.11-20241119-3ceb400"
Member Author

Question here: I deliberately used our own workbench repository for this. The original first workbench, however, uses quay.io/thoth-station/s2i-minimal-notebook... so let me know if we want that one instead for some reason 🤷

Member

@jiridanek jiridanek Nov 27, 2024

I put thoth into the GitHub Actions when I was touching these things. I guess it doesn't really matter, so let's go with your choice here, lgtm.

Member

The image must be consistent with the readiness probe and the other probe (presumably liveness) defined on the Notebook CR; I guess that's pretty much it, and anything else would work. It may also matter that the entrypoint runs indefinitely, i.e. that the image does not exit on its own (or at least not quickly).

@jstourac jstourac requested a review from jiridanek November 27, 2024 14:10
@jstourac
Member Author

The linter GHA failure is valid, but we use this method in multiple places there and I just moved its call to a different place, so there is actually no change in this regard. We should think about getting rid of these deprecated calls in the future, though.

@jiridanek
Member

> The linter GHA failure is valid, but we use this method in multiple places there and I just moved its call to a different place, so there is actually no change in this regard. We should think about getting rid of these deprecated calls in the future, though.

I would've fixed it here, but I guess it's just me. I can approve with the linter error present.

@jstourac
Member Author

ci/prow/odh-notebook-controller-e2e — Job succeeded.

🎉

Member

@jiridanek jiridanek left a comment

Just remove the old commented-out code (the few lines about new-image:latest) and I think it's good.

Let's see if anyone else wants to review in the pre-US-holidays rush.

@jstourac
Member Author

> Just remove the old commented-out code (the few lines about new-image:latest) and I think it's good.

done

@jiridanek
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 27, 2024
@jstourac
Member Author

I also rebased this now, so we can see how it works on top of the latest changes from the main branch.

@atheo89
Member

atheo89 commented Nov 27, 2024

/lgtm

@jstourac
Member Author

Thank you for all your reviews, guys. Let's move this in now!

/approve


openshift-ci bot commented Nov 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jstourac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jstourac
Member Author

/override golangci-lint (components/odh-notebook-controller)


openshift-ci bot commented Nov 27, 2024

@jstourac: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • (components/odh-notebook-controller)
  • golangci-lint

Only the following failed contexts/checkruns were expected:

  • ci/prow/images
  • ci/prow/kf-notebook-controller-pr-image-mirror
  • ci/prow/odh-notebook-controller-e2e
  • ci/prow/odh-notebook-controller-pr-image-mirror
  • ci/prow/odh-notebook-controller-unit
  • golangci-lint (components/odh-notebook-controller)
  • pull-ci-opendatahub-io-kubeflow-main-images
  • pull-ci-opendatahub-io-kubeflow-main-kf-notebook-controller-pr-image-mirror
  • pull-ci-opendatahub-io-kubeflow-main-odh-notebook-controller-e2e
  • pull-ci-opendatahub-io-kubeflow-main-odh-notebook-controller-pr-image-mirror
  • pull-ci-opendatahub-io-kubeflow-main-odh-notebook-controller-unit
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override golangci-lint (components/odh-notebook-controller)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jstourac
Member Author

/override "golangci-lint (components/odh-notebook-controller)"
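
The quoting rule the bot points out (a context name containing a space must be wrapped in double quotes) could be captured in a small helper like the one below. This is only an illustrative sketch: `overrideCommand` is a hypothetical function, not part of Prow, and it assumes the context name itself contains no double quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// overrideCommand builds the text of an /override comment for the given
// status context. Hypothetical helper: it wraps the context in double
// quotes when it contains whitespace, as the bot message asks.
func overrideCommand(context string) string {
	if strings.ContainsAny(context, " \t") {
		return `/override "` + context + `"`
	}
	return "/override " + context
}

func main() {
	// Context names taken from the bot's list of failed checks.
	fmt.Println(overrideCommand("golangci-lint (components/odh-notebook-controller)"))
	fmt.Println(overrideCommand("ci/prow/images"))
}
```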


openshift-ci bot commented Nov 27, 2024

@jstourac: Overrode contexts on behalf of jstourac: golangci-lint (components/odh-notebook-controller)

In response to this:

/override "golangci-lint (components/odh-notebook-controller)"


@openshift-merge-bot openshift-merge-bot bot merged commit fc5ffd4 into opendatahub-io:main Nov 27, 2024
12 of 13 checks passed
@jstourac jstourac deleted the fixE2eTests branch November 27, 2024 21:07