Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use KinD clusters in integration tests for presubmits #2831

Merged
merged 5 commits into from
Mar 12, 2024

Conversation

rquitales
Copy link
Member

@rquitales rquitales commented Feb 16, 2024

Proposed changes

This PR switches the underlying test infrastructure to provision KinD clusters for PR checks. This will drastically speed up the feedback loop from the current ~45 mins to 25 mins. Furthermore, this also pushes the cluster creation step into the test job to enable re-running only the failed job should a test flake occur.

Changes made:

  • Updated the GHA workflow to run PR checks in Kind clusters to reduce feedback loop
    • Note: post-submit checks still run against a GKE cluster as some test cases can't don't pass in KinD
  • Updated existing tests to either work with Kind if they are simple, or to be skipped if the test needs a full cloud k8s cluster
  • Broke apart the Golang tests to run concurrently as GH Action workers now do not OOM
  • Fixed some hard-coded test logic that breaks on newer k8s versions

Related issues (optional)

Fixes: #2243

Copy link

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

Copy link

codecov bot commented Feb 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 26.84%. Comparing base (c59ddff) to head (6f8196d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2831   +/-   ##
=======================================
  Coverage   26.84%   26.84%           
=======================================
  Files          53       53           
  Lines        7722     7722           
=======================================
  Hits         2073     2073           
  Misses       5476     5476           
  Partials      173      173           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rquitales rquitales force-pushed the rquitales/kind-clusters branch 2 times, most recently from 2df9f8f to fcfbbb4 Compare March 9, 2024 00:03
@rquitales rquitales requested review from EronWright and a team March 9, 2024 00:03
@rquitales rquitales force-pushed the rquitales/kind-clusters branch 5 times, most recently from 76a98bb to bd0f58d Compare March 12, 2024 01:29
@rquitales rquitales added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Mar 12, 2024
@rquitales rquitales force-pushed the rquitales/kind-clusters branch from 8cd5795 to c9924b4 Compare March 12, 2024 16:58
@EronWright
Copy link
Contributor

Some questions:

  1. Why is it better to have the test code create the cluster instead of a CI step?
  2. How/when will we support the new provider test framework and how many tests are being skipped?

Copy link
Contributor

@EronWright EronWright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Seems like we are exploiting the --short flag, not to "tell long-running tests to shorten their run time" but to filter incompatible tests.
Ideally we'd update the tests to be compatible, e.g. by using a clusterIP.

My main feedback is that I would prefer we keep cluster creation outside of the go test code, e.g. by using a GH action in the step immediately preceding test execution.

tests/clusters/gke.go Outdated Show resolved Hide resolved
tests/sdk/nodejs/nodejs_test.go Show resolved Hide resolved
tests/sdk/nodejs/nodejs_test.go Outdated Show resolved Hide resolved
@rquitales rquitales force-pushed the rquitales/kind-clusters branch from c9924b4 to c80f2f8 Compare March 12, 2024 19:53
@rquitales rquitales force-pushed the rquitales/kind-clusters branch from c80f2f8 to b945ece Compare March 12, 2024 19:55
@rquitales rquitales force-pushed the rquitales/kind-clusters branch from b945ece to 6f8196d Compare March 12, 2024 20:32
@rquitales
Copy link
Member Author

Some questions:

  1. Why is it better to have the test code create the cluster instead of a CI step?
  2. How/when will we support the new provider test framework and how many tests are being skipped?
  1. Having a Go test framework handle cluster creation ensures test portability and allows for a unified entrypoint. This removes the requirement for the developer to pre-configure their test environment to target different test environments. It'd be nice if I could simply just run go test --use-gke or go test --use-kind for example without needing to worry if I had set up these environments manually correctly beforehand.
  2. The new provider test framework was supported in the last commit.

After offline discussions, I've decided to opt for cluster creation in the CI job instead for now and parking this approach. We might revisit this in the future.

@rquitales
Copy link
Member Author

Nit: Seems like we are exploiting the --short flag, not to "tell long-running tests to shorten their run time" but to filter incompatible tests. Ideally we'd update the tests to be compatible, e.g. by using a clusterIP.

My main feedback is that I would prefer we keep cluster creation outside of the go test code, e.g. by using a GH action in the step immediately preceding test execution.

Fwiw, most of the tests that are skipped due to not being passable in KinD in their current state are also long running tests.

@rquitales rquitales changed the title Use KinD clusters in integration tests Use KinD clusters in integration tests for presubmits Mar 12, 2024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: changes in this file will need to be replicated in ci-mgmt. It is included here for faster feedback loop.

Copy link
Contributor

@EronWright EronWright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rquitales rquitales merged commit 101fcfb into master Mar 12, 2024
18 checks passed
@rquitales rquitales deleted the rquitales/kind-clusters branch March 12, 2024 22:17
rquitales added a commit that referenced this pull request Mar 13, 2024
### Proposed changes

This PR switches the underlying test infrastructure to provision KinD
clusters for PR checks. This will drastically speed up the feedback loop
from the current ~45 mins to 25 mins. Furthermore, this also pushes the
cluster creation step into the test job to enable re-running only the
failed job should a test flake occur.

## Changes made:

- Updated the GHA workflow to run PR checks in Kind clusters to reduce
feedback loop
- Note: post-submit checks still run against a GKE cluster as some test
cases can't don't pass in KinD
- Updated existing tests to either work with Kind if they are simple, or
to be skipped if the test needs a full cloud k8s cluster
- Broke apart the Golang tests to run concurrently as GH Action workers
now do not OOM
- Fixed some hard-coded test logic that breaks on newer k8s versions

### Related issues (optional)

Fixes: #2243
(cherry picked from commit 101fcfb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact/no-changelog-required This issue doesn't require a CHANGELOG update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

It's expensive to re-run integration tests
2 participants