
Restructure tests to take advantage of kitchen-terraform #20

Merged (1 commit) on Oct 31, 2018

Conversation

@Jberlinsky (Contributor)

Restructures tests to take advantage of kitchen-terraform and eliminates the use of bash scripting for test fixture generation.

@Jberlinsky force-pushed the feature/restructure-tests branch from e537811 to 48d0cab (October 23, 2018 15:42)
@Jberlinsky force-pushed the feature/restructure-tests branch from 48d0cab to 6162310 (October 23, 2018 15:43)
@Jberlinsky requested review from morgante and lilithmooncohen and removed the request for morgante (October 23, 2018 15:43)
@morgante requested a review from adrienthebo (October 26, 2018 15:44)
@wyardley (Contributor)

👍
I'm curious also, why does this module use gcloud CLI vs. inspec-gcp to do all the checks? It feels very "Googly" to me 😉

@adrienthebo (Contributor)

@wyardley that's a good question! A lot of the Terraform modules in this organization originally used BATS, so it was straightforward to migrate the existing tests to Test Kitchen and InSpec by shelling out and using the command resource. Changing test frameworks is a lot of work as it stands, so moving to test-kitchen while shelling out to gcloud provides a lot of value by itself. Switching to inspec-gcp would be cleaner and more elegant, but in terms of validating correctness, shelling out to gcloud provides around the same level of verification as inspec-gcp would, with less up-front work.

In some situations and modules we could use inspec-gcp, but in some cases we need test coverage for features that are exposed by gcloud but aren't yet supported by inspec-gcp. For instance, GCP subnet IAM bindings/policies can be fetched by gcloud but inspec-gcp doesn't yet support them, and gcloud can enumerate activated APIs but there's no matching inspec-gcp resource. (I've been meaning to submit PRs for these features but haven't found the time to do so.) In this scenario it would be confusing to use inspec-gcp resources for some tests and shell out to gcloud for others.
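For illustration, the two styles look roughly like this as InSpec controls. This is a minimal sketch: the control names, project ID, cluster name, and zone are placeholders, and the second control uses a resource inspec-gcp does already provide (google_container_cluster) rather than the missing ones described above.

    # Style 1: shell out to gcloud with InSpec's generic `command` resource.
    control "activated-apis-via-gcloud" do
      describe command("gcloud services list --enabled --project example-project --format json") do
        its("exit_status") { should eq 0 }
        # Assert that the Kubernetes Engine API is among the enabled services.
        its("stdout") { should match(/container\.googleapis\.com/) }
      end
    end

    # Style 2: the equivalent native check, where an inspec-gcp resource exists.
    control "cluster-via-inspec-gcp" do
      describe google_container_cluster(project: "example-project", zone: "us-central1-a", name: "example-cluster") do
        it { should exist }
        its("status") { should eq "RUNNING" }
      end
    end

Either style can assert the same facts; the command style just trades resource ergonomics for gcloud's broader coverage.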

In the long term I expect/hope that we'll move to inspec-gcp resources - it's a great project and I've had a lot of luck using it in smaller test environments with less surface area. However in the short run we aren't able to overhaul all of our tests to use test-kitchen, implement all of the inspec-gcp resources that we need, and do so while shipping new features for the modules themselves.

Does that answer your question?

@adrienthebo (Contributor) left a comment

This code looks functionally correct and reads clearly. There are some test details leaking in that I've noted; they're not blockers, but I think we'd benefit from comments clarifying their purpose.

I'll pull down this code in the next day or two and perform a functional review of this PR; if that works out well I'll 👍.

@@ -14,23 +14,61 @@
 * limitations under the License.
 */

-output "name_example" {
+output "project_id" {

Since we're basing our test cases off of the examples, I think we should document why we're using these passthrough outputs. People just looking at these examples might be confused or surprised otherwise.
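For instance, the documentation being requested might look like this (a sketch; the comment wording is invented, and the output mirrors the location output visible further down in this diff):

    # Passthrough output: consumers of this example don't need this value,
    # but kitchen-terraform reads outputs from the example's root module and
    # exposes them to InSpec as attributes, so the module output is
    # re-exported here for the integration tests.
    output "location" {
      description = "Cluster location"
      value       = "${module.gke.location}"
    }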

@@ -14,31 +14,56 @@
 * limitations under the License.
 */

-output "name_example" {
+output "project_id" {

Document passthrough outputs in example code.

Alternately, we could extract these into a secondary outputs file - perhaps outputs-test.tf or some such? Making that change may be out of scope for this PR, but it's an idea to think about in the future.

description = "Cluster location"
value = "${module.gke.location}"
output "ca_certificate" {
sensitive = true

Certificates shouldn't need to be sensitive, only private keys - does this output contain any key material?

sensitive = true
description = "Cluster endpoint"
value = "${module.gke.endpoint}"
output "network" {

Are these passthrough outputs consistent across all examples?

@wyardley (Contributor)

@adrienthebo Absolutely! Super helpful... thanks. I was just confused when I saw the project was using InSpec in the README but then noticed all those shell calls in the actual tests.

I had the same thought, that inspec-gcp might not have all of those resources available yet. I tested it out for something I was working on (using some of the plumbing from here as an example), and it definitely seems really fast.

I do wish there were some better frameworks for doing expectation-based unit tests of Terraform, versus only being able to do integration tests, but so far I haven't found much out there.

@adrienthebo (Contributor)

I've run into #24 while testing this PR. Since this PR didn't introduce the issue we can merge anyway, but we should address it soon.

@adrienthebo (Contributor)

I'm running into this error with the deploy-service-local and simple-zonal-local test instances, @Jberlinsky have you run into this before?

Error:

       * google_container_node_pool.zonal_pools: error creating NodePool: googleapi: Error 400: The required IP ranges for pods can't be fulfilled by the given container secondary range., badRequest

@Jberlinsky (Contributor, Author)

@adrienthebo What network(s)/subnet(s) are you trying to deploy the test cases into? Can you share that part of your terraform.tfvars? What's the output of gcloud beta container subnets list-usable with respect to those networks/subnets?

In general, I think this ties in with a conversation @morgante and I were having about test fixtures. Personally, I think it would be best to spin up a fixture network, with appropriate subnets, before the tests are run, and pass those subnets in dynamically. This would reduce the amount of information that needs to be conveyed to test-runners (e.g. the CIDR size restrictions for GKE clusters, which would be codified in the fixture setup), and reduce the likelihood of issues when running tests.
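A fixture of that shape might look roughly like this (a sketch only; the names, region, and CIDR sizes are illustrative, not values this repo adopted):

    resource "google_compute_network" "test_fixture" {
      name                    = "test-fixture-network"
      project                 = "${var.project_id}"
      auto_create_subnetworks = "false"
    }

    resource "google_compute_subnetwork" "test_fixture" {
      name          = "test-fixture-subnet"
      project       = "${var.project_id}"
      region        = "us-central1"
      network       = "${google_compute_network.test_fixture.self_link}"
      ip_cidr_range = "10.0.0.0/20"

      # GKE allocates a /24 per node from the pods range by default, so the
      # pods range caps the node count: a /20 yields 16 /24s (16 nodes),
      # while this /16 allows 256. This is exactly the CIDR-sizing knowledge
      # the fixture would codify for test runners.
      secondary_ip_range {
        range_name    = "test-fixture-pods"
        ip_cidr_range = "172.16.0.0/16"
      }

      secondary_ip_range {
        range_name    = "test-fixture-services"
        ip_cidr_range = "192.168.0.0/24"
      }
    }

The test cases would then consume the subnet and secondary range names from the fixture instead of requiring each test runner to precreate them.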

@adrienthebo (Contributor)

I've documented my config in #25, and as far as I can tell /20s should have been big enough. Manually reducing the max node count in the node pool from 100 to 10 has unblocked me, but that's not a viable fix. I'm going to keep running through this PR.

@adrienthebo (Contributor)

I've run into a number of issues while verifying this PR, most of which are due to the early state of testing for these modules. For the sake of posterity I'm going to run through them here:

  1. The requirements/fixtures for the modules aren't listed (networks, subnet names, secondary ranges, secondary range sizes).
  2. The Docker images cannot be rebuilt (Alpine Linux dropped git=2.18.10-r0 in favor of git=2.18.1-r0).
  3. An overly small subnet size can cause the node pool resources to fail, but it's hard to deduce that the secondary network size is at fault.
  4. There's no particularly good way (that I've found) to deduce how big the secondary networks need to be.
  5. The min_master_version value is too tightly specified (`1.10.6-gke.2`) and the default value is no longer accepted.
  6. Credentials aren't passed from Terraform to InSpec; when running kitchen verify within Docker, gcloud fails because it isn't authorized.

The noteworthy bit is that out of the above issues, only the last is really related to these changes. This puts us in an unfortunate state where the changes in the PR are reasonable, but the tests take an enormous amount of work and small tweaks to run.

I see two approaches forward:

  1. Block this pull request until everything is fixed. This will cause this pull request to balloon in scope and prevent incremental progress in other areas, but it follows the best practice of only merging PRs that are fully functional.
  2. Merge this pull request to get the improvements it does have, and then work through a number of smaller pull requests to fix the issues outlined above. This has the downside of merging code into what amounts to a broken build, which is pretty uncomfortable, but if we're diligent we can use this as incremental progress and build on it.

My inclination is that we spring for the second option, accept that we have a lot to work on, and focus on getting changes submitted and merged. We've been working on variations of tests for this module for a while but haven't landed much code. I'd like to propose lowering the threshold for merging not-yet-functional test code (changes affecting only tests) so that we can iterate much faster and keep up with the issues that are popping up.

@morgante (Contributor)

@adrienthebo Thanks for the detailed review. I sympathize with the danger of having long-running PRs, but I'm also reluctant to merge tests which are functionally impossible to run.

I suggest that the immediate first step is simply to document the requirements for running tests (even if it's just posting a working Terraform config into an issue and linking to that). We can then do follow-on work to improve the tests.

Next steps:

  1. Make clear README instructions for running the tests (clearly specifying the upfront work you need to do)
  2. Open issues for all other problems found to make sure we circle back and fix them

@Jberlinsky I'm fine with having the cluster auto-create networks as part of the test fixtures, since that's comparatively much less expensive than creating projects.

@Jberlinsky (Contributor, Author)

@adrienthebo Thanks for the detailed review, and thanks for your thoughts, @morgante.

My preference here is to establish precedent for having the tests bootstrap networking fixtures. I'll work on that, in the process making sure that enough of the identified issues are knocked out that the tests can functionally run, and circle back here shortly.

@@ -0,0 +1,8 @@
project_id=""
credentials_path="../../../credentials.json"

This path appears to be relative to test/fixtures/node_pool, but the other examples use ../.. which I think is relative to examples/{example}. Can we fix this up?
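A possible fix, assuming the fixture should resolve the path the same way the examples do (the corrected depth here is inferred from the comment above, not verified against the repo):

    project_id       = ""
    credentials_path = "../../credentials.json"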

@morgante (Contributor)

I accidentally merged this too soon.

@pratikmallya (Contributor)

Will this PR be reopened? @morgante

@morgante (Contributor) commented Nov 6, 2018

@pratikmallya Yes, we need to merge it again soon.

CPL-markus pushed a commit to WALTER-GROUP/terraform-google-kubernetes-engine that referenced this pull request on Jul 15, 2024: …modules/feature/restructure-tests ("Restructure tests to take advantage of kitchen-terraform")