
fix: send provider enum values for insecureKubeletReadonlyPortEnabled #2145

Merged

Conversation

wyardley
Contributor

@wyardley wyardley commented Oct 17, 2024

Add insecureKubeletReadonlyPortEnabled to node_config.kubelet_config for the default node-pool and for additional pools. It may also be necessary to define the top level node_config more broadly for the case where remove_default_node_pool is set to false, which should probably be handled separately.

Also, the upstream provider (intentionally) uses an enum of "TRUE" / "FALSE" vs. a boolean. Update the code to follow this, and add a test case (also suggested by @apeabody) that covers the cluster level setting vs node pool one.

Fixes #2013
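
A minimal sketch of the enum handling described above (the variable and local names here are illustrative, not the module's actual generated code):

```hcl
variable "insecure_kubelet_readonly_port_enabled" {
  description = "Boolean-style input; converted to the provider enum below."
  type        = bool
  default     = null
}

locals {
  # The provider expects the string enum "TRUE" / "FALSE" rather than a
  # boolean, so convert the input and keep null when it is unset so the
  # API default applies.
  insecure_kubelet_readonly_port_enabled = (
    var.insecure_kubelet_readonly_port_enabled != null
    ? upper(tostring(var.insecure_kubelet_readonly_port_enabled)) # true -> "TRUE", false -> "FALSE"
    : null
  )
}
```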

@apeabody
Collaborator

/gcbrun

@apeabody apeabody self-assigned this Oct 17, 2024
@apeabody
Collaborator

Hmm, looks like we are getting lower-case in module.example.module.gke.google_container_node_pool.pools["pool-03"]:

Step #70 - "converge node-pool-local":           + kubelet_config {
Step #70 - "converge node-pool-local":               + cpu_cfs_quota                          = true
Step #70 - "converge node-pool-local":               + cpu_manager_policy                     = "static"
Step #70 - "converge node-pool-local":               + insecure_kubelet_readonly_port_enabled = "false"
Step #70 - "converge node-pool-local":               + pod_pids_limit                         = 4096
Step #70 - "converge node-pool-local":             }
Step #70 - "converge node-pool-local":        Error: expected node_config.0.kubelet_config.0.insecure_kubelet_readonly_port_enabled to be one of ["FALSE" "TRUE"], got false
Step #70 - "converge node-pool-local":        
Step #70 - "converge node-pool-local":          with module.example.module.gke.google_container_node_pool.pools["pool-03"],
Step #70 - "converge node-pool-local":          on ../../../modules/beta-public-cluster/cluster.tf line 656, in resource "google_container_node_pool" "pools":
Step #70 - "converge node-pool-local":         656:   node_config {

@wyardley
Contributor Author

Hmm, looks like we are getting lower-case in

I'll take a look. I thought I checked for any spots I'd missed, but apparently not.

@apeabody
Collaborator

Thanks @wyardley!

Probably needs a make build

@wyardley
Contributor Author

wyardley commented Oct 17, 2024

Probably needs a make build

Got to take a look. Doesn't look like it will work exactly as-is -- will make the adjustment that I think it needs and will build and push the results.

edit: See if what's there now looks good to you

@apeabody
Collaborator

/gcbrun

@apeabody
Collaborator

Probably needs a make build

Got to take a look. Doesn't look like it will work exactly as-is -- will make the adjustment that I think it needs and will build and push the results.

edit: See if what's there now looks good to you

Thanks @wyardley - Yes, this looks good, and less complicated than what I suggested.

@apeabody
Collaborator

I was expecting we might need to update the test asset; however, this is unexpected.

The node_pool_defaults plan now looks good:

TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:       + node_pool_defaults {
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:           + node_config_defaults {
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:               + insecure_kubelet_readonly_port_enabled = "FALSE"
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:               + logging_variant                        = (known after apply)
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185: 
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:               + gcfs_config (known after apply)
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:             }
TestPrivateZonalWithNetworking 2024-10-17T20:54:28Z command.go:185:         }

However the default-node-pool is showing true?

    private_zonal_with_networking_test.go:81: 
        	Error Trace:	/workspace/test/integration/private_zonal_with_networking/private_zonal_with_networking_test.go:81
        	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:638
        	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:670
        	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/utils/stages.go:31
        	            				/builder/home/go/pkg/mod/github.com/!google!cloud!platform/cloud-foundation-toolkit/infra/[email protected]/pkg/tft/terraform.go:670
        	Error:      	Not equal: 
        	            	expected: map[string]interface {}{"diskSizeGb":100, "diskType":"pd-standard", "effectiveCgroupMode":"EFFECTIVE_CGROUP_MODE_V2", "imageType":"COS_CONTAINERD", "labels":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "node_pool":"default-node-pool"}, "loggingConfig":map[string]interface {}{"variantConfig":map[string]interface {}{"variant":"DEFAULT"}}, "machineType":"e2-medium", "metadata":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "disable-legacy-endpoints":"true", "node_pool":"default-node-pool"}, "oauthScopes":[]interface {}{"https://www.googleapis.com/auth/cloud-platform"}, "serviceAccount":"SERVICE_ACCOUNT", "shieldedInstanceConfig":map[string]interface {}{"enableIntegrityMonitoring":true}, "tags":[]interface {}{"gke-CLUSTER_NAME", "gke-CLUSTER_NAME-default-node-pool"}, "windowsNodeConfig":map[string]interface {}{}, "workloadMetadataConfig":map[string]interface {}{"mode":"GKE_METADATA"}}
        	            	actual  : map[string]interface {}{"diskSizeGb":100, "diskType":"pd-standard", "effectiveCgroupMode":"EFFECTIVE_CGROUP_MODE_V2", "imageType":"COS_CONTAINERD", "kubeletConfig":map[string]interface {}{"insecureKubeletReadonlyPortEnabled":true}, "labels":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "node_pool":"default-node-pool"}, "loggingConfig":map[string]interface {}{"variantConfig":map[string]interface {}{"variant":"DEFAULT"}}, "machineType":"e2-medium", "metadata":map[string]interface {}{"cluster_name":"CLUSTER_NAME", "disable-legacy-endpoints":"true", "node_pool":"default-node-pool"}, "oauthScopes":[]interface {}{"https://www.googleapis.com/auth/cloud-platform"}, "serviceAccount":"SERVICE_ACCOUNT", "shieldedInstanceConfig":map[string]interface {}{"enableIntegrityMonitoring":true}, "tags":[]interface {}{"gke-CLUSTER_NAME", "gke-CLUSTER_NAME-default-node-pool"}, "windowsNodeConfig":map[string]interface {}{}, "workloadMetadataConfig":map[string]interface {}{"mode":"GKE_METADATA"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-(map[string]interface {}) (len=14) {
        	            	+(map[string]interface {}) (len=15) {
        	            	  (string) (len=10) "diskSizeGb": (float64) 100,
        	            	@@ -5,2 +5,5 @@
        	            	  (string) (len=9) "imageType": (string) (len=14) "COS_CONTAINERD",
        	            	+ (string) (len=13) "kubeletConfig": (map[string]interface {}) (len=1) {
        	            	+  (string) (len=34) "insecureKubeletReadonlyPortEnabled": (bool) true
        	            	+ },
        	            	  (string) (len=6) "labels": (map[string]interface {}) (len=2) {
        	Test:       	TestPrivateZonalWithNetworking

@wyardley
Contributor Author

wyardley commented Oct 17, 2024

It’s because at API level, it’s a bool, and this is the API response vs the terraform state

so I think this is expected, if unintuitive. I can update that value shortly.
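
For illustration (example values only, not the module's code), here is the same field as the Terraform plan/state records it versus how the GKE API reports it, which is what the integration test compares against:

```hcl
locals {
  # What the Terraform plan/state records for the field: the provider enum.
  plan_value = "FALSE"

  # Shape of the same field in the GKE API response (shown here as a native
  # value rather than raw JSON, for brevity).
  api_response = {
    kubeletConfig = {
      insecureKubeletReadonlyPortEnabled = true
    }
  }

  # Comparing the two requires normalizing one side; here the mismatch mirrors
  # the failing assertion above (cluster default "FALSE", API reporting true).
  values_match = upper(tostring(local.api_response.kubeletConfig.insecureKubeletReadonlyPortEnabled)) == local.plan_value
}
```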

@apeabody
Collaborator

It’s because at API level, it’s a bool, and this is the API response vs the terraform state

so I think this is expected, if unintuitive. I can update that value shortly.

Hey @wyardley! I'm not thinking of the string vs. bool types; rather, we set the cluster default to false, yet the default node pool is showing true? Perhaps there is something special about the default node pool.

@wyardley
Contributor Author

wyardley commented Oct 17, 2024

Let me look. If we only set node_pool_defaults we’ll probably need to set it in node_config also for the default pool I think

@apeabody
Collaborator

Let me look. If we only set node_pool_defaults we’ll probably need to set it in node_config also for the default pool I think

Yeah, that is my guess as well.
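
As a rough sketch of the two places involved on the cluster resource (hard-coded values for brevity; this is an assumption about the shape, not the module's generated config):

```hcl
resource "google_container_cluster" "example" {
  name               = "example"
  location           = "us-central1"
  initial_node_count = 1

  # Cluster-wide default, picked up by node pools that don't set their own value.
  node_pool_defaults {
    node_config_defaults {
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }

  # The default node pool takes its settings from node_config, so the value
  # would likely need to be repeated here when that pool is kept.
  node_config {
    kubelet_config {
      # Provider 5.x also requires cpu_manager_policy whenever kubelet_config
      # is present, as comes up later in this thread.
      cpu_manager_policy                     = "static"
      insecure_kubelet_readonly_port_enabled = "FALSE"
    }
  }
}
```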

@wyardley
Contributor Author

So a couple questions:

  1. For the default pool, should we allow setting this explicitly in var.node_pools[0] as well as taking the default cluster setting if not?
  2. In the case where remove_default_node_pool is set to false, do we currently have google_container_cluster.node_config.kubelet_config defined in the config at all? For sure we'll need to add it here, I think, but if we need to support the case where we have remove_default_node_pool set to false, we may also need to be able to define it in the top level node_config? I may be missing something, but couldn't find that the module is using anything in that section now?

Also, should I try to squeeze this in the same PR?

@apeabody
Collaborator

Thanks for digging into this @wyardley!

So a couple questions:

  1. For the default pool, should we allow setting this explicitly in var.node_pools[0] as well as taking the default cluster setting if not?

I think that would be ideal for the default node pool: do an explicit lookup; failing that, use the cluster default; failing that, set null (see the sketch at the end of this comment).

  1. In the case where remove_default_node_pool is set to false, do we currently have google_container_cluster.node_config.kubelet_config defined in the config at all? For sure we'll need to add it here, I think, but if we need to support the case where we have remove_default_node_pool set to false, we may also need to be able to define it in the top level node_config? I may be missing something, but couldn't find that the module is using anything in that section now?

Yeah, the best-practice recommendation is to remove the default node pool, but it's not required. Because it's typically removed, I suspect it was an accidental oversight that these weren't added, so they should be updated to match.

Also, should I try to squeeze this in the same PR?

I'm comfortable if everything is in a single PR; however, I can certainly open a separate PR to add kubelet_config to the default node pool to keep the PRs more focused. Let me know your preference.
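
A minimal sketch of that lookup order, assuming hypothetical variable names rather than the module's actual ones:

```hcl
variable "insecure_kubelet_readonly_port_enabled" {
  description = "Hypothetical cluster-wide default."
  type        = bool
  default     = null
}

variable "node_pools" {
  description = "Hypothetical minimal shape of the per-pool settings."
  type        = list(map(any))
  default     = [{ name = "default-node-pool" }]
}

locals {
  # Explicit per-pool value wins, then the cluster-wide default; if both are
  # unset the result is null, so the provider/API default applies.
  default_pool_insecure_kubelet_readonly_port_enabled = lookup(
    var.node_pools[0],
    "insecure_kubelet_readonly_port_enabled",
    var.insecure_kubelet_readonly_port_enabled
  )
}
```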

@wyardley
Contributor Author

I pushed up an update of what I have so far (only updated one of the tests).

I'm going to do what I probably should have done sooner, and actually run this locally, so I can try a few things a bit faster.

@apeabody
Collaborator

/gcbrun

@wyardley
Contributor Author

Fixing one minor issue with that, and will push again.

however I can certainly open a separate PR to add kubelet_config to the default node pool to keep the PRs more focused

Yes, I think that would be a good idea, especially since it may need to include some additional items, and you probably have a better idea of where exactly it should go.

@wyardley wyardley force-pushed the wyardley/issues/2013_part_2 branch from 51e02c2 to bd221b2 Compare October 17, 2024 22:21
@wyardley wyardley changed the title fix: use correct enum value for insecureKubeletReadonlyPortEnabled fix: add insecureKubeletReadonlyPortEnabled to node_config Oct 17, 2024
@apeabody
Collaborator

/gcbrun

@wyardley
Contributor Author

wyardley commented Oct 17, 2024

The argument "cpu_manager_policy" is required, but no definition was found.

@apeabody I can add a dumb fix for this in the short term, but this can probably be removed when support is dropped for TPG 5.x.

@apeabody
Collaborator

apeabody commented Oct 17, 2024

Thanks @wyardley - I have #2147 to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.

@wyardley
Contributor Author

I have a PR to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.

Yeah, actually, that's probably the needed / correct fix here. Do you want to send that through first and I'll rebase off of it?

@apeabody
Collaborator

apeabody commented Oct 17, 2024

I have a PR to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.

Yeah, actually, that's probably the needed / correct fix here. Do you want to send that through first and I'll rebase off of it?

Yes, that is the plan; hopefully I'll get it merged in by tomorrow.

@wyardley wyardley changed the title fix: add insecureKubeletReadonlyPortEnabled to node_config fix: add insecureKubeletReadonlyPortEnabled to kubelet_config Oct 18, 2024
@apeabody
Collaborator

I have a PR to add the kubelet_config to the default-node-pool. That should resolve the missing cpu_manager_policy.

Yeah, actually, that's probably the needed / correct fix here. Do you want to send that through first and I'll rebase off of it?

Yes, that is the plan; hopefully I'll get it merged in by tomorrow.

Hi @wyardley - #2147 is merged, can you rebase this as needed? Thanks!

Add `insecureKubeletReadonlyPortEnabled` to `node_config.kubelet_config`
for the default node-pool and for additional pools. It may also be
necessary to define the top level `node_config` more broadly for the
case where `remove_default_node_pool` is set to false, which should
probably be handled separately.

Also, the upstream provider (intentionally) uses an enum of `"TRUE"` /
`"FALSE"` vs. a boolean. Update the code to follow this, and add a test
case that covers the cluster level setting vs node pool one.

Fixes terraform-google-modules#2013

Co-authored-by: Andrew Peabody <[email protected]>
@wyardley wyardley force-pushed the wyardley/issues/2013_part_2 branch from bd221b2 to a88d1ae Compare October 21, 2024 21:40
@wyardley wyardley changed the title fix: add insecureKubeletReadonlyPortEnabled to kubelet_config fix: send provider enum values for insecureKubeletReadonlyPortEnabled Oct 21, 2024
@wyardley
Contributor Author

wyardley commented Oct 21, 2024

Hi @wyardley - #2147 is merged, can you rebase this as needed? Thanks!

Yup, updated now.
Let me know if the logic looks correct, and if the adjusted title makes sense (I think you've already added it to node_config in #2147?)

Guessing we might still need to adjust one or more of the tests?

@apeabody
Collaborator

/gcbrun

@apeabody
Collaborator

FYI: Looks like the Windows node pool test has recently started timing out (tests are failing around 59-61 minutes); we may need to adjust that test.

Collaborator

@apeabody apeabody left a comment


Thanks for the contribution @wyardley!

@apeabody apeabody merged commit 922ab1d into terraform-google-modules:master Oct 22, 2024
4 checks passed