
Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth" #143

Closed
nitrocode opened this issue Jan 27, 2022 · 7 comments · Fixed by #150
Labels
bug 🐛 An issue with the system

Comments

@nitrocode
Member

nitrocode commented Jan 27, 2022


Describe the Bug

I noticed that if the EKS cluster's subnets are being changed, in particular from public + private to private only, aws_eks_cluster.default[0].endpoint resolves to localhost and the kubernetes provider fails with:

╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with module.eks_cluster.kubernetes_config_map.aws_auth[0],
│   on .terraform-mdev/modules/eks_cluster/auth.tf line 135, in resource "kubernetes_config_map" "aws_auth":
│  135: resource "kubernetes_config_map" "aws_auth" {

Related code

provider "kubernetes" {
# Without a dummy API server configured, the provider will throw an error and prevent a "plan" from succeeding
# in situations where Terraform does not provide it with the cluster endpoint before triggering an API call.
# Since those situations are limited to ones where we do not care about the failure, such as fetching the
# ConfigMap before the cluster has been created or in preparation for deleting it, and the worst that will
# happen is that the aws-auth ConfigMap will be unnecessarily updated, it is just better to ignore the error
# so we can proceed with the task of creating or destroying the cluster.
#
# If this solution bothers you, you can disable it by setting var.dummy_kubeapi_server = null
host = local.enabled ? coalesce(aws_eks_cluster.default[0].endpoint, var.dummy_kubeapi_server) : var.dummy_kubeapi_server

Workaround 1

To get around this issue, I have to remove the kubernetes_config_map resource from state; the module can then be tricked into redeploying the EKS cluster (due to the change in subnets).

cd components/terraform/eks/eks

terraform state rm 'module.eks_cluster.kubernetes_config_map.aws_auth[0]'

or in atmos

atmos terraform state eks/eks --stack dev-use2-qa rm 'module.eks_cluster.kubernetes_config_map.aws_auth[0]'

If this workaround was applied by mistake, you can re-import the removed ConfigMap:

atmos terraform import eks/eks --stack dev-use2-qa 'module.eks_cluster.kubernetes_config_map.aws_auth[0]' kube-system/aws-auth
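
or, outside of atmos, the equivalent plain Terraform command should be roughly (run from the same component directory):

terraform import 'module.eks_cluster.kubernetes_config_map.aws_auth[0]' kube-system/aws-auth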

Workaround 2

I've also run into this issue when importing an existing cluster into the Terraform module. My workaround for the import is to run terraform init and then modify the downloaded eks_cluster module's auth.tf, setting the host argument of the kubernetes provider to the dummy URL.

vim .terraform/modules/eks_cluster/auth.tf
--- a/auth.tf
+++ b/auth.tf
@@ -94,7 +94,7 @@ provider "kubernetes" {
   # so we can proceed with the task of creating or destroying the cluster.
   #
   # If this solution bothers you, you can disable it by setting var.dummy_kubeapi_server = null
-  host                   = local.enabled ? coalesce(aws_eks_cluster.default[0].endpoint, var.dummy_kubeapi_server) : var.dummy_kubeapi_server
+  host                   = var.dummy_kubeapi_server

Proposal

Instead, it would be nice if the module could either detect that the endpoint resolves to localhost and substitute a value that won't break the aws-auth ConfigMap operations, or disable the kubernetes provider entirely when the endpoint is localhost.
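
A rough sketch of the first idea (this is not the module's current code; the local names below are illustrative): fall back to the dummy endpoint whenever the reported endpoint is empty or points at localhost.

locals {
  eks_endpoint = local.enabled ? try(aws_eks_cluster.default[0].endpoint, "") : ""

  # Use the dummy endpoint when the cluster endpoint is missing or resolves to localhost
  kubernetes_host = (
    local.eks_endpoint == "" || length(regexall("localhost|127\\.0\\.0\\.1", local.eks_endpoint)) > 0
  ) ? var.dummy_kubeapi_server : local.eks_endpoint
}

provider "kubernetes" {
  host = local.kubernetes_host
}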

@nitrocode nitrocode added the bug 🐛 An issue with the system label Jan 27, 2022
@korenyoni korenyoni changed the title The localhost url issue outputs.eks_cluster_endpoint returning localhost in some cases Jan 27, 2022
@nitrocode nitrocode changed the title outputs.eks_cluster_endpoint returning localhost in some cases Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth" Jan 27, 2022
@Nuru
Copy link
Contributor

Nuru commented Feb 22, 2022

@nitrocode Please look into why aws_eks_cluster.default[0].endpoint is returning localhost. That seems like a bug. The cluster endpoint should be a Kubernetes master node, which we should never be running on, so it should never be localhost, right?

@snooyen

snooyen commented Feb 24, 2022

This is causing issues when attempting to delete EKS clusters as well.

$ atmos terraform destroy eks -s tenant-uw2-dev
.
.
Executing command:
/usr/bin/terraform destroy -var-file tenant-uw2-dev-eks.terraform.tfvars.json
.
.
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with module.eks_cluster.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks_cluster/auth.tf line 132, in resource "kubernetes_config_map" "aws_auth":
│  132: resource "kubernetes_config_map" "aws_auth" {
│
╵
Releasing state lock. This may take a few moments...
exit status 1

@michaelkoro

This happens to me as well when running module version 0.45.0, but without any subnet changes.
I suspect the ConfigMap is being refreshed before the EKS cluster during plan/apply,
because in our state file aws_eks_cluster.default[0].endpoint shows the real Kubernetes endpoint, not localhost.

@vsimon

vsimon commented Apr 3, 2022

I'm currently on module version 0.44.0 and get something similar after updating to module version 0.45.0 and then running a plan.

terraform plan

│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused
│ 
│   with module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0],
│   on .terraform/modules/eks_cluster/auth.tf line 115, in resource "kubernetes_config_map" "aws_auth_ignore_changes":
│  115: resource "kubernetes_config_map" "aws_auth_ignore_changes" {

@michaelkoro

michaelkoro commented Apr 4, 2022

I managed to mostly bypass the issue.
I removed the ConfigMap from the state and told the module not to create the ConfigMap resource.
I then created an external ConfigMap resource with my own provider configuration (and imported the existing ConfigMap back into the state):

provider "kubernetes" {
  token                  = data.aws_eks_cluster_auth.eks.token
  host                   = module.eks_cluster.eks_cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_cluster.eks_cluster_certificate_authority_data)
}

resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  data = {
    mapRoles = replace(yamlencode(distinct(var.map_additional_iam_roles)), "\"", "")
  }
  depends_on = [module.eks_cluster]
  lifecycle {
    ignore_changes = [data["mapRoles"]]
  }
}

This way I have more control over the provider configuration and version, which I suspect is causing the issue.
I remember that when version 2 of the kubernetes provider first came out, this error occurred quite a lot.
At least for now, this workaround has helped.
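
The provider block above references data.aws_eks_cluster_auth.eks, which isn't shown; assuming it is defined elsewhere in the root module, it would look roughly like this (the eks_cluster_id output name is an assumption about the module's outputs):

data "aws_eks_cluster_auth" "eks" {
  # Short-lived authentication token for the cluster created by the module
  name = module.eks_cluster.eks_cluster_id
}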

@Nuru
Contributor

Nuru commented May 17, 2022

There are various problems caused by the fact that we are calling the API of a resource that is being created or deleted at the same time. The official recommendation from HashiCorp is to break this module up into multiple modules: one to create the EKS cluster, one to create the aws-auth ConfigMap, and one to attach the worker nodes to the cluster. You can effectively do that by doing what @michaelkoro did, minus the import back into this module.

Recommended workarounds

This module provides three different authentication mechanisms to help work around these issues. We generally recommend using kube_exec_auth_enabled when possible. When deleting the cluster, you can use kubeconfig_path and kubeconfig_path_enabled to provide a dummy configuration if needed.
See the version 0.42.0 Release Notes for details.

@nitrocode Providing a KUBECONFIG via kubeconfig_path is documented as being required for importing resources. This is due to a limitation of how Terraform initializes providers when doing imports.
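
For illustration only, a minimal sketch of enabling exec auth on the module (the input names come from the comment above; the source address and other values are assumptions for this example):

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"
  # ... other inputs ...

  # Authenticate the embedded kubernetes provider via an exec plugin instead of a static token
  kube_exec_auth_enabled = true

  # Alternatively, when destroying or importing, point the provider at a local kubeconfig:
  # kubeconfig_path_enabled = true
  # kubeconfig_path         = "/path/to/kubeconfig"  # placeholder
}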

@Nuru Nuru mentioned this issue May 18, 2022
@Nuru
Contributor

Nuru commented May 18, 2022

Duplicate of #104

@Nuru Nuru marked this as a duplicate of #104 May 18, 2022
@Nuru Nuru pinned this issue May 18, 2022
@Nuru Nuru closed this as completed in #150 May 20, 2022