Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google_compute_instance: Tries to change boot disk infos every time #988

Closed
waldiTM opened this issue Jan 20, 2018 · 28 comments
Closed

google_compute_instance: Tries to change boot disk infos every time #988

waldiTM opened this issue Jan 20, 2018 · 28 comments

Comments

@waldiTM
Copy link

waldiTM commented Jan 20, 2018

Terraform Version

Terraform v0.11.2
+ provider.google v1.5.0

This is a regression from

Terraform v0.11.2
+ provider.google v1.4.0

Affected Resource(s)

  • google_compute_instance

Terraform Configuration Files

resource "google_compute_instance" "default" {
  name = "${var.prefix}"
  zone = "${var.zone}"
  machine_type = "${var.instance_type}"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-9"
      type = "pd-ssd"
    }
  }

  network_interface {
    # TODO: workaround for https://github.com/terraform-providers/terraform-provider-google/issues/926
    subnetwork = "regions/${substr(var.zone, 0, length(var.zone) - 2)}/subnetworks/mirror"
  }

  lifecycle {
    ignore_changes = [
      "network_interface",
    ]
  }
}

Debug Output

Expected Behavior

On subsequent apply calls no changes should be done.

Actual Behavior

All parameters of the boot disk are changed in a no-op operation:

  ~ google_compute_instance.default
      boot_disk.0.device_name:               "persistent-disk-0" => <computed>
      boot_disk.0.initialize_params.0.image: "https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-9-stretch-v20170816" => "debian-cloud/debian-9"
      boot_disk.0.initialize_params.0.size:  "10" => <computed>

If I ignore changes for boot_disk.0.initialize_params.0.image, it still tries to do this:

  ~ google_compute_instance.default
      boot_disk.0.device_name:              "persistent-disk-0" => <computed>
      boot_disk.0.initialize_params.0.size: "10" => <computed>

Steps to Reproduce

  1. terraform apply
@danawillow
Copy link
Contributor

Hey @waldiTM, I'm having trouble reproducing this. I tried running that config at v1.5.0 and also by running it at v1.4.0, upgrading to v1.5.0, and then running terraform plan again (in case it had something to do with the upgrade specifically). Can you comment with the full terraform plan output? This might actually be a red herring for something else.

@waldiTM
Copy link
Author

waldiTM commented Jan 23, 2018

Your are right. I did not even see that only some instances show this behaviour. You can find the plan at https://gist.github.com/waldiTM/ff91d446f9b5a11bb2a61d73b3ad21ce and the code at https://gitlab.com/waldi/debian-mirror-cloud-terraform.

@pdecat
Copy link
Contributor

pdecat commented Jan 23, 2018

FWIW, I am affected too with 1.5.0, also coming from 1.4.0, and also for some instances but not others in the same terraform configuration.

Also, the subnetwork part is affected by the same issue.

# terraform version
Terraform v0.11.2
+ provider.google v1.5.0
# terraform plan
[...]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ google_compute_instance.nat
      boot_disk.0.device_name:                             "persistent-disk-0" => <computed>
      boot_disk.0.initialize_params.0.image:               "https://www.googleapis.com/compute/v1/projects/myproject/global/images/myproject-debian9-1507827129" => "myproject-debian9-1507827129"
      boot_disk.0.initialize_params.0.size:                "30" => <computed>
      network_interface.0.access_config.0.assigned_nat_ip: "104.xxx.xxx.xxx" => <computed>
      network_interface.0.address:                         "172.16.16.2" => <computed>
      network_interface.0.name:                            "nic0" => <computed>
      network_interface.0.network_ip:                      "172.16.16.2" => <computed>
      network_interface.0.subnetwork:                      "https://www.googleapis.com/compute/v1/projects/myproject/regions/europe-west1/subnetworks/myproject-europe-west1-subnet1" => "myproject-europe-west1-subnet1"
      network_interface.0.subnetwork_project:              "myproject" => <computed>

  ~ google_compute_instance.nat
      boot_disk.0.device_name:                             "persistent-disk-0" => <computed>
      boot_disk.0.initialize_params.0.image:               "https://www.googleapis.com/compute/v1/projects/myproject/global/images/myproject-debian9-1507827129" => "myproject-debian9-1507827129"
      boot_disk.0.initialize_params.0.size:                "30" => <computed>
      network_interface.0.access_config.0.assigned_nat_ip: "104.xxx.xxx.xxx" => <computed>
      network_interface.0.address:                         "172.16.16.3" => <computed>
      network_interface.0.name:                            "nic0" => <computed>
      network_interface.0.network_ip:                      "172.16.16.3" => <computed>
      network_interface.0.subnetwork:                      "https://www.googleapis.com/compute/v1/projects/myproject/regions/europe-west1/subnetworks/myproject-europe-west1-subnet1" => "myproject-europe-west1-subnet1"
      network_interface.0.subnetwork_project:              "myproject" => <computed>


Plan: 0 to add, 2 to change, 0 to destroy.

Current workaround

Use self_link instead of name for image and subnetwork for these instances.

@rosbo
Copy link
Contributor

rosbo commented Jan 23, 2018

@danawillow Here is more to help you pin point the root cause:

Probably caused by #948. However, the diff on the image and subnetwork should in theory be suppressed by their respective DiffSuppressFunc.

For instance, the diff @waldiTM is showing should be suppressed by the DiffSuppressFunc. See unit test for this particular case: https://github.com/terraform-providers/terraform-provider-google/blob/master/google/resource_compute_disk_test.go#L79

@iniinikoski
Copy link

iniinikoski commented Jan 29, 2018

Hmm, same issue here I guess with 1.5.0 vs. 1.4.0... But it says the "id" of the resource would be causing the destroy+create... Gist available here: https://gist.github.com/iniinikoski/0cbd04cee23a470d357131a1e09dca21

Locked to 1.4.0 for now...

@danawillow
Copy link
Contributor

Thanks @iniinikoski! @rosbo, looks like this breaks our assumption that the family name will always be part of the image name:

boot_disk.0.initialize_params.0.image:               "https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/ubuntu-1404-trusty-v20171208" => "ubuntu-os-cloud/ubuntu-1404-lts" �[31m(forces new resource)�[0m

That's different from the other examples above where the assumption did hold true, so I'm still curious why the DiffSuppress didn't work on those, but we may need to go with a different solution altogether.

@rosbo
Copy link
Contributor

rosbo commented Jan 29, 2018

@danawillow We need to fix our DiffSuppressFunc to handle these cases too.

I am still puzzled by the issue experienced by @pdecat and @waldiTM... It triggers an in-place update but the subnetwork and image field diff should be suppressed and these fields are ForceNew which means they would trigger a forces new resources update, not an in-place update... Another field must be triggering this update.

@iniinikoski, a workaround for the issue you are experiencing is to use the google_compute_image datasource like this:

data "google_compute_image" "ubuntu" {
  family = "ubuntu-1404-lts"
  project = "ubuntu-os-cloud"
}

resource "google_compute_instance" "my-instance" {
  // ...
  boot_disk {
    initialize_params {
      image = "${data.google_compute_image.ubuntu.self_link}"
    }
  }
  // ...
}

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

1.6.0 does not fix the problem.

As you mentioned that this plan should force a new resource: Is there a way to see in the debug output which resource actually triggers refreshs? I did not manage to find it.

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

I updated to terraform 0.11.3 and the diffs looks larger:

      boot_disk.#:                            "1" => "1"
      boot_disk.0.auto_delete:                "true" => "true"
      boot_disk.0.device_name:                "persistent-disk-0" => <computed>
      boot_disk.0.disk_encryption_key_sha256: "" => <computed>
      boot_disk.0.initialize_params.#:        "1" => "1"
      boot_disk.0.initialize_params.0.image:  "https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-9-stretch-v20171025" => "debian-cloud/debian-9"
      boot_disk.0.initialize_params.0.size:   "10" => <computed>
      boot_disk.0.initialize_params.0.type:   "pd-ssd" => "pd-ssd"

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

But now the whole diff vanishes if I ignore boot_disk.0.initialize_params.0.image.

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

And after some instrumentation I found out that diskImageDiffSuppress works correctly:

[DEBUG] diskImageDiffSuppress: Call result=true old=https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-9-stretch-v20180105 new=debian-cloud/debian-9

However I see a lot of calls without any old image, which according to the comment should not happen.

[DEBUG] diskImageDiffSuppress: Call result=false old= new=debian-cloud/debian-9

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

If I make this error condition return true, it does not happen any more. So Terraform sometimes does not provide the old image, even if the shown diff lists it.

@waldiTM
Copy link
Author

waldiTM commented Feb 10, 2018

The problem in question is triggered by a completely different part of the config. I dropped a workaround for #926 and this seems make this problem go away:

--- a/gce/archive/backend/instances.tf
+++ b/gce/archive/backend/instances.tf
@@ -29,8 +29,7 @@ resource "google_compute_instance" "default" {
   }
 
   network_interface {
-    # TODO: workaround for https://github.com/terraform-providers/terraform-provider-google/issues/926
-    subnetwork = "regions/${substr(var.zone, 0, length(var.zone) - 2)}/subnetworks/mirror"
+    subnetwork = "mirror"
   }
 
   lifecycle {

@danawillow
Copy link
Contributor

Just checking back in here to see how things are going. Has anyone run into this issue recently?

@pdecat
Copy link
Contributor

pdecat commented Mar 22, 2018

Hi @danawillow, I just tried to revert the changes I made to workaround the issue two months ago ( #988 (comment) ), i.e. using self_link instead of name for image and subnetwork for these instances and the issue is still there.

When using name, after every terraform apply execution, the plan will still want to update the compute instance boot disk (and network interface):

An execution plan has been generated and is shown below.                                                                                                                                                       
Resource actions are indicated with the following symbols:                                                                                                                                                     
  ~ update in-place                                                                                                                                                                                            
                                                                                                                                                                                                               
Terraform will perform the following actions:                                                                                                                                                                  
                                                                                                                                                                                                               
  ~ module.transport.module.nat0.google_compute_instance.nat                                                                                                                                                   
      boot_disk.#:                                         "1" => "1"                                  
      boot_disk.0.auto_delete:                             "true" => "true"                            
      boot_disk.0.device_name:                             "persistent-disk-0" => <computed>           
      boot_disk.0.disk_encryption_key_sha256:              "" => <computed>                            
      boot_disk.0.initialize_params.#:                     "1" => "1"                                  
      boot_disk.0.initialize_params.0.image:               "https://www.googleapis.com/compute/v1/projects/my-preprod/global/images/my-debian9-1507827129" => "my-debian9-1507827129"             
      boot_disk.0.initialize_params.0.size:                "30" => <computed>                          
      boot_disk.0.initialize_params.0.type:                "pd-standard" => "pd-standard"              
      network_interface.#:                                 "1" => "1"                                  
      network_interface.0.access_config.#:                 "1" => "1"                                  
      network_interface.0.access_config.0.assigned_nat_ip: "104.199.xxx.xxx" => <computed>               
      network_interface.0.access_config.0.nat_ip:          "104.199.xxx.xxx" => "104.199.xxx.xxx"          
      network_interface.0.address:                         "172.16.16.2" => <computed>                 
      network_interface.0.name:                            "nic0" => <computed>                        
      network_interface.0.network_ip:                      "172.16.16.2" => <computed>                 
      network_interface.0.subnetwork:                      "https://www.googleapis.com/compute/v1/projects/my-preprod/regions/europe-west1/subnetworks/my-preprod-europe-west1-subnet1" => "my-preprod-europe-west1-subnet1"                 
      network_interface.0.subnetwork_project:              "my-preprod" => <computed>           

  ~ module.transport.module.nat1.google_compute_instance.nat                                           
      boot_disk.#:                                         "1" => "1"                                  
      boot_disk.0.auto_delete:                             "true" => "true"                            
      boot_disk.0.device_name:                             "persistent-disk-0" => <computed>           
      boot_disk.0.disk_encryption_key_sha256:              "" => <computed>                            
      boot_disk.0.initialize_params.#:                     "1" => "1"                                  
      boot_disk.0.initialize_params.0.image:               "https://www.googleapis.com/compute/v1/projects/my-preprod/global/images/my-debian9-1507827129" => "my-debian9-1507827129"             
      boot_disk.0.initialize_params.0.size:                "30" => <computed>                          
      boot_disk.0.initialize_params.0.type:                "pd-standard" => "pd-standard"              
      network_interface.#:                                 "1" => "1"                                  
      network_interface.0.access_config.#:                 "1" => "1"                                  
      network_interface.0.access_config.0.assigned_nat_ip: "104.199.xxx.xxx" => <computed>               
      network_interface.0.access_config.0.nat_ip:          "104.199.xxx.xxx" => "104.199.60.xxx"          
      network_interface.0.address:                         "172.16.16.3" => <computed>                 
      network_interface.0.name:                            "nic0" => <computed>                        
      network_interface.0.network_ip:                      "172.16.16.3" => <computed>                 
      network_interface.0.subnetwork:                      "https://www.googleapis.com/compute/v1/projects/my-preprod/regions/europe-west1/subnetworks/my-preprod-europe-west1-subnet1" => "my-preprod-europe-west1-subnet1"                 
      network_interface.0.subnetwork_project:              "my-preprod" => <computed>           


Plan: 0 to add, 2 to change, 0 to destroy.         
# terraform version
Terraform v0.11.4
+ provider.google v1.8.0

@danawillow
Copy link
Contributor

@pdecat thanks! Do you have a configuration you'd be willing to share that can reproduce this? Ideally it would be something fairly small to make it easy to pinpoint what's going on. I'm pretty sure that this is an issue with tf core (since the issue is that it's showing a diff in the first place on something that should have been suppressed), but I'd rather not file the report there until I can say with certainty what's happening, and I still haven't been able to come up with a repro :-/

@paddycarver paddycarver removed their assignment Mar 22, 2018
@pdecat
Copy link
Contributor

pdecat commented Mar 23, 2018

@danawillow I tried to put together a reproduction test case but failed to reproduce the issue outside of my actual complete stack so far.

I quickly compared the trace level outputs of terraform plan executions of those two setups to try identifying the cause but did not find anything suspect there yet.
Did the same with dumps of the terraform states, no luck.

I think I have to resort to add some additional logging statements to that part of the code to understand what's going on...

PS: I did not find an easy way to attach a debugger to a terraform spawn plugin process, that would have made it so much easier!

Edit: turned out attaching delve to the terraform spawned plugin sub-process works fine, it's just a matter of timing. However, my last debugging session did not show any misbehaving from the google plugin diff suppress funcs...

@pdecat
Copy link
Contributor

pdecat commented May 25, 2018

Similar issue upgrading from 1.12.0 to 1.13.0, terraform apply now wants to perform this change every time:

  ~ module.transport.module.nat0.google_compute_instance.nat
      network_interface.#:                                 "1" => "1"
      network_interface.0.access_config.#:                 "1" => "1"
      network_interface.0.access_config.0.assigned_nat_ip: "104.199.xx.xx" => <computed>
      network_interface.0.access_config.0.nat_ip:          "104.199.xx.xx" => "104.199.xx.xx"
      network_interface.0.address:                         "172.16.16.2" => <computed>
      network_interface.0.name:                            "nic0" => <computed>
      network_interface.0.network_ip:                      "172.16.16.2" => <computed>
      network_interface.0.subnetwork:                      "https://www.googleapis.com/compute/beta/projects/myproject/regions/europe-west1/subnetworks/myproject-europe-west1-subnet1" => "https://www.googleapis.com/compute/v1/projects/myproject/regions/europe-west1/subnetworks/myproject-europe-west1-subnet1"
      network_interface.0.subnetwork_project:              "myproject" => <computed>

Note the beta => v1 change in the subnetwork link.

Edit: passing the subnetwork name instead of its self_link does not work around the issue.

@danawillow
Copy link
Contributor

Hmm, I'm not sure why our diffsuppress isn't catching that. Mind posting debug logs?

@pdecat
Copy link
Contributor

pdecat commented May 25, 2018

Sure, let me redact some parts of it.

Edit: sent you an email with the logs.

@danawillow
Copy link
Contributor

Thanks- I took a look and didn't see anything weird. I'm sorry that we haven't gotten to the bottom of this yet. It's tough since it seems to only affect a few people, even though I know that a lot of people are using the google_compute_instance resource. In the meantime, you might be able to get around it by manually editing your state file (I know this is a terrible workaround, but I'm pretty much out of ideas)

@pdecat
Copy link
Contributor

pdecat commented May 30, 2018

Last time I tried, editing the state by hand did not help.

A similar issue was just opened where some apparently unrelated field triggers a diff on network information where used: #1566

@pdecat
Copy link
Contributor

pdecat commented Jun 7, 2018

Quick update : updating the state is confirmed not to work.

@danawillow
Copy link
Contributor

Bug filed against TF: hashicorp/terraform#18209

@pdecat
Copy link
Contributor

pdecat commented Jun 29, 2018

This issue is resolved for me with v1.15.0.

Edit: what actually resolved the issue and let me revert the workarounds put in place in January is that I removed the ignore_changes life cycle rule from the affected resource a few days before:

resource "google_compute_instance" "nat" {
  name        = "mynat"

  lifecycle {
    ignore_changes = [
      "metadata_startup_script",
    ]
  }

...
}

PS: I realize I never shared my configuration source, sorry about that.

@danawillow
Copy link
Contributor

Closing since upstream bug is fixed in HEAD.

@OGProgrammer
Copy link

For now, getting around this with using the google_compute_image data source but in my terraform module for the instance, I'm ignoring any of the init params on the boot disks to avoid destroying them when a new image comes out.

resource "google_compute_instance" "main" {

// normal stuff

lifecycle {
    ignore_changes = [
      "metadata_startup_script",
      "boot_disk.0.initialize_params"
    ]
  }
}

@ghost
Copy link

ghost commented Jan 12, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Jan 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants