on-destroy provisioners not being executed #13549
Comments
Hi @IOAyman! Thanks for opening this. This is indeed related to both #13097 and #13395, and seems to be another example of the same general problem. However, each of these would be solved in a different part of the codebase, so I'm going to leave all three open and cross-linked, though ultimately we will probably attempt to fix them all in one PR. |
I'm having this issue with a local-exec provisioner when I taint aws_instance resources, but I can't tell from this issue text whether I'm hitting this particular bug. Is the same mechanism used to … |
@Gary-Armstrong I'm almost certain that you've found another variant of the same root cause there. Thanks for mentioning it! Having slept on it a bit, I've changed my mind and am going to fold all of these issues into this one as an umbrella issue for the various places where destroy provisioners aren't working yet, since I strongly suspect that the fix will be a holistic one covering all of these cases together. Note for me or whoever picks this up in future: the root challenge here is that destroy provisioners currently live only in config, yet most of our destroy use-cases don't have access to config for one reason or another. I think the holistic fix is to change how destroy provisioners work so that during apply we include the fully-resolved destroy provisioner configs as part of the instance state, and then when we destroy those instances we use the config stashed in the state, thus avoiding the direct dependency on config during destroy. This should address all of the variants of the problem we've seen so far, and probably ones we've not yet seen:
|
More info: I eliminated some instances by setting count = 0 and the local-exec ran properly. |
I'd like to add some color here: when = "destroy" provisioners are also not being run for aws_instances that are marked as tainted and then destroyed and re-created. This is on v0.9.3. I see that this was #3 in the above comment, sorry for adding more traffic. A word on my mindset as a maintainer here, though: I consider it my responsibility to design idempotency around the create/destroy provisioners. I expect TF to act as a trigger when specific events start, not at some point during their execution. I can design my provisioners to error when I want them to, and to ignore errors I don't consider constructive, thereby avoiding the whole discussion of "when should we run these?" Maybe the destroy provisioners should have an option or two allowing us to customize when/where they fire? Food for thought. My current destruction provisioner (this might be helpful to see my position):
An alternative would be a way to gate the overall instance termination on the success or failure of other designated dependent destructors, prior to outright instance destruction. I wonder whether that could be accomplished by putting the local-exec destructor before the Chef provisioner, but I haven't tested whether that works. Then I could avoid the desync by designing my destructor to stop terraform destroy in a better way. |
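To illustrate the ordering idea above: provisioners on a resource run in the order they are defined, and a destroy-time provisioner that fails will, with the default on_failure = "fail", abort the destroy of that resource, so an early local-exec can act as a gate. A minimal sketch in the 0.9-era syntax used here; the script names are placeholders, not from the comment above:
resource "aws_instance" "app" {
  # ... instance arguments elided ...

  # Runs first at destroy time. With the default on_failure = "fail", a
  # non-zero exit stops the destroy of this instance, so nothing below runs
  # and the instance is not terminated.
  provisioner "local-exec" {
    when    = "destroy"
    command = "./deregister.sh ${self.id}" # hypothetical cleanup script
  }

  # The Chef decommission step (or any second destructor) goes here and only
  # runs if the gate above succeeded.
  provisioner "local-exec" {
    when    = "destroy"
    command = "./chef-decommission.sh ${self.id}" # hypothetical
  }
} |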
In the case of a tainted resource, tracking the individual run results and configurations of provisioners against the resource as part of resource state should help Terraform decide whether a provisioner needs to run its destroy action on the resource, with additional guidance from the resource's configuration. For provisioners that kick off config-management tools, a successful run usually indicates there's something there that needs to be torn down at destroy time. There will probably be a common set of actions that each CM tool uses for decommissioning, as in @in4mer's example where API calls need to be made to remove the instance from a Chef server. (I actually found this thread because I thought that's how Terraform's Chef provisioner already worked!) Remote or local exec are more open-ended, so they might need to have destroy-stage actions explicitly defined in the resource config, defaulting to a no-op. |
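For the Chef decommissioning case mentioned above, the API calls can be expressed as a destroy-time local-exec. A sketch only — it assumes knife is already configured on the machine running Terraform and that Chef nodes are named after the instance's private IP:
resource "aws_instance" "app" {
  # ... instance arguments elided ...

  provisioner "local-exec" {
    when = "destroy"
    # Remove the node and its client from the Chef server before the instance
    # is destroyed.
    command = "knife node delete ${self.private_ip} -y && knife client delete ${self.private_ip} -y"
  }
} |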
Edit: Sorry, it appears to run correctly now. Please ignore this whole comment. I have the same issue with a … |
I'm bumping this because it's still an issue. Honestly, IDGAF that we're waiting for eleventeen planets to align so we can have a celestially perfect tea ceremony to please our ancestors in the afterlife, while we wait six more years for it to happen. This issue needs a fix: gate the destroy provisioner blocks on the resource they're attached to. If the resource was there, run the provisioners and let the admins sort it out. |
I have seen a problem related to this, but I think I know what is happening. I am destroying an instance, and inside it I have a provisioner set with …
I don't think this should be expected behaviour. Instead, the destroy provisioner should be queued before any other changes. |
This issue bugs us with our Terraform-managed DC/OS cluster. Since we can't run destroy provisioners (for killing the dcos-mesos-slave service), the jobs on our destroyed nodes are not moved to other nodes in a timely manner... @bizmate's comment is interesting: there could be an easy fix there for some (maybe most) use cases. |
I see the same issue as @bizmate. In my case I'm trying to run a destroy provisioner on EBS volume attachments, but it seems like we lose the network routes before the provisioner has run. My case is slightly different in that I'm going through a bastion, and the code below is in a module (this could be the edge case).
|
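Terraform destroys resources in reverse dependency order, so one way to keep the network path alive until a destroy-time provisioner has finished is to make the resource carrying the provisioner depend on the networking pieces it needs. A sketch in 0.11-era syntax, under assumed names: the route, volume, bastion, and SSH user below are illustrative, and the referenced volume and instances are assumed to be defined elsewhere in the config.
resource "aws_route" "to_bastion" {
  # ... route arguments elided ...
}

resource "aws_volume_attachment" "data" {
  device_name = "/dev/sdf"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.db.id}"

  # Because this resource depends on the route, Terraform destroys it (and so
  # runs its destroy-time provisioner) before tearing the route down.
  depends_on = ["aws_route.to_bastion"]

  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /dev/xvdf"] # hypothetical teardown step

    connection {
      type         = "ssh"
      host         = "${aws_instance.db.private_ip}"
      user         = "ec2-user" # assumption
      bastion_host = "${aws_instance.bastion.public_ip}"
    }
  }
} |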
I found a workaround with a separate null_resource:
resource "aws_instance" "my_instance" {
  # ...
}

resource "null_resource" "my_instance_provisioning" {
  triggers {
    uuid = "${aws_instance.my_instance.id}"
  }

  provisioner "remote-exec" {
    inline = [
      "bash setup.sh",
    ]
  }

  provisioner "remote-exec" {
    when = "destroy"
    inline = [
      "bash teardown.sh",
    ]
  }
}
The clue is that the … |
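On current Terraform (0.13 and later), destroy-time provisioners may only reference self, so a variant of this workaround stashes whatever the teardown step needs in triggers. A sketch, assuming the instance's private IP is reachable and an ec2-user SSH login (both assumptions, not from the comment above):
resource "null_resource" "my_instance_provisioning" {
  triggers = {
    instance_id = aws_instance.my_instance.id
    host        = aws_instance.my_instance.private_ip
  }

  provisioner "remote-exec" {
    when   = destroy
    inline = ["bash teardown.sh"]

    connection {
      type = "ssh"
      host = self.triggers.host # only self.* may be referenced here
      user = "ec2-user"         # assumption
    }
  }
} |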
I am getting an error: … |
Here is my code:
resource "aws_instance" "ec2_instance" {
  ami                    = "${var.AMI_ID}"
  instance_type          = "${var.ec2_type}"
  key_name               = "${var.ec2_keyname}"
  vpc_security_group_ids = ["${var.ec2_security_group_id}"]
  subnet_id              = "${var.ec2_subnet_id}"
  iam_instance_profile   = "${var.ec2_role_name}"
  # ........
}

resource "null_resource" "my_instance_provisioning" {
  triggers {
    uuid = "${aws_instance.ec2_instance.id}"
  }

  #provisioner "local-exec" {
  #  command = "sleep 30"
  #  when    = "destroy"
  #}

  provisioner "file" {
    source      = "scripts/teardown.sh"
    destination = "/tmp/teardown.sh"

    connection {
      type = "ssh"
      user = "${var.ec2_runuser}"
    }
  }

  provisioner "remote-exec" {
    inline = [
      "sudo chmod +x /tmp/teardown.sh",
      "sudo /tmp/teardown.sh",
    ]
    when = "destroy"

    connection {
      type = "ssh"
      user = "${var.ec2_runuser}"
    }
  }
} |
hi @matikumra69: could you put your code in a code block? Then it's more readable and I can help you with it :) Use the …
Update: @matikumra69 can you provide the whole code for … |
Hi @mavogel, I am getting this error.
This is what I am doing:
resource "null_resource" "my_instance_provisioning" { … |
hi @matikumra69, first: please use a code block (a guide is here) so your code is more readable. Btw, a null_resource like this:
resource "null_resource" "my_instance_provisioning" {
  triggers {
    uuid = "${aws_instance.ec2_instance.private_ip}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo touch /tmp/abc",
    ]

    connection {
      type = "ssh"
      user = "${var.ec2_runuser}"
    }
  }
}
It would also help if you could provide your whole code with all definitions as a gist and link it here. With such small code snippets it's hard to solve your issue. |
@apparentlymart Just touching on your comment regarding the 'tainted' instance flow, would supporting that just be a case of removing this …? As not running the … |
Any update on this one? This is causing us issues, as we'd love to be able to use a destroy provisioner to do some cleanup on AWS instances before they are destroyed. It's currently not working for us when using … |
On Terraform v0.11.7, with create_before_destroy = true and tainting the resource, the destroy-time provisioners (local and remote exec) are not being run on that resource. However, if I run a destroy of the entire infrastructure, the destroy provisioners are run. |
It seems to me that just preventing it from running on tainted resources is too heavy-handed. I understand your reasons @apparentlymart, but perhaps it would be better to leave handling it to the user code. Once I have the provisioner running, I can do all kinds of checks and conditional execution. I'd much rather have to hack something like that than not be able to hook into the destruction cycle at all. So maybe you can at least fix that. If you are worried about backward compatibility, then maybe there should be an option, like …. Also, the current behavior should be clearly documented. It isn't obvious, and there is nothing about it here: https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners |
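As an illustration of leaving the decision to user code, a destroy-time provisioner can guard its own cleanup. The environment variable and script below are hypothetical, and the syntax matches the 0.11-era configs in this thread:
resource "aws_instance" "app" {
  # ... instance arguments elided ...

  provisioner "local-exec" {
    when = "destroy"
    # Skip decommissioning when SKIP_DECOMMISSION is set; otherwise run the
    # cleanup. The conditional lives in user code, not in Terraform's lifecycle.
    command = "test -n \"$SKIP_DECOMMISSION\" || ./decommission.sh ${self.id}"
  }
} |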
Guys, are there any updates? Can we expect some additional values for the when argument? …
In addition, these could be combined into lists for best effort, e.g.: … |
@kidmis I wouldn't expect this to happen anytime soon, as teamterraform said in 2019:
I suggest you use … |
I want to add another use case: creating instances with the VMware provider inside a module. The script does not get executed when the parent module is removed from the workspace.
Running this with anything other than terraform apply is not really an option, since this is meant for a VCS-driven Terraform Enterprise workspace. |
Any news here? |
It is difficult to understand the reasoning of the Terraform team here. This ticket has been open for more than 4 years, and I believe anyone who has this need does not understand why … |
It would be great if something could be done about this. |
@arbourd, sorry, I'm not sure what you mean. This issue is tracking the cases where destroy provisioners can't currently be executed, with deposed resources from … |
Right, but this issue is very old and has a bunch of complaints about it not working on "apply" at all. Are the two major issues right now: …
Is that correct? |
@arbourd, unfortunately any longstanding issues tend to accumulate large amounts of unrelated or unnecessary comments. Destroy provisioners were implemented with some fundamental shortcomings which are difficult to incorporate into the normal resource lifecycle. The summary above is still representative of the current situation. |
For those of us using …
|
Can you at least inherit from or clone null_resource into a new kind of resource (custom_resource is taken, so maybe undetermined_resource) and implement proper behavior: an update/change handler/event when the resource is being replaced, and actually running destroy when it is destroyed, not only when … Now some more spam with confirmations: apply does not trigger destroy, but it says it does.
And when I did use … |
Also, an apply that causes a replace does trigger destroy as well. It would be better to have an update/change handler instead. Provisioner definition: … |
Right now, to bypass this problem, I'm creating a purging null_resource which has to be removed from the code after some time. It's ultra uncomfortable and unexpected if someone is using my modules (I now have two of those) for the first time. Instead, the resource could be removed manually, if someone has permissions of course, but then it's not following IaC. |
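One reading of the 'purging' workaround described above (the resource name and cleanup command are assumptions): a temporary null_resource whose create-time provisioner performs the cleanup that the missing destroy-time run should have done, and which is deleted from the configuration once it has been applied:
resource "null_resource" "purge_old_records" {
  # Temporary: apply once so this create-time provisioner does the cleanup,
  # then remove this block from the code.
  provisioner "local-exec" {
    command = "./purge-stale-records.sh" # hypothetical cleanup script
  }
} |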
I am surprised this issue is still open after all this time. I have the simplest case: I want to recreate my k8s control planes, and for obvious reasons I need new servers to be created before I can destroy the old ones. Here's an example to illustrate:
locals {
hash = "change this to whatever you want and reapply"
}
resource "random_string" "node" {
length = 3
lower = true
special = false
numeric = false
upper = false
keepers = {
user_data = sha256(local.hash)
}
}
resource "null_resource" "create" {
triggers = {
hash = local.hash
node = random_string.node.id
}
provisioner "local-exec" {
when = create
command = "echo create ${self.triggers.hash} ${self.triggers.node}"
}
lifecycle {
create_before_destroy = true
}
depends_on = [
random_string.node,
]
}
resource "null_resource" "destroy" {
triggers = {
hash = local.hash
node = random_string.node.id
}
provisioner "local-exec" {
when = destroy
command = "echo destroy ${self.triggers.hash} ${self.triggers.node}"
}
lifecycle {
# comment line below to see the difference
create_before_destroy = true
}
depends_on = [
random_string.node,
null_resource.create,
]
}
And the output:
random_string.node: Creating...
random_string.node: Creation complete after 0s [id=kjc]
null_resource.create: Creating...
null_resource.create: Provisioning with 'local-exec'...
null_resource.create (local-exec): Executing: ["/bin/sh" "-c" "echo create change this to whatever you want and reapply kjc"]
null_resource.create (local-exec): create change this to whatever you want and reapply kjc
null_resource.create: Creation complete after 0s [id=3429720091763885346]
null_resource.destroy: Creating...
null_resource.destroy: Creation complete after 0s [id=948398361774632729]
null_resource.destroy (deposed object d889f24e): Destroying... [id=5541652494173857564]
null_resource.destroy: Destruction complete after 0s
null_resource.create (deposed object ad4d07d0): Destroying... [id=169389285284865921]
null_resource.create: Destruction complete after 0s
random_string.node (deposed object 892cf9f8): Destroying... [id=jtq]
random_string.node: Destruction complete after 0s
The server ( … On the other hand, if I comment out …
Too bad I've lost all my data. How is that not a valid use case? |
We need this to safely update instance types for EKS managed node groups, which cannot be done in-place and forces re-creation. |
Is there any update here? |
What if, instead of supporting only … |
@giner, yes a separate managed resource is often the preferred solution here. Either a custom one which suits your use case, or something more generic like “external” which runs different commands at various points in the resource instance lifecycle. I think there are some existing examples in the public registry. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform Version
v0.9.2
Affected Resource(s)
Terraform Configuration Files
Debug Output
https://gist.github.com/IOAyman/3e86d9c06d03640786184c1429376328
Expected Behavior
It should have run the on-destroy provisioners
Actual Behavior
It did not run the on-destroy provisioners
Steps to Reproduce
terraform apply -var-file=infrasecrets.tfvars
terraform destroy -var-file=infrasecrets.tfvars
References
Are there any other GitHub issues (open or closed) or Pull Requests that should be linked here? For example: