core: Don't try to register separate checkable resources for each module instance #31860
Conversation
We were previously _trying_ to handle diagnostics here but were not quite doing it right: we were testing whether the resulting error was nil, rather than appending it to the diagnostics and then checking whether the result has errors. The difference matters because it allows DynamicExpand to return warnings without associated errors when needed; previously the graph walker would treat a warnings-only result as if it were an error.

Ideally we'd change DynamicExpand to return diagnostics directly, but we previously decided against that because there were so many implementers to update, and my intent for this change is to be surgical so we minimize the risk of backporting it into patch releases.
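To illustrate the distinction, here's a minimal sketch of the corrected pattern using the `tfdiags` package; this is a paraphrase of the walker's handling, not the verbatim source, and the surrounding names are illustrative:

```go
// Sketch: consume a DynamicExpand result via diagnostics rather than a
// bare nil-check on the returned error.
var diags tfdiags.Diagnostics

g, err := n.DynamicExpand(ctx)

// Append the error first; tfdiags understands errors produced by
// diags.ErrWithWarnings(), which are non-nil even when they only
// carry warnings.
diags = diags.Append(err)

// Only halt on actual errors. A warnings-only result passes through
// here, whereas the old `if err != nil` check wrongly rejected it.
if diags.HasErrors() {
	return diags
}

walkGraph(g) // placeholder for continuing the walk
```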
We previously did two levels of DynamicExpand: first from ConfigResource to AbsResource, and then from AbsResource to AbsResourceInstance. We'll now do the full expansion from ConfigResource to AbsResourceInstance in a single DynamicExpand step inside nodeExpandPlannableResource.

The new approach is functionally equivalent to the old except that it fixes a bug in the previous implementation: we will now call checkState.ReportCheckableObjects only once for the entire set of instances of a particular resource, which is what the checkable objects infrastructure expects so that it can always mention all of the checkable objects in the check report, even if we bail out partway through due to a downstream error.

This is essentially the same code, but now arranged as additional methods on nodeExpandPlannableResource instead of an extra graph node type. A further advantage is that this is now straight-through code with standard control flow, instead of the unusual inversion of control we had before, bouncing in and out of different Execute and DynamicExpand implementations to get this done.
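As a rough sketch of the single-step expansion described above (heavily simplified: the real `expandResourceInstances` also handles orphaned instances and builds the per-instance subgraph; `expander`, `checkState`, and `instAddrs` are stand-in names):

```go
// Expand ConfigResource -> AbsResourceInstance in one pass, then tell
// the checks state about the complete set exactly once.
var instAddrs []addrs.Checkable
for _, moduleAddr := range expander.ExpandModule(n.Addr.Module) {
	absAddr := n.Addr.Resource.Absolute(moduleAddr)
	for _, instAddr := range expander.ExpandResource(absAddr) {
		instAddrs = append(instAddrs, instAddr)
	}
}
// One report covering all instances across all module instances, so the
// check report can always enumerate every checkable object.
checkState.ReportCheckableObjects(n.Addr, addrs.MakeSet(instAddrs...))
```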
return &g, diags.ErrWithWarnings()
Because the process here combines multiple distinct graphs, the combined graph often won't have a single root. The single root probably doesn't matter in practice (it's vestigial from very early Terraform), but it does show that we're not actually validating the graph returned from `DynamicExpand`. These graphs have always been very simple and had no way to introduce cycles, but since you're already touching the `DynamicExpand` call site it might be worth mentioning.
Hmm, indeed... I typically think of these `DynamicExpand` graphs as just an over-engineered way to iterate over the multiple expanded items, but you're right that the surrounding infrastructure at least thinks it's dealing with single-rooted acyclic graphs, even though these graphs never contain any real dependency edges in practice.
I'm going to first see if there's a straightforward way to add an additional root node just to maintain the contract, but if that isn't easy and low-risk then I'll find a suitable place to add a comment describing this hazard.
Yeah, these have definitely been reduced to an obfuscated iteration, and in other cases we've removed the graph builder entirely. It does still have to get fed into the graph walker for concurrent operations, but maybe in the future we get rid of these last few specialized DynamicExpand graph transformers.
While investigating this I noticed two interesting things:
- Because the test case I added here has only one instance of one resource in the module, the root transformer didn't add an extra root node anyway because that single resource instance was already the root.
- After adding another resource to force the addition of a root, I noticed that we do actually still end up with only one root because all of the root nodes are equal to one another as currently implemented: the node type is an empty struct and we use a value of that type (rather than a pointer) as the node. Therefore it naturally coalesces together into a single node anyway.
With those two observations in mind, I'm going to add a comment to the RootTransformer that records that all root node values being equal to each other is part of its contract and leave it at that for now. If we want to do something more involved later on then of course we can, but the current code is generating a valid graph already and I don't want to add more complexity to it just for a bug fix.
If there are multiple levels of expansion, you still end up with multiple roots (I verified by calling `Validate` on the expanded subgraphs, which would error with "multiple roots"). I'm not saying this is necessarily something which needs to be fixed here, but we definitely don't always get a single root.
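For reference, this is the kind of check being described, assuming the expanded subgraph is a `dag.AcyclicGraph` from Terraform's internal `dag` package (the wrapper function is hypothetical; nothing currently calls this for `DynamicExpand` results):

```go
import "github.com/hashicorp/terraform/internal/dag"

// validateExpanded is a hypothetical hook for a DynamicExpand result:
// Validate errors on cycles and on graphs that don't have exactly one
// root, which is how the "multiple roots" failure above surfaces.
func validateExpanded(g *dag.AcyclicGraph) error {
	return g.Validate()
}
```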
Hmm interesting... I wonder why that isn't reproducing on my end. It's working okay for me with the test case updated to generate multiple instances in each module. 🤔
I'll poke at it some more and see if I can get it to fail in the way you're describing or explain why it doesn't before merging.
Ahh, I see what the problem is: it's actually related to the first observation I made above.
If there is only one resource instance in a particular module instance then `RootTransformer` won't insert a new root node at all, because there's already only one node in the graph. But that node isn't a root node, so when we subsume the graphs all together we end up with just a pile of resource instances and no root, which therefore fails validation.
I managed to defeat my own reproduction by already having changed the test to generate multiple resource instances per resource before I tried calling `.Validate` on the result, and therefore masked the problem. 🤦
It does seem to work regardless, so I'm going to go ahead with this change as a minimal v1.3 backport, but I'll open a follow-up PR (intended only for the `main` branch, not for backporting) to actually validate the `DynamicExpand` results and then make sure it does indeed produce valid graphs in all cases.
We use a non-pointer value for this particular node, which means that there can never be two root nodes in the same graph: the graph implementation will just coalesce them together when a second one is added. Our resource expansion code relies on that coalescing so that it can subsume multiple graphs for different module instances into a single mega-graph with all instances across all module instances, with any root nodes coalescing into a single root.

This also updates one of the context tests that exercises resource expansion so that it generates multiple resource instance nodes per module and thus potentially has multiple roots to coalesce. However, we aren't currently explicitly validating the return values from DynamicExpand, so this test doesn't actually fail if the coalescing doesn't happen. We may choose to validate the DynamicExpand result in a later commit to make it more obvious if future modifications fail to uphold this invariant.
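As a standalone illustration of why that invariant holds, here's a runnable sketch that models the vertex set with a plain Go map (Terraform's actual `dag` package stores vertices in a comparable way, but this is not its API):

```go
package main

import "fmt"

// graphNodeRoot mirrors the empty-struct root node described above;
// the name is illustrative since Terraform's type is unexported.
type graphNodeRoot struct{}

func main() {
	// Value-typed empty structs compare equal, so a vertex set keyed
	// by the node value can only ever hold one root.
	vertices := map[interface{}]struct{}{}
	vertices[graphNodeRoot{}] = struct{}{}
	vertices[graphNodeRoot{}] = struct{}{} // equal key: coalesces
	fmt.Println(len(vertices))             // prints 1

	// Pointer-typed nodes compare by address instead, so two
	// separately allocated (non-zero-size) nodes would stay distinct.
	type namedNode struct{ name string }
	vertices[&namedNode{"a"}] = struct{}{}
	vertices[&namedNode{"b"}] = struct{}{}
	fmt.Println(len(vertices)) // prints 3
}
```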
This branch is confirmed working in several locations where we were previously seeing …
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.
This change is required because Terraform v1.3.0 was affected by a bug that caused the module to crash Terraform (see hashicorp/terraform#31860 for more details). In addition, we now finally have a version constraint for the bucket module.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
Over in #31846 we learned that the logic for registering the "checkable object instances" for resources with custom condition checks was running once per containing module instance, rather than once total to cover all instances of a resource across all module instances.
The checks infrastructure expects to be told the instances for each checkable object exactly once, so that it can guarantee it always has either complete information or no information. This is part of how we ensure that we'll always either return the full set of instances for a checkable object or mark the check as "error", so that check-presenting UIs can show a consistent set of instances on each run.
Unfortunately our previous approach to handling the two-level expansion of resources inside modules was to use two levels of `DynamicExpand`, which is a special inversion-of-control technique we use as part of the concurrent graph walk. That meant there was no place in the code which had access to all of the instances of a particular resource at once.

It turns out that the double `DynamicExpand` isn't really necessary here, and was largely just a convenience when we were implementing multi-instance modules, to avoid disrupting the existing resource expansion logic too much. The module expansion logic is all just local computation anyway, and it was holding a lock on the state for its duration, so it was getting no material benefit from being called concurrently by the graph walker.

To correct the bug, this reworks the logic into one big `DynamicExpand` that uses normal straight-through code to gather all of the instances of a resource together into a dynamic subgraph and register them with the checks system. This makes the control flow clearer while also ensuring that by the end of this function we know exactly which set of resource instances we're expecting.
I had originally intended to find a less invasive fix for backporting into the v1.3 branch, but the graph walker's inversion of control meant that previously there was no reasonable correct place to call `ReportCheckableObjects` from.

To minimize the risk of this refactoring I made an effort to modify the actual logic as little as possible: what were previously the `Execute` and `DynamicExpand` methods of `NodePlannableResource` are now `nodeExpandPlannableResource.expandResourceInstances` and `nodeExpandPlannableResource.resourceInstanceSubgraph` respectively, chained together as direct method calls rather than indirectly via an intermediate dynamic subgraph.

Moving some logic that was previously in an `Execute` method to instead be in a `DynamicExpand` method exposed a logic error in the graph walker's handling of `DynamicExpand`, which I corrected here in the initial commit.

This refactoring did not regress any existing tests except the ones that were directly testing `NodePlannableResource`, which are now removed altogether because that graph node type no longer exists. I added a new test to cover the resource-related portion of #31846.

Issue #31846 seems to actually be capturing two separate but related bugs, where the other is about output values. This PR does not attempt to address the output value bug, since its root cause seems unrelated, and I intend to tackle it in a separate PR.