Apply optimizations for handling of condition checks #32123
Conversation
This makes sense to me as a tactical fix for the v1.3 series. Thanks for digging into this!
I left some notes inline about things that are more architectural, so I expect we'll ignore them for the purposes of merging this PR, but I'd like to consider them for follow-up improvements in the v1.4 branch, to hopefully leave a couple fewer potential hazards for future maintenance.
func (n *NodeAbstractResourceInstance) apply(
	ctx EvalContext,
	state *states.ResourceInstanceObject,
	change *plans.ResourceInstanceChange,
	applyConfig *configs.Resource,
	createBeforeDestroy bool) (*states.ResourceInstanceObject, instances.RepetitionData, tfdiags.Diagnostics) {
	keyData instances.RepetitionData,
Technically anything which has access to the EvalContext can get the "repetition data" like this:
exp := ctx.InstanceExpander()
keyData := exp.GetResourceInstanceRepetitionData(n.Addr())
In the original design for this "instance expander" thing it was intended to act as the source of record for all of this expansion-related information specifically so that we wouldn't need to explicitly pass all of this contextual data around.
I do feel in two minds about it because it feels like a violation of the Law of Demeter, but we already widely use ctx as a big bag of shared "global" context, including in this very function, where it uses ctx to do various other things that aren't strictly this function's business to do.
Yes, I left this as close to the current code as possible. I did forget that the rep data was directly available though, because the existing apply code already re-evaluated it anyway. Switching out that code however runs into a "no expansion has been registered" panic, so I think there's something else going on here too.
Hmm yes I suppose the fact that we've been passing it around through arguments meant that there was not previously any need to actually register it at the source of record. Unfortunate but not unexpected.
Hopefully in a future change we can find a suitable central point to evaluate this just once and register it in the expander, so it's clearer exactly whose responsibility it is to evaluate the for_each or count expressions.
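For reference, the registration side that's missing here would look roughly like the sketch below. The helper itself is hypothetical, but evaluateCountExpression, evaluateForEachExpression, and the expander's SetResource* methods are the existing pieces it would be built from, so treat the exact wiring as illustrative only:

// registerExpansion is a hypothetical central point that evaluates count or
// for_each exactly once and records the result in the instance expander, so
// that later calls to GetResourceInstanceRepetitionData don't panic with
// "no expansion has been registered".
func (n *NodeAbstractResource) registerExpansion(ctx EvalContext, moduleAddr addrs.ModuleInstance) tfdiags.Diagnostics {
	var diags tfdiags.Diagnostics
	expander := ctx.InstanceExpander()

	switch {
	case n.Config.Count != nil:
		count, countDiags := evaluateCountExpression(n.Config.Count, ctx)
		diags = diags.Append(countDiags)
		expander.SetResourceCount(moduleAddr, n.Addr.Resource, count)
	case n.Config.ForEach != nil:
		forEach, forEachDiags := evaluateForEachExpression(n.Config.ForEach, ctx)
		diags = diags.Append(forEachDiags)
		expander.SetResourceForEach(moduleAddr, n.Addr.Resource, forEach)
	default:
		expander.SetResourceSingle(moduleAddr, n.Addr.Resource)
	}
	return diags
}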
@@ -69,7 +93,7 @@ func (t *DiffTransformer) Transform(g *Graph) error {
 			// run any condition checks associated with the object, to
 			// make sure that they still hold when considering the
 			// results of other changes.
-			update = true
+			update = t.hasConfigConditions(addr)
I feel weird about making the DiffTransformer specifically aware of the preconditions in the config; this feels like some action at a distance that's likely to trip us up under future maintenance, just as how not having no-op actions in the graph caused us to ship a subtly broken postconditions feature in the first place. 😖
This very direct approach does seem reasonable as a small, surgical fix for backporting into the v1.3 branch, but I wonder if there's a different way we could arrange this in the v1.4 branch (soon after this is merged) so that this will hopefully be less of a maintenance hazard.
One starting idea, though I've not thought deeply about it yet:
- Add yet another optional interface that graph nodes can implement: type GraphNodeDuringApply interface { HasApplyTimeActions() bool }
- In a new transformer after we've done all the ones that insert new nodes, prune from the graph any node which implements this interface and whose HasApplyTimeActions() returns false.
- If even inserting them in the first place is problematic then we could optionally optimize this by having DiffTransformer pre-check for the interface itself and not even insert the node in the first place. But my hope is that this wouldn't be necessary, because we could do the pruning early enough that it's done by the time we get into all of the expensive transformers that do edge analysis.
Then this "are there any conditions?" logic could live in GraphNodeApplyableResourceInstance (and similar), closer to the Execute function that it's acting as a gate for. It would presumably return false unless the action is something other than NoOp or there's at least one condition to check.
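To make that idea a bit more concrete, here's a rough sketch of what the pruning pass might look like; the interface and transformer names are hypothetical per the suggestion above, and only the Vertices/Remove graph helpers are existing API:

// GraphNodeDuringApply is the hypothetical optional interface suggested
// above: a node reports whether it still has any work to do during apply.
type GraphNodeDuringApply interface {
	HasApplyTimeActions() bool
}

// pruneNoApplyActionsTransformer is a sketch of a transformer that would run
// after all of the node-inserting transformers and drop any node that
// declares it has nothing to do at apply time (e.g. a NoOp change with no
// conditions to evaluate).
type pruneNoApplyActionsTransformer struct{}

func (t *pruneNoApplyActionsTransformer) Transform(g *Graph) error {
	for _, v := range g.Vertices() {
		if n, ok := v.(GraphNodeDuringApply); ok && !n.HasApplyTimeActions() {
			g.Remove(v)
		}
	}
	return nil
}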
Since I'm explicitly not suggesting we do this refactoring as a blocker for this PR, happy to discuss this more in a more appropriate place after this is merged. 😀
I tried a few different approaches here, and settled on this config reference, which is only meant as a stop-gap. I think since we do store the plan check results, the results of those could be included in the "diff" that we're handling here. I was running into problems with inconsistent handling of the results, however, because those have separate codepaths from normal changes, and this turned out to be a safer and more reliable failsafe for now. Since unknown checks from planning would also indicate "HasApplyTimeActions", it may be worth seeing if we can incorporate the checks more closely with the plan object so they are always in sync.
If there are no changes, then there is no reason to create an apply graph, since all objects are known. We do however need the walk to match the expected state structure. This is probably only cleanup of empty nested modules and outputs, but some investigation is needed before making the full change. For now we can store the checks from the plan directly into the new state, since the apply walk was overwriting the results we already had.
Ensure that empty check results are normalized in state serialization to prevent unexpected state changes from being written. Because there is no consistent empty, null, and omitempty usage for state structs, there's no good way to create a test which will fail for future additions.
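The shape of that normalization is roughly the following; the type and helper names here are hypothetical stand-ins rather than the PR's actual code, but the idea is that a nil collection and an empty one should serialize identically, so a round-trip doesn't look like a change:

// checkResultSnapshot is a hypothetical stand-in for a per-object check
// result as it appears in the serialized state.
type checkResultSnapshot struct {
	ObjectAddr string
	Status     string
}

// normalizeCheckResults maps a nil slice to an empty one so that a state
// with "no check results" always serializes the same way, regardless of
// whether the in-memory value happened to be nil or empty.
func normalizeCheckResults(results []checkResultSnapshot) []checkResultSnapshot {
	if results == nil {
		return []checkResultSnapshot{}
	}
	return results
}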
Only add NoOp changes to the apply graph if they have conditions which need to be evaluated.
We need to avoid re-writing the state for every NoOp apply. We may still be evaluating the instance to account for any side-effects in the condition checks, however the state of the instance has not changed. Re-writing the state is not a concurrent operation, and it may require encoding a fairly large instance state and re-serializing the entire state blob, so it is best avoided if possible.
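In spirit, the guard is along these lines (a simplified sketch, not the PR's actual code; plans.NoOp is the real action constant, while the helper name is illustrative):

// shouldWriteInstanceState reports whether an instance's state needs to be
// persisted after apply. A NoOp change means the object itself is unchanged;
// the node may still have been evaluated to run its condition checks, but
// there is nothing new to write, so the expensive state update is skipped.
func shouldWriteInstanceState(action plans.Action) bool {
	return action != plans.NoOp
}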
Force-pushed from 3bbf3d7 to efd7715
The earlier branches were no longer needed for this patch, so I rebased this on main in order to merge it for the release.
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
The handling of conditions during apply can have adverse effects on very large configs, due to the handling of all existing instances even when there are no changes. With extremely large configurations, the execution time can go from a few minutes to many hours for only minor changes.
The first step is to ensure empty CheckResults in the state are normalized to prevent the state from being re-written when there are no semantic differences. Empty and nil slices are not handled consistently overall, but we will use an empty slice in the state zero value to match the other top-level structures.

Next we can skip adding NoOp changes to the apply graph when there are no conditions at all to evaluate from the configuration. The nodes themselves can have significant processing overhead when assembling the graph, especially for steps like recalculating individual instance dependencies.
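Conceptually, the check the DiffTransformer uses for this is a simple config lookup. The sketch below is only an approximation of that helper (the PR's actual implementation may differ), built on the existing DescendentForInstance and ResourceByAddr lookups and the resource's precondition/postcondition lists:

// hasConfigConditions is an approximate sketch: it reports whether the
// configuration declares any preconditions or postconditions for the given
// resource instance, which is the only reason to keep a NoOp change in the
// apply graph.
func (t *DiffTransformer) hasConfigConditions(addr addrs.AbsResourceInstance) bool {
	modCfg := t.Config.DescendentForInstance(addr.Module)
	if modCfg == nil {
		return false
	}
	res := modCfg.Module.ResourceByAddr(addr.Resource.Resource)
	if res == nil {
		return false
	}
	return len(res.Preconditions) > 0 || len(res.Postconditions) > 0
}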
Finally we need to avoid re-planning each instance and re-writing the state for every NoOp change during apply. The whole process has gotten more convoluted with the addition of condition checks, as preconditions are done during the NodeAbstractResourceInstance.plan method, but postconditions are handled outside of the abstract instance in the NodeApplyableResourceInstance.managedResourceExecute method, and repetition data needs to be shared between all these. I pulled the repetition data out of apply to simplify the early return in the short term, but there is definitely some more refactoring to be done in this area.

Fixes #32060
Fixes #32071