Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send the JSON state representation to Cloud backend (when available) #31698

Merged
merged 18 commits into from
Aug 30, 2022

Conversation

megan07
Copy link
Contributor

@megan07 megan07 commented Aug 26, 2022

No description provided.


schemas, diags := getSchemas(&c.Meta, stateTo, config)
if diags.HasErrors() {
c.Ui.Warn(fmt.Sprintf(failedToLoadSchemasMessage, err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to adjust the control flow here; loadConfig(path) shouldn't be called if os.Getwd errors, and we don't want to print the same warning three times. Same for other instances of this pattern.

internal/cloud/state.go Outdated Show resolved Hide resolved
…hemas along to PersistState in the remote state
@megan07 megan07 requested review from a team as code owners August 30, 2022 15:08
go.mod Outdated
@@ -40,7 +40,7 @@ require (
github.com/hashicorp/go-multierror v1.1.1
github.com/hashicorp/go-plugin v1.4.3
github.com/hashicorp/go-retryablehttp v0.7.1
github.com/hashicorp/go-tfe v1.7.0
github.com/hashicorp/go-tfe v1.8.1-0.20220826155809-b28335218b42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-merge TODO: release go-tfe and bump this ref to the released version


delegate remote.State
// Client Client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️ This can be deleted

return nil
}

// statemgr.Persister impl.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should godoc each public method, making a comment like "PersistState uploads a snapshot the latest state as a StateVersion to Terraform Cloud"

"terraform_version": "1.3.0",
"serial": 1,
"lineage": "backend-change",
"outputs": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the payload should be only the outputs object.

@@ -443,3 +444,54 @@ func testVariables(s terraform.ValueSourceType, vs ...string) map[string]backend
}
return vars
}

func TestCloudLocks(t *testing.T, a, b statemgr.Full) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to state_test.go and probably not called as a helper but exist as a top level test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this test is called at all right now


const failedToLoadSchemasMessage = `
Terraform failed to load schemas, which will in turn affect its ability to generate the
external JSON state file. This will not have any adverse effects on Terraforms ability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
external JSON state file. This will not have any adverse effects on Terraforms ability
external JSON variant of the state file. This will not have any adverse effects on Terraform's ability

external JSON state file. This will not have any adverse effects on Terraforms ability
to maintain state information, but may have adverse effects on any external integrations
relying on this format. The file should be created on the next successful "terraform apply"
however, historic state information may be missing if the affected integration relies on that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should end this warning at on the next successful "terraform apply". because the rest sounds unnecessarily scary. I would appreciate a second opinion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm indeed this message is tricky because it's introducing what are probably a bunch of entirely new concepts from the user's perspective, and doing so at an "anxiety point" where we're telling the user about some potential negative consequences they need to make a decision about. 😖

I don't think there's a perfect way to do this because it's never ideal to introduce entirely new concepts in an error/warning message, but I think I'd suggest reordering the information here a bit and splitting it over multiple paragraphs so it's hopefully a little easier to skim:

Warning: Failed to update data for external integrations

Terraform was unable to generate a description of the updated
state for use with external integrations in Terraform Cloud.
Any integrations configured for this workspace which depend on
information from the state may not work correctly when using the
result of this action.

This problem occurs when Terraform cannot read the schema for
one or more of the providers used in the state. The next successful
apply will correct the problem by re-generating the JSON description
of the state:
    terraform apply

(It's hard to get word-wrapping write in the GitHub comment box; I probably didn't wrap the lines above correctly for presentation in a monospace font.)

My goal with this edit was to put the "what happend and why it matters" part of the message first, and then the "why it happened and what to do about it" part second.

As a secondary goal I tried to omit the implementation detail about there being two different state serializations and instead focus on the fact that this missing information is intended for external integrations. I don't expect the typical reader of this message to know that external integrations use our JSON state format or even that there is a JSON state format for integrations, and so mentioning it seems likely to leave the reader unsure of what the consequences of this situation are.

@brandonc brandonc changed the title TF-563 Send the JSON state representation to Cloud backend (when available) Aug 30, 2022
if isCloudMode(b) {
schemas, diags = c.GetSchemas(newState, nil)
if diags.HasErrors() {
c.Ui.Warn(fmt.Sprintf(failedToLoadSchemasMessage, err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are going to need to error out. The json marshaling code will panic if a nil schema is pass in.

Copy link
Member

@jbardin jbardin Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I take that back, digging deeper it looks like there were some nil checks added because that's happened before, and it should error out instead (though I don't think purposely sending nil is something that was well tested).

@kisunji kisunji removed the request for review from a team August 30, 2022 19:59
return schemas, diags

}
return nil, diags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it can return nil, nil, which is a bit unusual. Is that going to be a problem anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the various callsites of this function, I think I share the concern of a "nil with no errors" result being unexpected per typical Go idiom, which may cause confusion for future maintainers.

As a "quick fix" without doing any more significant immediate refactoring, I would suggest renaming this method to MaybeGetSchemas and mention in its documentation comment that it can potentially return nil without errors to indicate that there isn't enough information to return the schemas and that the caller must handle the lack of schema information itself accordingly. My hope is that an unusual method name with Maybe at the start of it will stand out enough to a future reader to cause them to investigate what the "maybe" implies, rather than just assuming this is a normal-ish function which will return errors if it isn't able to do its work.

if err := destinationState.PersistState(); err != nil {
// Fetching schemas during init might be more of a hassle than we want to attempt
// in the case that we're migrating to TFC backend, the initial JSON state won't
// be generated and stored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case it matters for future ideas, the backend is currently handled before providers are installed during init, so requiring schemas here could lead to a catch-22 where it requires some manual intervention to proceed far enough for provider installation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the "hassle" wording you used could evolve into this observation. Useful insight

// EnableForcePush to allow the remote client to overwrite state
// by implementing remote.ClientForcePusher
func (s *State) EnableForcePush() {
s.forcePush = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some condition that disables this again later, or is forcing just on forever once you enable it?

Specifically, I'm worried about what happens if you use an instance of this type first to perform a "write for migration" with forcing enabled and then later use the same instance for normal writes. Is there a risk of skipping the safety check for the second write in that case?

One possible way to address this would be to make WriteState always set s.forcePush back to false, and make WriteStateForMigration just always set s.forcePush to whatever its force argument is, even if the caller set it to false. That would then allow the state manager to remember whether the most recent write was forced, rather than remembering whether any previous write was forced.

Does that seem reasonable? Am I missing this getting handled some other way? 🤔

Comment on lines 8 to 9
"github.com/hashicorp/terraform/internal/states/statefile"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure why our goimports PR check isn't catching this 🤷‍♂️ but the typical convention would be for this import to be grouped in with "github.com/hashicorp/terraform/internal/states/statemgr" below, so that all of the github.com/hashicorp/terraform imports are grouped together.

(Same seems to be true for some of the other files here; I won't comment on every one because that'll be too noisy. 😀 )

Comment on lines +141 to +142
addrs.NewLegacyProvider("null"): providers.FactoryFixed(p),
addrs.NewLegacyProvider("azurerm"): providers.FactoryFixed(p),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must confess I'm not really sure how this is fitting in with the other stuff here but I'm surprised to see use of addrs.NewLegacyProvider here because this idea of "legacy providers" was there to support migrating from Terraform v0.12 to v0.13 (when we first introduced the idea of provider namespaces) and so new code shouldn't ever encounter legacy providers.

Should these perhaps be NewDefaultProvider instead? That would make them match the addresses two real providers hashicorp/null and hashicorp/azurerm.

(We don't typically use real provider addresses in unit tests though, because it can make it seem like we're testing against a real provider rather than a fake one as seems to be the case here. If possible I would suggest either using one of the fake providers already present or using providers in an intentionally-fake namespace like the pre-existing one above, and the one added below.)

external JSON state file. This will not have any adverse effects on Terraforms ability
to maintain state information, but may have adverse effects on any external integrations
relying on this format. The file should be created on the next successful "terraform apply"
however, historic state information may be missing if the affected integration relies on that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm indeed this message is tricky because it's introducing what are probably a bunch of entirely new concepts from the user's perspective, and doing so at an "anxiety point" where we're telling the user about some potential negative consequences they need to make a decision about. 😖

I don't think there's a perfect way to do this because it's never ideal to introduce entirely new concepts in an error/warning message, but I think I'd suggest reordering the information here a bit and splitting it over multiple paragraphs so it's hopefully a little easier to skim:

Warning: Failed to update data for external integrations

Terraform was unable to generate a description of the updated
state for use with external integrations in Terraform Cloud.
Any integrations configured for this workspace which depend on
information from the state may not work correctly when using the
result of this action.

This problem occurs when Terraform cannot read the schema for
one or more of the providers used in the state. The next successful
apply will correct the problem by re-generating the JSON description
of the state:
    terraform apply

(It's hard to get word-wrapping write in the GitHub comment box; I probably didn't wrap the lines above correctly for presentation in a monospace font.)

My goal with this edit was to put the "what happend and why it matters" part of the message first, and then the "why it happened and what to do about it" part second.

As a secondary goal I tried to omit the implementation detail about there being two different state serializations and instead focus on the fact that this missing information is intended for external integrations. I don't expect the typical reader of this message to know that external integrations use our JSON state format or even that there is a JSON state format for integrations, and so mentioning it seems likely to leave the reader unsure of what the consequences of this situation are.

return schemas, diags

}
return nil, diags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the various callsites of this function, I think I share the concern of a "nil with no errors" result being unexpected per typical Go idiom, which may cause confusion for future maintainers.

As a "quick fix" without doing any more significant immediate refactoring, I would suggest renaming this method to MaybeGetSchemas and mention in its documentation comment that it can potentially return nil without errors to indicate that there isn't enough information to return the schemas and that the caller must handle the lack of schema information itself accordingly. My hope is that an unusual method name with Maybe at the start of it will stand out enough to a future reader to cause them to investigate what the "maybe" implies, rather than just assuming this is a normal-ish function which will return errors if it isn't able to do its work.


// Get schemas, if possible, before writing state
var schemas *terraform.Schemas
if isCloudMode(b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels kinda unfortunate that this idea of "cloud mode" is infecting so many codepaths now. I don't really have a better idea to suggest, though. 😖 Hopefully we can think about a more elaborate refactoring of this later, but this surgical approach is probably best for now.

if isCloudMode(b) {
schemas, diags = c.GetSchemas(state, nil)
if diags.HasErrors() {
c.Ui.Warn(fmt.Sprintf(failedToLoadSchemasMessage, err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really typically use c.Ui.Warn for new code because it generates yellow text that can be hard to read in some terminals.

In another comment elsewhere I suggested renaming c.GetSchemas to c.MaybeGetSchemas to try to be explicit that there's "something odd" about this function -- that it can potentially fail but still return successfully in some sense. In conjunction with that, I wonder about making that function itself return this warning in the tfdiags.Diagnostics it returns, so that we could then render it with our usual diagnostic formatting codepath. (c.showDiagnostics(diags), similar to some of the examples above)

Copy link
Contributor

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the timing and broad scope of this change, @megan07 and I have agreed in a backchannel to move forward with this change in its current state and then potentially address some of the feedback items in subsequent separate PRs that we expect to be more localized, once some background discussions have played out and it seems clearer which path to take.

@github-actions
Copy link
Contributor

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@github-actions
Copy link
Contributor

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants