backend state locking #25454
Conversation
Force-pushed from bb91ee1 to 4aa1cc6 (Compare)
Codecov Report
👋 I know it's weird (and extra work) to request a review on a draft PR, but I could really use the extra eyes to let me know if I'm on the right track. I've got plenty of tests to work on in the meantime.
@@ -591,7 +591,7 @@ func (b *Remote) DeleteWorkspace(name string) error {
 }

 // StateMgr implements backend.Enhanced.
-func (b *Remote) StateMgr(name string) (state.State, error) {
+func (b *Remote) StateMgr(name string) (statemgr.Full, error) {
Sneaking in an unrelated change to start removing references to the deprecated `state` package:

Line 16 in 6824407:
    // State is a deprecated alias for statemgr.Full
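For context, a rough sketch of what that referenced alias looks like in the legacy `state` package (an approximation of the file, not an exact copy):

```go
package state

import "github.com/hashicorp/terraform/states/statemgr"

// State is a deprecated alias for statemgr.Full; new code should depend on
// the statemgr package directly instead of this legacy package.
type State = statemgr.Full
```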
This adds tests to plan, apply and refresh which validate that the state is unlocked after all operations, regardless of exit status. I've also added specific tests that force Context() to fail during each operation to verify that locking behavior specifically.
I'm labeling this 0.13.1 with the caveat that we may decide to hold off till the 0.14 development cycle.
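A minimal, self-contained sketch of the shape of those tests follows. The names (`fakeLocker`, `runOperation`) are invented stand-ins rather than the real harness added in this PR; the key assertion is that the lock can be re-acquired after the operation, whatever its exit status:

```go
package local

import (
	"errors"
	"testing"
)

// fakeLocker is an invented stand-in for the backend's state locker; the
// real tests exercise the actual local backend operations.
type fakeLocker struct{ locked bool }

func (f *fakeLocker) Lock() error {
	if f.locked {
		return errors.New("state already locked")
	}
	f.locked = true
	return nil
}

func (f *fakeLocker) Unlock() error {
	if !f.locked {
		return errors.New("state already unlocked")
	}
	f.locked = false
	return nil
}

// runOperation stands in for plan/apply/refresh: it takes the lock, may fail
// while building its context, and must always release the lock on the way out.
func runOperation(l *fakeLocker, contextErr error) error {
	if err := l.Lock(); err != nil {
		return err
	}
	defer l.Unlock()
	return contextErr
}

func TestOperations_unlockStateRegardlessOfExitStatus(t *testing.T) {
	cases := map[string]error{
		"success":         nil,
		"context failure": errors.New("forced Context() error"),
	}
	for name, ctxErr := range cases {
		t.Run(name, func(t *testing.T) {
			l := &fakeLocker{}
			if err := runOperation(l, ctxErr); (err != nil) != (ctxErr != nil) {
				t.Fatalf("unexpected operation result: %v", err)
			}
			// Whatever the exit status, the state must be unlocked again;
			// re-acquiring the lock proves the operation released it.
			if err := l.Lock(); err != nil {
				t.Fatalf("state still locked after operation: %s", err)
			}
		})
	}
}
```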
To restate the goal to ensure I understand it: the backend context should return a locked state, unless there was a failure in creating that context. I think I can get behind this, although that "unless" raises some flags for me; maybe it's okay.

As for this approach, moving the code out of one place so it can be called in many places is also suspicious ... I'm guessing that reordering the ctx diags calls alone didn't fix the issue you're describing? (https://github.com/hashicorp/terraform/pull/25454/files#diff-2cf6324dad1cc452ed13778ff1552b43R203-R207) I suppose what I'm expressing is that I have some hope of solving this without adding the same function calls in multiple places, but perhaps it's unavoidable.
@@ -0,0 +1,57 @@
+package local
Sweet! New test file!
Yes, unfortunately that's required by this change. I could have avoided adding those into every …
Fixes #24246
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
There was a subtle (and confusing) difference between the local and remote backend state locking strategies, and it was only showing up as a problem in a few commands where one was using the remote backend but the command was (only) running locally, such as `terraform console`: `terraform console` would always unlock state after getting the context, but if there was an error from `Context()`, the remote state was already unlocked, which would result in a "workspace already unlocked" error.

This PR aims for parity between remote and local by making the following changes:

- `backend/local` will unlock the state if `Context()` has an error, exactly as `backend/remote` does today
- `terraform console` and `terraform import` will exit before unlocking state in case of an error in `Context()`
- state unlocking is moved out of `backend.go` and into each individual state operation

My first attempt at this PR broke basically everything, and yet it wasn't caught (reasonably so: there were probably plenty of tests that expected an error and continued to find one, possibly the wrong error, or at least an extra error). So there's more testing to do, and I have yet to dig into other commands that might need some work (my main concerns are the state commands, which are another set of kind-of-local-but-remote commands).
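To make the failure mode and the fix concrete, here is a small self-contained sketch. The names (`workspaceLock`, `buildContext`, `runConsole`) are invented for illustration and this is not the actual Terraform code; it only mirrors the flow described above, where the backend releases the lock itself when `Context()` fails and the command exits before attempting its own unlock:

```go
package main

import (
	"errors"
	"fmt"
)

// workspaceLock stands in for the remote backend's workspace lock; the real
// implementation lives behind statemgr, this is illustration only.
type workspaceLock struct{ held bool }

func (w *workspaceLock) lock() error {
	if w.held {
		return errors.New("workspace already locked")
	}
	w.held = true
	return nil
}

func (w *workspaceLock) unlock() error {
	if !w.held {
		return errors.New("workspace already unlocked")
	}
	w.held = false
	return nil
}

// buildContext mimics backend Context(): on failure it releases the lock
// itself before returning, the behavior this PR makes consistent.
func buildContext(w *workspaceLock, fail bool) error {
	if err := w.lock(); err != nil {
		return err
	}
	if fail {
		w.unlock() // backend cleans up its own lock on error
		return errors.New("context: configuration is invalid")
	}
	return nil
}

// runConsole mimics a local-only command such as `terraform console`.
func runConsole(w *workspaceLock, failCtx bool) error {
	if err := buildContext(w, failCtx); err != nil {
		// Fixed behavior: exit before unlocking, since the backend already
		// released the lock. The old code fell through to unlock() here and
		// surfaced "workspace already unlocked" instead of the real error.
		return err
	}
	defer w.unlock() // success path: the command owns the unlock
	// ... evaluate expressions against the context ...
	return nil
}

func main() {
	w := &workspaceLock{}
	fmt.Println(runConsole(w, true))  // context: configuration is invalid
	fmt.Println(runConsole(w, false)) // <nil>, lock taken and released cleanly
}
```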