transport/server: fix race that could cause a stray header to be sent #5513

Merged (2 commits) on Jul 28, 2022

Conversation

@dfawley (Member) commented Jul 19, 2022

Fixes #5511

RELEASE NOTES:

  • transport/server: fix race that could cause a stray header to be sent

@@ -1013,17 +1012,19 @@ func (t *http2Server) writeHeaderLocked(s *Stream) error {
// TODO(zhaoq): Now it indicates the end of entire stream. Revisit if early
// OK is adopted.
func (t *http2Server) WriteStatus(s *Stream, st *status.Status) error {
	s.hdrMu.Lock()
Contributor:

So what you wrote in the issue ("It should be fine to hold the stream's header mutex during these entire functions; they are very infrequently called") is the reason it is OK for this unlock to come after things like the stats handlers, which would block any operations waiting on the mutex?

Member Author:

We could unlock before calling the stats handler (we technically just need to call updateHeaderSent and finishStream to prevent any races), but this should be okay, considering that a race between multiple calls to WriteStatus and/or WriteHeader should be extremely rare.

Contributor:

Sounds good. Seems very minimal; I was just wondering about your thought process.
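
A minimal, self-contained sketch of the option discussed in this thread: holding the stream's header mutex across the entire status write, including the stats-handler callbacks, rather than unlocking before them. This is not the grpc-go implementation; every name below (stream, writeStatus, statsHandler, the frame strings) is invented for illustration and uses only the standard library.

package main

import (
	"fmt"
	"sync"
)

// stream stands in for the transport's Stream: hdrMu guards the header-sent
// flag, mirroring the discussion above. Illustrative only.
type stream struct {
	hdrMu      sync.Mutex
	headerSent bool
}

// statsHandler stands in for the stats handlers mentioned in the thread.
type statsHandler func(event string)

// writeStatus sketches the "hold the lock for the whole function" option: the
// header-sent check, the trailer enqueue, and the stats-handler calls all run
// under hdrMu, so a concurrent header write cannot interleave with them.
func writeStatus(s *stream, frames chan<- string, handlers []statsHandler) {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()

	if !s.headerSent {
		s.headerSent = true
		frames <- "HEADERS(:status=200)" // headers carry :status when first sent here
	}
	frames <- "TRAILERS(grpc-status, END_STREAM)"

	// The thread notes the lock could be released before this loop; keeping it
	// held is simpler, and fine because concurrent calls are extremely rare.
	for _, h := range handlers {
		h("OutTrailer")
	}
}

func main() {
	frames := make(chan string, 4)
	s := &stream{}
	writeStatus(s, frames, []statsHandler{func(e string) { fmt.Println("stats:", e) }})
	close(frames)
	for f := range frames {
		fmt.Println("wire:", f)
	}
}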

	}

	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
Contributor:

This is orthogonal to the bug fix, right? I get that this fixes a correctness issue with s.updateHeaderSent, but could you give me a concrete example of what this fixes?

Member Author:

We want to hold the lock at least while calling s.updateHeaderSent and checking/setting the state via s.getState. Otherwise this races with WriteStatus doing similar operations. So while this is not exactly the original bug, it's another related race that is possible when trying to send headers while returning from the method handler.

Contributor:

Yeah, makes sense to me. Although you need to hold the mutex for writeHeaderLocked() too, right? (That's what you have currently, but your comment only mentions updateHeaderSent and getState.)

Member Author:

Ah yes, that too.
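
Similarly, a hedged sketch of the shape this thread settles on for the header-write path: the header-sent update, the state check, and the locked write helper all execute while the header mutex is held. Again, the types and function names here are placeholders, not the transport's real API.

package main

import (
	"errors"
	"fmt"
	"sync"
)

// Stand-ins for the transport's stream bookkeeping; none of this is grpc-go code.
type streamState int

const (
	streamActive streamState = iota
	streamDone
)

type stream struct {
	mu         sync.Mutex // guards state
	hdrMu      sync.Mutex // guards headerSent and header writes
	state      streamState
	headerSent bool
}

func (s *stream) getState() streamState {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

// writeHeader mirrors the shape of the fix: the header-sent update, the state
// check, and the locked write helper all run while hdrMu is held, so they
// cannot interleave with a concurrent status write doing the same checks.
func writeHeader(s *stream, frames chan<- string) error {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()

	if s.headerSent || s.getState() == streamDone {
		return errors.New("headers already sent or stream done")
	}
	s.headerSent = true
	return writeHeaderLocked(s, frames)
}

// writeHeaderLocked must only be called with s.hdrMu held.
func writeHeaderLocked(s *stream, frames chan<- string) error {
	frames <- "HEADERS(:status=200)"
	return nil
}

func main() {
	frames := make(chan string, 1)
	s := &stream{}
	if err := writeHeader(s, frames); err != nil {
		fmt.Println("writeHeader:", err)
	}
	close(frames)
	for f := range frames {
		fmt.Println("wire:", f)
	}
}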

		t.Fatalf("Error starting endpoint server: %v", err)
	}
	defer ss.Stop()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
Contributor:

Nit: 10 * time.Second

Member Author:

Spaces?

We have a pretty healthy existing mix of spaces/no spaces, and gofmt is OK with either.

$ grep -r '\* time.Second' . | wc -l
     235
$ grep -r '\*time.Second' . | wc -l
     134

I'm inclined to just leave this.

Contributor:

Yeah, as discussed offline, this was a habit carried over from a couple of my internship's linters. I didn't realize we have both styles for some reason. This is fine.

@@ -8077,3 +8077,33 @@ func (s) TestUnexpectedEOF(t *testing.T) {
}
}
}

// TestRecvWhileReturningStatus performs a Recv in a service handler while the
// handler returns its status. A race condition could result in the server
// sending headers twice (once as a result of the status and once as a result
// of the failed Recv call).
Contributor:

"A race condition could result in the server sending headers twice (once as a result of the status and once as a result of the failed Recv call)." This doesn't seem like the race to me? Isn't the race the bit where instead of writing the header with result from status first it writes the header from failed Recv call without status first due to s.updateHeaderSent is true). Regardless of ordering, both headers get sent out anyway? Hence your comment in issue "E.g. to ignore the extra headers on a stream after END_STREAM was already sent".

Member Author:

"Isn't the race the part where, instead of the header carrying the status being written first, the header from the failed Recv call is written first without :status (because s.updateHeaderSent is true)?"

I'm not sure of the order -- it probably doesn't matter. The failed Recv call will also write a status, though, here:

grpc-go/stream.go, lines 1586 to 1588 at b695a7f:

if err != nil && err != io.EOF {
	st, _ := status.FromError(toRPCErr(err))
	ss.t.WriteStatus(ss.s, st)

Is there something you think needs to change here?

Contributor:

Yeah, I saw that line when triaging. All I was getting at with this comment is that I think the explanation of the race in the comment is wrong. The race is that the second WriteStatus call reads updateHeaderSent as true, doesn't attach :status, and then sends its frame, without the status, on the wire BEFORE the first WriteStatus call's frame, which is what broke howardjohn. Saying the race is "sending headers twice" is accurate, but it's not the explanation of why it broke the client in this scenario. (And yes, it also breaks HTTP/2 because it sends another frame after END_STREAM.)

Member Author:

Aha, I hadn't completely linked the reason for the failure on the client side, actually, but that makes more sense than my original thought that two :status headers were written and the client saw the second one and errored. So my comment about "ignore the extra headers on a stream after END_STREAM was already sent" is probably bogus. Updated this comment.

Contributor:

Cool, thanks; the comment LGTM.
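
To make the explanation the thread converges on concrete, here is a toy, deterministic replay of the problematic interleaving: the second WriteStatus-style call reads the header-sent flag as true, builds a HEADERS block without :status, and its frame reaches the wire before the first call's frame, which also means a frame follows END_STREAM. All names and frame strings are invented; this is a scripted illustration, not grpc-go code and not an actual data race.

package main

import "fmt"

// stream is an invented stand-in holding only the flag that matters here.
type stream struct {
	headerSent bool
}

func main() {
	s := &stream{}
	var wire []string

	// Step 1: WriteStatus call #1 (the handler returning its status) sees the
	// headers as unsent, so its frame will carry :status, and it marks them sent.
	first := "HEADERS(:status=200, grpc-status=0, END_STREAM)"
	s.headerSent = true

	// Step 2: before call #1 enqueues its frame, WriteStatus call #2 (from the
	// failed Recv) reads headerSent == true and therefore omits :status.
	var second string
	if s.headerSent {
		second = "HEADERS(grpc-status only, no :status, END_STREAM)"
	}

	// Step 3: call #2's frame reaches the wire first; call #1's follows it.
	wire = append(wire, second, first)

	// The client's first HEADERS block has no :status (an invalid response),
	// and a second frame follows END_STREAM. Holding hdrMu across the whole
	// check-and-enqueue in both calls closes this window.
	for i, f := range wire {
		fmt.Printf("frame %d on wire: %s\n", i+1, f)
	}
}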

@zasweq zasweq assigned dfawley and unassigned zasweq Jul 22, 2022
@dfawley dfawley assigned zasweq and unassigned dfawley Jul 27, 2022
@@ -1052,7 +1053,7 @@ func (t *http2Server) WriteStatus(s *Stream, st *status.Status) error {
		endStream: true,
		onWrite: t.setResetPingStrikes,
	}
	s.hdrMu.Unlock()

Contributor:

Nit: do we want this blank line? (There used to be no blank line here.)

Member Author:

I think so. It helps to visually separate building the frame from writing it.

@zasweq (Contributor) left a comment:

LGTM!

Closes: Race condition in WriteStatus seems to lead to invalid responses from server (#5511)