transport/server: fix race that could cause a stray header to be sent #5513
Conversation
@@ -1013,17 +1012,19 @@ func (t *http2Server) writeHeaderLocked(s *Stream) error {
// TODO(zhaoq): Now it indicates the end of entire stream. Revisit if early
// OK is adopted.
func (t *http2Server) WriteStatus(s *Stream, st *status.Status) error {
	s.hdrMu.Lock()
So what you wrote in the issue -- "It should be fine to hold the stream's header mutex during these entire functions; they are very infrequently called" -- is why it's OK for this unlock to come after things like the stats handlers, which would block any operations waiting on the mutex?
We could unlock before calling the stats handler (we technically just need to call updateHeaderSent and finishStream to prevent any races), but this should be okay considering a race between multiple calls to WriteStatus and/or WriteHeader should be extremely rare.
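A minimal sketch of the locking shape being discussed, using simplified stand-in types and names rather than the real grpc-go internals (everything below is hypothetical):

// Minimal sketch of holding hdrMu across all of WriteStatus, including the
// stats callback; stream/transport here are simplified stand-ins.
package main

import "sync"

type stream struct {
	hdrMu      sync.Mutex
	headerSent bool
}

type transport struct{}

func (t *transport) writeStatus(s *stream, notifyStats func()) {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock() // held for the whole function, including notifyStats

	if !s.headerSent {
		s.headerSent = true // stand-in for s.updateHeaderSent()
		// ... enqueue the HEADERS frame carrying :status ...
	}
	// ... enqueue trailers with END_STREAM and finish the stream ...

	// The alternative mentioned above would unlock here, before the stats
	// handlers run; holding the lock across them is acceptable because these
	// calls are infrequent and WriteStatus/WriteHeader races are rare.
	notifyStats()
}

func main() {
	t := &transport{}
	t.writeStatus(&stream{}, func() { /* stats handler stand-in */ })
}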
Sg. Seems very minimal, but I was just wondering about your thought process.
	}

	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
This is orthogonal to the bug fix, right? I get that this fixes a correctness issue with s.updateHeaderSent, but could you give me a concrete example of what this fixes?
We want to hold the lock while calling s.updateHeaderSent and checking/setting s.getState, at least. Otherwise it races with WriteStatus doing similar operations. So while this is not exactly the original bug, it's another related race that is possible when trying to send headers while returning from the method handler.
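A hypothetical illustration of the check-then-act race those operations guard against (not grpc-go code; running it with go run -race lets the detector flag the unsynchronized flag):

// Two goroutines model WriteHeader and WriteStatus each checking a "header
// already sent" flag without holding the lock across the check and the write;
// both may pass the check before either sets the flag.
package main

import (
	"fmt"
	"sync"
)

type stream struct {
	mu         sync.Mutex
	headerSent bool
	frames     []string
}

func (s *stream) racyWrite(frame string) {
	if s.headerSent { // check, not protected by mu
		return
	}
	s.headerSent = true // act: another goroutine may have raced in between
	s.mu.Lock()
	s.frames = append(s.frames, frame)
	s.mu.Unlock()
}

func main() {
	s := &stream{}
	var wg sync.WaitGroup
	for _, f := range []string{"HEADERS from WriteHeader", "HEADERS from WriteStatus"} {
		wg.Add(1)
		go func(f string) { defer wg.Done(); s.racyWrite(f) }(f)
	}
	wg.Wait()
	fmt.Println(s.frames) // may contain two HEADERS frames
}

Taking the lock at the top of the function and deferring the unlock, as the diff above does, makes the check and the enqueue atomic with respect to WriteStatus.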
Yeah, makes sense to me. Although you need to hold the mutex for writeHeaderLocked() too, right? (That's what you have currently... but your comment just mentions updateHeaderSent and getState.)
Ah yes, that too.
		t.Fatalf("Error starting endpoint server: %v", err)
	}
	defer ss.Stop()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
Nit: 10 * time.Second
Spaces? We have a pretty healthy existing mix of spaces/no spaces, and gofmt is OK with either.
$ grep -r '\* time.Second' . | wc -l
235
$ grep -r '\*time.Second' . | wc -l
134
I'm inclined to just leave this.
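For reference, a tiny hypothetical snippet showing the two spellings side by side; the counts above indicate the repo is gofmt-clean with both forms:

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Both spellings of the duration expression appear in the codebase.
	ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel1()
	ctx2, cancel2 := context.WithTimeout(context.Background(), 10 * time.Second)
	defer cancel2()
	fmt.Println(ctx1.Err(), ctx2.Err()) // <nil> <nil>
}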
Yeah, we talked about this offline; this was a thought carried over from two of my internship's linters. Didn't realize we have both for some reason. This is fine.
@@ -8077,3 +8077,33 @@ func (s) TestUnexpectedEOF(t *testing.T) {
		}
	}
}

// TestRecvWhileReturningStatus performs a Recv in a service handler while the
// handler returns its status. A race condition could result in the server
"A race condition could result in the server sending headers twice (once as a result of the status and once as a result of the failed Recv call)." This doesn't seem like the race to me? Isn't the race the bit where instead of writing the header with result from status first it writes the header from failed Recv call without status first due to s.updateHeaderSent is true). Regardless of ordering, both headers get sent out anyway? Hence your comment in issue "E.g. to ignore the extra headers on a stream after END_STREAM was already sent".
Isn't the race the bit where instead of writing the header with result from status first it writes the header from failed Recv call without status first due to s.updateHeaderSent is true
I'm not sure of the order -- it probably doesn't matter. The failed Recv call will also write a status, though, here:
Lines 1586 to 1588 in b695a7f:
	if err != nil && err != io.EOF {
		st, _ := status.FromError(toRPCErr(err))
		ss.t.WriteStatus(ss.s, st)
Is there something you think needs to change here?
Yeah, I saw that line when triaging. All I was getting at with this comment was that I think the explanation of the race in the comment is wrong. The race is the fact that the second WriteStatus call reads updateHeaderSent as true, doesn't attach :status, and then sends that on the wire BEFORE the first WriteStatus call, without the status, which is what broke howardjohn. Saying the race is sending headers twice is accurate, but it's not the explanation of why it broke the client in this scenario. (And yes, it also breaks HTTP/2 because it sends another frame after END_STREAM.)
Aha, I hadn't completely linked the reason for the failure on the client side, actually, but that makes more sense than my original thought that two :status headers were written and the client saw the second one and errored. So my comment about "ignore the extra headers on a stream after END_STREAM was already sent" is probably bogus. Updated this comment.
Cool, thanks. Comment LGTM.
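The interleaving described in this thread can be modeled with a small hypothetical sketch (stand-in types and frame labels, not grpc-go code): the first call flips the header-sent flag, the second call then sees it as true, omits :status, and its frame reaches the wire first.

// Hypothetical model of the interleaving: call A marks the header as sent but
// is preempted before its frame (the one carrying :status) is enqueued; call B
// then sees headerSent == true, omits :status, and enqueues first.
package main

import "fmt"

type stream struct {
	headerSent bool
	wire       []string // frames in the order they are written out
}

// write models WriteStatus without hdrMu held across the check and the
// enqueue; preempt lets us force the bad interleaving deterministically.
func (s *stream) write(name string, preempt func()) {
	withStatus := !s.headerSent // check
	s.headerSent = true
	preempt() // window in which another call can run
	if withStatus {
		s.wire = append(s.wire, name+" HEADERS(:status=200, grpc-status)")
	} else {
		s.wire = append(s.wire, name+" HEADERS(grpc-status only)")
	}
}

func main() {
	s := &stream{}
	// Call A is preempted after flipping headerSent; call B runs to completion.
	s.write("A", func() { s.write("B", func() {}) })
	fmt.Println(s.wire)
	// B's frame (no :status) reaches the wire before A's, which is what
	// confuses the client.
}

With hdrMu held across the check and the enqueue, the two calls serialize and the frame carrying :status always goes out first.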
@@ -1052,7 +1053,7 @@ func (t *http2Server) WriteStatus(s *Stream, st *status.Status) error {
		endStream: true,
		onWrite:   t.setResetPingStrikes,
	}
	s.hdrMu.Unlock()

Nit: do we want this blank line? (It used to have no blank lines here.)
I think so. Helps to visually separate building the frame from writing it.
LGTM!
Fixes #5511
RELEASE NOTES: