-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement graceful shutdown for http3 servers. #4407
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo
@marten-seemann Do you have time to take a look at this draft? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4407 +/- ##
==========================================
- Coverage 85.15% 84.59% -0.56%
==========================================
Files 154 154
Lines 14794 14990 +196
==========================================
+ Hits 12597 12680 +83
- Misses 1690 1791 +101
- Partials 507 519 +12 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @WeidiDeng!
I’ve been working on a big refactor of the http3 package (#4115, #4116, more coming), which at the end will enable HTTP datagram support, which will all be included in the next release. For proper datagram support, we’ll introduce some kind of connection and stream tracking (no PR yet), which would be helpful for this PR. I suggest we defer graceful shutdown until these PRs have landed.
http3/client.go
Outdated
config: conf, | ||
opts: opts, | ||
dialer: dialer, | ||
receivedGoawayID: quic.StreamID(-4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems hacky. Introduce a protocol.InvalidStreamID instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got the stream ID from rfc 9000. For different kinds of streams, the criteria for invalid id is different. I think a function that verifies ID would be better instead of a new type.
@@ -217,7 +218,8 @@ type Server struct { | |||
mutex sync.RWMutex | |||
listeners map[*QUICEarlyListener]listenerInfo | |||
|
|||
closed bool | |||
closed bool | |||
connections map[*quic.Connection]func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better: Use the tracing ID as a map key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the methods on the quic.Connection
needs to be called. To properly stop server from accepting new streams at the grace period's end. Malicious client may attempt to open new streams even if goaway is received. Goaway itself doesn't close a connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better: Use the tracing ID as a map key.
Update: the tracing ID has now been deprecated, so we shouldn't use that one.
# Conflicts: # http3/client.go # http3/client_test.go # http3/server.go # http3/server_test.go
# Conflicts: # http3/client.go # http3/client_test.go # http3/conn.go # http3/server.go # http3/server_test.go
# Conflicts: # http3/client_test.go # http3/server_test.go
@marten-seemann When do you think this patch is ready to be reviewed? I haven't updated it in a while, and several of those enhancements already landed in the latest release. If you think it's ok to add graceful shutdown, I'll update this patch against the latest release. |
Sorry, I dropped the ball on this PR. Let's get this into the next release! I'll review it later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can solve this without having a map of connections in the Server
. That feels kind of heavy. We already have a few Go routines blocking on AcceptStream
, maybe we can use one of those to do the work of sending the GOAWAY?
ID: lastID + 4, | ||
}).Append(b[:0]) | ||
str.Write(b) | ||
mutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str.Write
might block indefinitely. You shouldn't hold the mutex that long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map is there to make sure all connection is closed before CloseGracefully
returns. It's kept in a map because this way server can begin to send goaway frame this way instead of relying on AcceptStream
to accept a stream and then write a goaway.
The alternative is to create a goroutine that listens for when the server is closed. But that's even heavier weight than the current method of tracking with map. I guess that's why stdlib uses maps to do the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marten-seemann What do you suggest we use to track the accepted quic connections? In stdlib, http/http2 server both track them using a map.
For http2, http2 connections have a single goroutine that handles all the write requests. so there is no lock when writing the goway.
@@ -217,7 +218,8 @@ type Server struct { | |||
mutex sync.RWMutex | |||
listeners map[*QUICEarlyListener]listenerInfo | |||
|
|||
closed bool | |||
closed bool | |||
connections map[*quic.Connection]func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better: Use the tracing ID as a map key.
Update: the tracing ID has now been deprecated, so we shouldn't use that one.
mutex sync.Mutex | ||
closed bool | ||
// client opened bidirectional stream ID starts at 0 and increase by 4 | ||
lastID = quic.StreamID(-4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastID = quic.StreamID(-4) | |
lastID = protocol.InvalidStreamID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine to not close quic.Connection on the server side. Right now it's only closed on the server side when an error occurred during request handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you want to close it? That's the point of graceful shutdown after all, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending goaway frame requires the server having some kinds of references to the quic connection. Closing it is not necessary as stdlib doesn't close either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marten-seemann I'm thinking using a counter to track the number of accepted quic connections, and a context that will be canceled when shutdown is called. context. AfterFunc
can be used to send goaway in this case. The server only knows about the number of the quic connections. What do you think of this method?
Also, perhaps I should implement server side graceful shutdown first, client side later. To keep changes small.
@@ -618,11 +695,54 @@ func (s *Server) Close() error { | |||
return err | |||
} | |||
|
|||
func (s *Server) connectionsCount() int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name suggests that this function doesn't have any side effects, but it clearly does.
|
||
var reject bool | ||
mutex.Lock() | ||
reject = closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.closed
?
@@ -83,6 +88,9 @@ func (c *SingleDestinationRoundTripper) init() { | |||
c.decoder = qpack.NewDecoder(func(hf qpack.HeaderField) {}) | |||
c.requestWriter = newRequestWriter() | |||
c.hconn = newConnection(c.Connection, c.EnableDatagrams, protocol.PerspectiveClient, c.Logger) | |||
c.hconn.controlStrHandler = c.readControlStream | |||
c.receivedGoawayID = quic.StreamID(-4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.receivedGoawayID = quic.StreamID(-4) | |
c.receivedGoawayID = protocol.InvalidStreamID |
} | ||
|
||
// Separate goroutine to prevent interference with request cancellation | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but this looks to me like you're canceling requests that are in flight when receiving the GOAWAY. But that's exactly what we should not do: Requests still in flight should complete as if nothing happened.
http3/frames.go
Outdated
const maxQuicVarintLen = 8 | ||
|
||
func parseGoawayFrame(r io.Reader, l uint64) (*goawayFrame, error) { | ||
if l > maxQuicVarintLen { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easier: just parse the varint, and check that the length matches the number of bytes parsed afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a check afterwards to see if data read matches the varint it encodes. This check is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not. This code is just unnecessarily complicated. Use quicvarint.Read
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A malformed goaway frame can have an arbitrary large length defined. And quicvarint.Read
is used later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just parse the two varints, then check that the length is consistent. Saves you 10 LOC and multiple allocs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to wrap the reader with a bufio.Reader
to be able to use Read varint. And to limit the number of bytes read a limited reader is needed, or else the stream may be corrupt due to more data read than needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any feedbacks? @marten-seemann
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment still stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show me some code snippets. I don't think I know how to think of this. Double read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll do this once the rest of the PR is in a shape where we could merge it.
# Conflicts: # http3/client.go # http3/conn.go # http3/server.go
update goaway frame parsing
Fix 6195 for caddy.