Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement graceful shutdown for http3 servers. #4407

Draft
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

WeidiDeng
Copy link
Contributor

Fix 6195 for caddy.

Copy link

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo

http3/client.go Outdated Show resolved Hide resolved
@WeidiDeng
Copy link
Contributor Author

WeidiDeng commented Apr 5, 2024

@marten-seemann Do you have time to take a look at this draft?

@WeidiDeng WeidiDeng marked this pull request as ready for review April 7, 2024 02:30
Copy link

codecov bot commented Apr 7, 2024

Codecov Report

Attention: Patch coverage is 43.13725% with 116 lines in your changes are missing coverage. Please review.

Project coverage is 84.59%. Comparing base (e2fbf3c) to head (312a3d0).

Files Patch % Lines
http3/client.go 35.71% 42 Missing and 3 partials ⚠️
http3/server.go 61.17% 32 Missing and 8 partials ⚠️
http3/frames.go 0.00% 28 Missing ⚠️
http3/conn.go 0.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4407      +/-   ##
==========================================
- Coverage   85.15%   84.59%   -0.56%     
==========================================
  Files         154      154              
  Lines       14794    14990     +196     
==========================================
+ Hits        12597    12680      +83     
- Misses       1690     1791     +101     
- Partials      507      519      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this @WeidiDeng!

I’ve been working on a big refactor of the http3 package (#4115, #4116, more coming), which at the end will enable HTTP datagram support, which will all be included in the next release. For proper datagram support, we’ll introduce some kind of connection and stream tracking (no PR yet), which would be helpful for this PR. I suggest we defer graceful shutdown until these PRs have landed.

http3/client.go Outdated
config: conf,
opts: opts,
dialer: dialer,
receivedGoawayID: quic.StreamID(-4),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems hacky. Introduce a protocol.InvalidStreamID instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the stream ID from rfc 9000. For different kinds of streams, the criteria for invalid id is different. I think a function that verifies ID would be better instead of a new type.

http3/frames.go Outdated Show resolved Hide resolved
http3/frames.go Show resolved Hide resolved
@@ -217,7 +218,8 @@ type Server struct {
mutex sync.RWMutex
listeners map[*QUICEarlyListener]listenerInfo

closed bool
closed bool
connections map[*quic.Connection]func()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better: Use the tracing ID as a map key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the methods on the quic.Connection needs to be called. To properly stop server from accepting new streams at the grace period's end. Malicious client may attempt to open new streams even if goaway is received. Goaway itself doesn't close a connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better: Use the tracing ID as a map key.

Update: the tracing ID has now been deprecated, so we shouldn't use that one.

# Conflicts:
#	http3/client.go
#	http3/client_test.go
#	http3/server.go
#	http3/server_test.go
# Conflicts:
#	http3/client.go
#	http3/client_test.go
#	http3/conn.go
#	http3/server.go
#	http3/server_test.go
# Conflicts:
#	http3/client_test.go
#	http3/server_test.go
@WeidiDeng
Copy link
Contributor Author

@marten-seemann When do you think this patch is ready to be reviewed? I haven't updated it in a while, and several of those enhancements already landed in the latest release. If you think it's ok to add graceful shutdown, I'll update this patch against the latest release.

@marten-seemann
Copy link
Member

Sorry, I dropped the ball on this PR. Let's get this into the next release! I'll review it later.

@marten-seemann marten-seemann added this to the v0.47 milestone Aug 7, 2024
@WeidiDeng WeidiDeng marked this pull request as draft August 8, 2024 00:12
Copy link
Member

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can solve this without having a map of connections in the Server. That feels kind of heavy. We already have a few Go routines blocking on AcceptStream, maybe we can use one of those to do the work of sending the GOAWAY?

ID: lastID + 4,
}).Append(b[:0])
str.Write(b)
mutex.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.Write might block indefinitely. You shouldn't hold the mutex that long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map is there to make sure all connection is closed before CloseGracefully returns. It's kept in a map because this way server can begin to send goaway frame this way instead of relying on AcceptStream to accept a stream and then write a goaway.

The alternative is to create a goroutine that listens for when the server is closed. But that's even heavier weight than the current method of tracking with map. I guess that's why stdlib uses maps to do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marten-seemann What do you suggest we use to track the accepted quic connections? In stdlib, http/http2 server both track them using a map.

For http2, http2 connections have a single goroutine that handles all the write requests. so there is no lock when writing the goway.

@@ -217,7 +218,8 @@ type Server struct {
mutex sync.RWMutex
listeners map[*QUICEarlyListener]listenerInfo

closed bool
closed bool
connections map[*quic.Connection]func()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better: Use the tracing ID as a map key.

Update: the tracing ID has now been deprecated, so we shouldn't use that one.

mutex sync.Mutex
closed bool
// client opened bidirectional stream ID starts at 0 and increase by 4
lastID = quic.StreamID(-4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lastID = quic.StreamID(-4)
lastID = protocol.InvalidStreamID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine to not close quic.Connection on the server side. Right now it's only closed on the server side when an error occurred during request handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you want to close it? That's the point of graceful shutdown after all, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending goaway frame requires the server having some kinds of references to the quic connection. Closing it is not necessary as stdlib doesn't close either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marten-seemann I'm thinking using a counter to track the number of accepted quic connections, and a context that will be canceled when shutdown is called. context. AfterFunc can be used to send goaway in this case. The server only knows about the number of the quic connections. What do you think of this method?

Also, perhaps I should implement server side graceful shutdown first, client side later. To keep changes small.

@@ -618,11 +695,54 @@ func (s *Server) Close() error {
return err
}

func (s *Server) connectionsCount() int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name suggests that this function doesn't have any side effects, but it clearly does.


var reject bool
mutex.Lock()
reject = closed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.closed?

@@ -83,6 +88,9 @@ func (c *SingleDestinationRoundTripper) init() {
c.decoder = qpack.NewDecoder(func(hf qpack.HeaderField) {})
c.requestWriter = newRequestWriter()
c.hconn = newConnection(c.Connection, c.EnableDatagrams, protocol.PerspectiveClient, c.Logger)
c.hconn.controlStrHandler = c.readControlStream
c.receivedGoawayID = quic.StreamID(-4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c.receivedGoawayID = quic.StreamID(-4)
c.receivedGoawayID = protocol.InvalidStreamID

}

// Separate goroutine to prevent interference with request cancellation
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but this looks to me like you're canceling requests that are in flight when receiving the GOAWAY. But that's exactly what we should not do: Requests still in flight should complete as if nothing happened.

http3/frames.go Outdated
const maxQuicVarintLen = 8

func parseGoawayFrame(r io.Reader, l uint64) (*goawayFrame, error) {
if l > maxQuicVarintLen {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier: just parse the varint, and check that the length matches the number of bytes parsed afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a check afterwards to see if data read matches the varint it encodes. This check is necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not. This code is just unnecessarily complicated. Use quicvarint.Read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A malformed goaway frame can have an arbitrary large length defined. And quicvarint.Read is used later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just parse the two varints, then check that the length is consistent. Saves you 10 LOC and multiple allocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to wrap the reader with a bufio.Reader to be able to use Read varint. And to limit the number of bytes read a limited reader is needed, or else the stream may be corrupt due to more data read than needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any feedbacks? @marten-seemann

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment still stands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show me some code snippets. I don't think I know how to think of this. Double read?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll do this once the rest of the PR is in a shape where we could merge it.

http3/frames.go Outdated Show resolved Hide resolved
http3/frames.go Outdated Show resolved Hide resolved
# Conflicts:
#	http3/client.go
#	http3/conn.go
#	http3/server.go
update goaway frame parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delay in Caddy using changes to reverse_proxy upstream
3 participants