Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do client socket resets cause pending Deferred to be cancelled? #546

Open
KentShikama opened this issue Feb 24, 2022 · 4 comments
Open

Comments

@KentShikama
Copy link

Imagine you have a route like the following.

    @app.route("/foo", methods=["POST"])
    @inlineCallbacks
    def doStuff(self, request):
        yield foo()
        yield bar()
        return {"message": "success"}

And then a client calls POST /foo and disconnects right after Klein finishes processing foo(). Currently, bar() would get cancelled. I understand that the socket has closed so that the {"message": "success"} cannot be written back but it would be nice if the Deferred bar() was executed. Is there some use case that I'm failing to understand here?

Or to put it differently, I believe we should stop executing the .cancel() on the Deferred if the request finishes at

lambda _: d.cancel(),
.

@radifalco
Copy link

Agree that it seems odd for a socket reset by the client to cause all the deferreds associated with the call to be canceled. I'd expect the only impact to be that I cannot write a response on the closed socket.

KentShikama added a commit to KentShikama/klein that referenced this issue Aug 16, 2022
See twisted#546.

We have been running a similar patch on our enterprise production systems for about 6 months with no issues, and critically we have stopped getting issues on client resets.
@glyph
Copy link
Member

glyph commented Oct 3, 2022

If your route is doing a bunch of expensive computation for a client that has gone away, it's wasting resources. The cancellation is how you find out that the place you're writing a response to is gone, so you can stop doing the work to produce the response. The default, I feel, makes sense here.

If, at the application level, you have some logic that you want to run to completion regardless of the client being available, you can shield your Deferreds from cancellation. This is tricky enough that we should probably include something in Twisted proper that functions like asyncio.shield, but for reference, here's how such a thing is implemented:

from functools import wraps
from typing import Callable, ParamSpec, TypeVar

from twisted.internet.defer import Deferred


T = TypeVar("T")
NoCancel = object()


def shield(d: Deferred[T]) -> Deferred[T]:
    currentWrapped: Deferred[T]
    def rewrap() -> Deferred[T]:
        nonlocal currentWrapped
        currentWrapped = Deferred(lambda it: it.callback(NoCancel)).addCallback(cancelBlock)
        return currentWrapped

    def cancelBlock(result: object) -> Deferred[T]:
        if result is not NoCancel:
            return result
        return rewrap()

    currentWrapped = rewrap()

    d.addBoth(lambda anything: currentWrapped.callback(anything))
    return currentWrapped

If you're using @inlineCallbacks, you can then annotate your routes with something like this:

P = ParamSpec("P")

def shielded(decoratee: Callable[P, Deferred[T]]) -> Callable[P, Deferred[T]]:
    @wraps(decoratee)
    def decorated(*args: P.args, **kwargs: P.kwargs) -> Deferred[T]:
        return shield(decoratee(*args, **kwargs))
    return decorated

i.e.

    @app.route("/foo", methods=["POST"])
    @shielded
    @inlineCallbacks
    def doStuff(self, request):
        yield foo()
        yield bar()
        return {"message": "success"}

If your entire application is routes like this (everything wants to make a best-effort attempt to run to completion, nothing cares if clients are still listening) then I would be open to having this be Klein-instance-level configurable, perhaps with a .route-level customization keyword parameter. i.e. you could say app = Klein(canceling=False) and then @app.route(..., canceling=True) if only one or two routes care about this.

If we remove the .cancel() entirely, then applications which do want to know about clients which have idled out cannot find out of them and may do lots of unnecessary work. This is particularly relevant in the case of something like a file-transfer application which may need to prepare large blobs to transfer.

@KentShikama
Copy link
Author

KentShikama commented Dec 6, 2022

@glyph Thanks for your detailed response! We've currently found a workaround by overriding KleinResource with our one line change, so that this non-cancelling behavior applies to all our routes. Of course, it would be nice if this override could be replaced by a canceling=False flag as you suggested, as it would be much cleaner.

I disagree that cancelling should be the default behavior. My understanding is that Klein is a library that competes with the likes of Flask or FastAPI, the former of which is not async and the latter of which is non-cancelling by default. I understand that for reads (aka GETs) that cancelling is an optimization. But for writes like POST and PUT, the main goal is often the side-effect of the deferreds and not the final response. It becomes particularly tricky to reason about a client that resets and retries a POST route. Without cancellation, either everything needs to be retried or nothing needs to be retried. With cancellation, if 2 out of 3 deferreds finish before the client resets, then if the client retries, you can't just retry all 3 deferreds as that would duplicate the first 2 (unless they are idempotent). In short, it becomes much harder to build a robust route. Depending on the route, it may be unnecessary to complete all deferreds, but it is almost never "wrong". But a naive consumer of this library that assumed non-cancelling might be surprised only after millions of requests - after all client resets are relatively unusual and client resets during "meaningful" periods can be even rarer. We've officially tallied over 40 human hours of debugging pain across our organization for related issues. Of course it depends on what kind of application you're building, but for the ones I've been involved in 99% of routes don't need to be optimized at all and even less need to be optimized for client resets. I hope you reconsider, but at the end of the day this isn't our library, and I totally understand if the applications you're dealing with work better with the cancelling default.

@glyph
Copy link
Member

glyph commented Dec 6, 2022

@KentShikama You make an interesting case here. I am not totally sure I agree with all of your reasoning here (cancellation remains important to provide a mechanism for defense against e.g. tarpit attacks), but there's clearly a use-case to be supported here, and I think my initial response was indeed too simplistic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants