
Incorrect (too long) header from server causes Twisted Agent to fail #8570

Open
twisted-trac opened this issue Jul 7, 2016 · 12 comments · May be fixed by #12094

@pawelmhm reported
Trac ID trac#8570
Type enhancement
Created 2016-07-07 12:58:22Z

If the Twisted Agent receives an HTTP header whose value exceeds the size limit defined in the protocol implementation, it fails and crashes.

Expected behavior: if the server sends an oversized header, the client should not fail but simply ignore that header.

To reproduce, create the following Twisted server, which sends a header whose value is 100,000 characters long:

import sys

from twisted.python import log
from twisted.web import server, resource
from twisted.internet import reactor


class Simple(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # Send a pathologically long header value (100,000 bytes).
        request.setHeader(b"Set-Cookie", b"a" * 100000)
        # On Python 3, render_GET must return bytes.
        return b"<html>Hello, world!</html>"


site = server.Site(Simple())
reactor.listenTCP(8080, site)
log.startLogging(sys.stdout)
reactor.run()

And a Twisted client trying to reach the above server:

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)

# Agent.request expects bytes for the method, URI, and header values.
d = agent.request(
    b'GET',
    b'http://localhost:8080',
    Headers({b'User-Agent': [b'Twisted Web Client Example']}),
    None)


def cbResponse(result):
    # addBoth: `result` is either a Response or a Failure.
    print(result)


d.addBoth(cbResponse)


def cbShutdown(ignored):
    reactor.stop()


d.addBoth(cbShutdown)

reactor.run()

Output is:

python client.py 
[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure exceptions.AttributeError: 'TransportProxyProducer' object has no attribute 'loseConnection'>]
Searchable metadata: trac-id 8570 · type enhancement · reporter pawelmhm · priority normal · status new · component core · keywords http, Agent, webclient

@glyph commented

The problem with accepting arbitrarily long headers is that you can still DoS a client by having it continue to read from a socket forever, never processing the response, never timing out.

This makes me inclined to "wontfix" this issue, but... I'm not entirely sure how other HTTP clients handle this. What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.
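The DoS concern above can be made concrete with a small sketch (plain Python, not Twisted code): a malicious server can stream header bytes forever, so a client that buffers an unbounded header line never finishes and never frees memory, while one that enforces a cap fails fast. The names and the 2**14 limit below mirror the discussion; the reader function itself is purely illustrative.

```python
MAX_HEADER_LINE = 2 ** 14  # 16384 bytes, the historical per-line limit


def evil_header_stream():
    """Simulate a server that never finishes sending one header line."""
    while True:
        yield b"a" * 4096  # endless value bytes, never a CRLF


def read_header_line(chunks, limit=MAX_HEADER_LINE):
    """Accumulate bytes until CRLF, refusing to exceed `limit`."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        if b"\r\n" in buf:
            return buf.split(b"\r\n", 1)[0]
        if len(buf) > limit:
            raise ValueError(
                "header line exceeded %d bytes; refusing to buffer more"
                % limit
            )
    raise ValueError("connection closed mid-header")
```

Without the `limit` check, `read_header_line(evil_header_stream())` would loop and grow `buf` indefinitely; with it, the call fails after roughly 16 KiB.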

@pawelmhm commented

What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.

curl handles this without trimming headers (but my fish shell crashes immediately after the request, so I'm not entirely sure curl isn't leaking anything; maybe it causes a segfault).

Google Chrome reads the full headers with no problem.

wget refuses to read the headers and retries, printing the following error:

--2016-08-30 08:44:23--  (try: 3)  http://localhost:8080/
Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... Read error (Cannot allocate memory) in headers.
Retrying.

I think wget's approach is best here, in that it prints an instructive error about the impossibility of handling over-long headers. I'm not sure retrying makes sense, though: wget presumably assumes the oversized headers are the result of a misconfigured server, and that assumption may not hold.

@jlitzingerdev commented

I made some comments regarding this on a review for #9295; Firefox and Safari on iOS also handle the headers just fine.

While I agree that allowing arbitrarily long headers is dangerous, and I am not advocating for that by default, it does seem that better communication is preferable to a wontfix.

Rationale:

The decision to disallow headers beyond a certain length is a policy. Whether this policy is good or bad (FWIW I'd say it is good, as it defaults to safe) is less important than the fact that it is a Twisted-specific opinion. With any policy decision there are at least two options:

  1. Default to sane values and expose configuration to advanced users.
  2. Clearly communicate that the error is due to a policy decision.

To that end, I'd propose one or both of two paths:

  1. Ensure that the failure communicates that the header length was the problem, likely by handling lineLengthExceeded in one of the HTTP parsing classes.

  2. Allow users to configure an IAgent through a new interface method or __init__ argument. I suggest the former, as it acts as a better contract for IAgent implementations, especially if the configuration object has its own interface.

That said, option 2 has implications outside the scope of this ticket and deserves its own ticket and design discussion, so I'd vote for option 1 to address this ticket. Behavior doesn't change, but at least the user knows why.

Thoughts?

-- Note that I'm willing to put in the work for option 1 (I'm not just throwing it out there), but I'd prefer to know there is some agreement before starting.
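Option 1 could look roughly like the following sketch. This is plain Python, not Twisted's actual classes: `lineLengthExceeded` is the LineReceiver hook named in the discussion, but `HeaderSizeExceeded` and `SketchParser` are hypothetical stand-ins, and the wiring into the real HTTPClientParser is an assumption.

```python
class HeaderSizeExceeded(Exception):
    """A descriptive failure, instead of a silent connection drop."""

    def __init__(self, length, limit):
        self.length = length
        self.limit = limit
        super().__init__(
            "server sent a header line of %d bytes; the configured "
            "maximum is %d" % (length, limit)
        )


class SketchParser:
    """Stand-in for the HTTP parsing class mentioned above."""

    MAX_LENGTH = 2 ** 14  # the per-line limit under discussion

    def lineLengthExceeded(self, line):
        # LineReceiver's default closes the transport with no explanation;
        # here the caller gets an error stating exactly which policy fired.
        raise HeaderSizeExceeded(len(line), self.MAX_LENGTH)
```

The point is only that the failure a user eventually sees should name the header-length policy, rather than surfacing as an unrelated ConnectionDone.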

@adiroiban commented

I am happy to review option #11338.
For my part, a simple log before closing the connection should be enough. That would help discover the cause of the error, and then you can dig for your own fix :)

I remember looking at passing a message via the exception handling, and it was not straightforward with the current API.

LineReceiver will just close the connection without any message, and loseConnection does not accept an argument:

https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/protocols/basic.py#L638

One way I tried to fix it was to pass a reference to the protocol to the HTTPClientParser, so that the parser could set a whyConnectionWasLost attribute on the HTTP11ClientProtocol; the protocol could then show a reason when the connection was closed.

I saw this pattern in some other places in Twisted, but I would have preferred to be able to transmit the reason just via an exception.


For option 2:

If you want the IAgent way, I guess that for the header length you will have to go through Agent -> HTTPConnectionPool -> HTTPClientFactory -> HTTP11ClientProtocol -> HTTPClientParser.

But just to set the header length limit, maybe it is easier to pass the options via the request object: https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/web/_newclient.py#L1430


If you have a real need for any of the above options, I am happy to review the work.

Otherwise I also think that this should be closed as won't fix.

If in the future someone has a real need for this, a new ticket and PR can be created and we can review that work.

@adiroiban commented Jul 20, 2023

Just an update to this one.

With the recent Twisted version, the error is now:

[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
]

Which is expected... when the line is too long, for now Twisted just silently closes the connection.

I think it is best to keep this open and close it once we have something like:

twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Server returned a long header.

@glyph commented Jul 20, 2023

@adiroiban thanks for tracking this

@Gallaecio commented

In Scrapy we have an old issue about this, which comes back now and again as people are affected while crawling specific websites.

In scrapy/scrapy#5911 I tried to address this with the public API, but things got too messy, and we eventually went with monkey-patching.

Before we merge that change, we would like to know if you are open to either increasing the default maximum length (e.g. from 2**14 to 2**16), or to option 2 as discussed here, i.e. providing a public API to change this value.
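For readers unfamiliar with why monkey-patching works here at all: the limit is a class attribute, so rebinding it once at startup, before any instance exists, changes the behavior of every later instance. The sketch below uses a hypothetical `LineParser` class, not Scrapy's actual patch or Twisted's real parser.

```python
class LineParser:
    # Stand-in for the library class that enforces the limit; in Twisted
    # the comparable attribute is MAX_LENGTH on the line-based protocol.
    MAX_LENGTH = 2 ** 14  # library default: 16384

    def accepts(self, line):
        """Return True if this header line is within the limit."""
        return len(line) <= self.MAX_LENGTH


# The "patch": rebind the class attribute once, at import/startup time.
# Instances have no MAX_LENGTH of their own, so attribute lookup falls
# through to the class and every parser created afterward sees 2**16.
LineParser.MAX_LENGTH = 2 ** 16

print(LineParser().accepts(b"a" * 28000))  # a 28k header now fits
```

The fragility, and the reason a public API is being requested, is that this reaches into a private class whose name and attribute could change between Twisted releases.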

@adiroiban commented

I don't know what to say about 2**16. Maybe 2**15 is a better default value.

At http://browsercookielimits.iain.guru/ I see the limit is 4 KB per header,

and in the RFC (https://datatracker.ietf.org/doc/html/rfc6265#section-6.1) I only see minimum limits.

@Gallaecio commented Dec 20, 2023

Those limits are specifically about the Cookie header, not headers as a whole. It seems the Referer header also gets a limit. But you can load https://www.vapestore.co.uk/ fine in a web browser even though it has a header with 28k+ characters.

2**15 would be OK by me; it would certainly be enough to address the only currently known affected website (https://www.vapestore.co.uk/). I went with 2**16 because that site's header was rather close to 2**15, but in the end it is an arbitrary decision; it could be that browsers do not limit headers at all (besides special ones).

@adiroiban commented

I expect that browsers have some limits... they just might be very big; otherwise you could end up with out-of-memory errors.

Here are the current limits:

  • 500 total header count
  • 16384 bytes for all headers (this can be increased)

maxHeaders = 500
totalHeadersSize = 16384

We also have a maximum line length, which can be increased for HTTP:

MAX_LENGTH = 16384

Right now, both the server and client HTTP parts share the same HTTP header parsing code.


I think that we can increase the limits.

I would go with a conservative value of

  • 2**15 for a single line/header
  • 2**16 for total headers

What do you think?
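For reference, the concrete values being compared in this proposal can be checked directly; the ~28k figure is the vapestore.co.uk header cited earlier in the thread.

```python
# Current and proposed limits from the discussion above, in bytes.
current_line_limit = 2 ** 14     # 16384: today's per-line limit
proposed_line_limit = 2 ** 15    # 32768: proposed per-line/header limit
proposed_total_limit = 2 ** 16   # 65536: proposed total-headers limit

# The observed header is over the current per-line limit but would fit
# comfortably under the proposed one.
observed_header = 28_000
assert current_line_limit < observed_header < proposed_line_limit
```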

@Gallaecio commented

Sounds good to me.

@adiroiban commented

Adrian, if you have time for a PR, I am happy to review it.
I don't think I will have much time in the near future to work on this.
Regards

@Gallaecio Gallaecio linked a pull request Jan 26, 2024 that will close this issue