-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround for incompatibility with HTTPS Upgrade #1668
base: master
Are you sure you want to change the base?
Conversation
…ntroduced in Chromium browsers. - add terminate-on-secure-connect-fail http_port option.
Can one of the admins verify this patch? |
FTR;
So, to do this right without breaking a lot of traffic beyond Chrome we need to be a lot more specific about which action is taken and when. Checking details from whether we have a |
src/cf.data.pre
Outdated
any header and body. | ||
This enables the affected browsers to fall back to HTTP instead of | ||
showing the error page generated by Squid. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on addressing this annoying problem!
Amos: we need to be a lot more specific about which action is taken and when. Checking details from whether we have a
CONNECT
(by whichUser-Agent
, with whichUpgrade:
negotiation, ...), to what TLS error occured (its TLS alert type, by which downstream hop generated it, ...), and whether we have alternate destinations remaining to try.
I agree that a different fix is needed. I also agree that bypassing the "retry" part of retryOrBail() logic is wrong. I disagree that Squid should (automatically) check User-Agent, Upgrade, TLS error, etc. to (automatically) enable the desired behavior.
Instead, I suggest adding an on_ssl_bump_error directive, similar to the existing on_unsupported_protocol directive. The directive should tell Squid whether to close the connection, respond with a CONNECT error, or bump and respond with a CONNECT error (at least). The initial implementation does not need to support all these actions if just closing connections (without bumping the client and without sending an error response) is enough to support "try HTTPS first" problem this PR focuses on. Two actions (default and new) would be enough.
Squid ACLs can be used to make the choice conditional. If there are not enough ACL types, new ones can be added, of course.
If the problem is not specific to SslBump deployments, then the new directive should not be SslBump-specific either, of course. I have only heard about this problem in SslBump context, but that does not mean the problem is specific to SslBump.
Amos: 1. Squid current behaviour of doing a TLS "success" and presenting 5xx HTTP error page - is to avoid bugs in Chrome and Firefox handling of non-
HTTP/1.1 200 OK
responses to CONNECT tunnel attempts.
Just to clarify: On many SslBump errors, Squid bumps TLS client connection and waits for the first HTTP request on it to send the error response because some popular browsers do not show CONNECT error responses to users. AFAIK from HTTP WG discussions, browser folks do not consider this behavior to be a browser bug; they just do not want to spend (non-trivial) resources on properly relaying the error to the user. Thus, this aspect of browser behavior is unlikely to change in the foreseeable future.
The corresponding Squid behavior breaks "try HTTPS first" feature in recent Chrome releases (when the user types in the domain name without the URI scheme), as documented in this PR description. This PR idea to make that behavior conditional is correct, but PR implementation needs to be refactored as discussed above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My explanation in this PR might have an unclear point, so I leave a note first here just in case:
Squid returns "HTTP/1.1 200" to the client for CONNECT to affected sites even if this draft option is set.
After receiving the actual request (the first HTTP request as you mention) from the client, Squid without the option returns the 503 error page (ERR_SECURE_CONNECT_FAIL/ERR_CONNECT_FAIL), and Squid with the option set returns nothing (terminates TCP with no more header/body).
The browser shows ERR_EMPTY_RESPONSE (or ERR_CONNECTION_RESET if the commit c575e8d is applied) message for the latter case (if intentionally tried to access the URL of wrong HTTPS, not by HTTPS Upgrade).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, you comment matches my overall understanding of this PR. The only difference I am aware of is that Squid does not close the connection to the browser in my tests and "terminates TCP" in yours. We are discussing that aspect in another change request.
src/FwdState.cc
Outdated
entry->abort(); | ||
complete(); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing this PR a little, I would like to add the following to my blocking review: The current PR code does result in "HTTPS to HTTP" downgrade in Chrome, as intended, but the desired Chrome behavior appears to be triggered by a timeout in Chrome (i.e. by Squid inaction): In my quick-and-dirty tests, the original CONNECT tunnels opened by Chrome remain open when Chrome (after a ~6 second pause) sends a plain text request to Squid. If that observation is not a side effect of my poor testing, then I hope Squid behavior can be changed when PR code is refactored to address earlier review concerns: If it all possible, we want Squid to close doomed CONNECT tunnels; we do not want these tunnels to wait for browser timeouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is actually causing the difference though, the browser does not seem to (generally) wait the timeout on my environment.
I am aware of that the timeout is one of the triggers of HTTPS to HTTP downgrade. If the upstream HTTPS port is unreachable (TCP SYN is silently dropped), Squid wait TCP ACK for a certain amount of time then that perhaps results in the same behaviour of browser like you observe. Does it match to your test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: if waiting timeout generally occurs depending on the platform, sending TCP RST (setsockopt) at the disconnection may be an idea to try. I prototyped before posting this PR but did not include it there because I did not find any benefit on my environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is actually causing the difference though, the browser does not seem to (generally) wait the timeout on my environment.
What signal/information do you think your browser is acting on if it is not the timeout? AFAICT, Squid sends nothing (but a TCP ACK packet) to the browser after responding to CONNECT... In your tests, does Squid close the browser-to-Squid connection before browser downgrades? If yes, I can think of several explanations for that difference, including these three:
- a bad side effect of my quick-and-dirty tests; this is the most likely reason
- a side effect of a different ssl_bump configuration (I tested with
ssl_bump stare all
) or some other valid/reasonable but different configuration options - a side effect that you have not tested master-based code (because it did not compile until 297a8d0)
If the upstream HTTPS port is unreachable (TCP SYN is silently dropped), Squid wait TCP ACK for a certain amount of time then that perhaps results in the same behaviour of browser like you observe. Does it match to your test case?
Please note that what happens between Squid and the origin server is irrelevant to my concern. Said that, IIRC, in my test, Squid "immediately" receives a RST packet because nobody is listening on the HTTPS port of the test origin server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What signal/information do you think your browser is acting on if it is not the timeout? AFAICT, Squid sends nothing (but a TCP ACK packet) to the browser after responding to CONNECT... In your tests, does Squid close the browser-to-Squid connection before browser downgrades?
Edge: CONNECT tcp session is immediately (of course with slight delay to access the origin server) disconnected by FIN, and GET follows it.
Chrome: needed to push the security warning button before actually access to the downgraded HTTP, so there is a little interval between CONNECT and GET, but you will see CONNECT tcp session is immediately disconnected by FIN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My test environment is also dirty.
I'm using the version of v6.6 with KTLS patch (and some more personal patch like increasing buffer size)..
If the difference persists, we have to narrow down the cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the difference persists, we have to narrow down the cause.
Agreed. I am not particularly worried about kept-open browser-Squid connections at this time as long as we agree that they should not be kept open. We will retest when other (primary) concerns are addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the one.
a side effect of a different ssl_bump configuration (I tested with ssl_bump stare all) or some other valid/reasonable but different configuration options
I had been using ssl_bump bump all.
Changing it to ssl_bump stare all reproduces the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the code to proactively terminate (reset) tcp as of 1943d24.
# Conflicts: # src/cache_cf.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your continued effort to address this and similar issues with SslBump error handling! I (still) support the addition of on_ssl_bump_error or similar direcitve. Your changes have significantly improved this PR. Thank you! I wish I could find more time to devote to this issue.
There are many secondary problems in the proposed new code, but we should ignore them until the following primary issues are addressed:
1. on_ssl_bump_error should not apply to bumped tunnels
The new feature should only apply to transactions where the from-client connection has not been bumped or spliced yet. Once the connections are spliced or (at least) the client is bumped, the new ..._ssl_bump_...
directive should not be consulted because SslBump scope has ended (at least for that spliced or bumped client). In other words, on_ssl_bump_error directive is not an on_error directive!
To address this concern, we could detect the right from-client connection SslBump state in FwdState::bumpOnError(), but that approach is flawed because FwdState is not the code that determines what happens to the client: FwdState (and friends) generate errors when things go wrong while talking to the server, but some errors may happen before FwdState gets control, and the (unwanted) reaction to all errors does not actually happen in FwdState! The decision to bump the client and serve the error response to that client happens on the other side (I do not recall the exact code, but you will find it if you follow debugging in the problematic cases; looking for "Force bump action on error" may also help). That is the code where the new directive should be consulted.
2. on_ssl_bump_error should apply to ssl_bump errors encountered when talking to the client
This application is in addition to reacting to errors encountered when talking to the origin server, of course. It is possible that you will "automatically" add this support while addressing the first concern above. You can stop reading this section if you do. Otherwise, please keep reading.
The initial implementation does not have to support both sides, but, if it does not support both sides, its documentation must explicitly state this temporary limitation while making sure that we can enhance support in the future without renaming the directive, adding a new one, or warning the admins.
This concern can be addressed by adjusting cf.data.pre text AFAICT. We need to be very clear about this new directive (intended) scope, so that reviewers know what feature they are approving even if they do not review implementation details. I will help polish cf.data.pre text, but please make the initial edits, so that I do not forget to disclose this important limitation/plans when polishing.
Hi, @rousskov I have not wholly understood your comment yet (with a concrete image of codes), and it may take time for me, so for the meanwhile, I have added some notes regarding ERR_ZERO_SIZE_OBJECT in this PR description that I did not realize when initially submitting this PR. Necessity of supporting ERR_ZERO_SIZE_OBJECT (which occurs after successful CONNECT, and if I’m understanding the terms correctly, in the bumped tunnel) might affect the appropriate directive (name) and/or code changes. |
Needless to say, please feel free to ask questions, especially if something I wrote is not clear or appears to be self-contradictory.
Bugs notwithstanding, ERR_ZERO_SIZE_OBJECT error may only occur after active SslBump processing is finished. In other words, it should have no effect on SslBump code in general and the new directive in particular. |
So, reading your comment, my current understanding is that your suggested directive “on_ssl_bump_error” or the term SslBump processing seems to have targeted only narrower phase of session establishment than I initially imagined. As previously commented, in addition to ERR_CONNECT_FAIL and ERR_SECURE_CONNECT_FAIL, ERR_ZERO_SIZE_OBJECT has newly been found as a target to work around the HTTPS Upgrade issue. At the same time, I do not think ERR_ZERO_SIZE_OBJECT during non-ssl-bumped session (I hope this term is ok..) should be targeted. Supposing the proper directive “on_ssl_bump_error” is not expected to apply to bumped tunnels, and if that means the directive does not cover the ERR_ZERO_SIZE_OBJECT in question, since this PR’s main objective is making a workaround for the HTTPS Upgrade issue, I guess we should first rename the directive (and relative function names etc.) to some appropriate one that covers bumped tunnels then discuss how we can improve the implementation. Please let me know if my observation is wrong. |
Alleged facts:
There are several ways to reconcile the two statements (at the specs level), but I will focus on the most viable candidate:
To reduce the number of code rewrites, I would not rush into changing code at this point. Let's agree on the specs first. That agreement should be reflected in cf.data.pre. If you do not want to break the build, you can keep the
Yes, but I would move "(and relative function names etc.)" from your step one into this second step. Let's agree on basic/high-level specs first (and give others an opportunity to review the version we are happy with). Do you agree with the new directive scope (i.e. the |
I have no specific objection to the scope so updated cf.data.pre. Although I kept the option reset (RST) now, it does not seem to be mandatory to work around the HTTPS Upgrade issue.
To check if "annotate_client bumped=true" works, I have changed a part of the code and some function names along the way. |
Regarding the specific ERR_ZERO_SIZE_OBJECT error case (tempesta-tech.com/wp-content/uploads/2022/09/5_tempesta_tls.pdf), another factor may be affecting the result. Unlike other (HTTPS-Upgrade) error cases I’m aware of, direct access (or access via Squid splice) to the https url works normally, and pdf is shown correctly. The cause is yet unknown. |
Looks like this is not a Squid (only) problem.
The same result on FreeBSD 13.3, 12.4, and Ubuntu2310. Google's search result url (not https but http) reinforces it. |
So, this is technically a separate issue from this PR. |
-- The description below is obsolete; on_ssl_bump_error directive is to be changed to on_error. --
Background
Recent Chromium browsers introduced the HTTPS Upgrade (Automatic HTTPS) function by default:
https://blog.chromium.org/2023/08/towards-https-by-default.html
HTTPS Upgrade:
The browsers (may) first try to connect to HTTPS port even if the user directed to connect to HTTP port.
Issue
If an accessing HTTP site has both HTTP/HTTPS ports opened while an inappropriate certificate is set to the HTTPS port, the browsers with HTTPS Upgrade function first try to connect to HTTPS port and fall back to the HTTP port if directly connecting to the site, however, the browsers only show the ERR_SECURE_CONNECT_FAIL page if connecting to the site via Squid with SSL Bump.
A similar result (ERR_CONNECT_FAIL) may occur for HTTP sites whose HTTPS port is not open.
ERR_ZERO_SIZE_OBJECT is another known case where the origin HTTPS port does not reply anything after successful CONNECT.
This PR
introduces an option by on_ssl_bump_error directive (maybe as a transitory measure until the fallback algorithm of the browsers will be more sophisticated or standardized).
With the option set for ERR_CONNECT_FAIL and ERR_SECURE_CONNECT_FAIL, if CONNECT (including certificate validation) to a site failed, Squid will terminate (or reset) the client session without sending any header and body. This enables the affected browsers to fall back to HTTP as they do so for direct connection, instead of just showing the error page generated by Squid.
To be precise, Squid returns the fake "HTTP/1.1 200" to the client for CONNECT to affected sites even if this option is set. The option differentiates the response after the first HTTP request from the client is received.
Similarly, if the option is set for ERR_ZERO_SIZE_OBJECT, Squid will terminate the client session when it does not receive any data from the connected origin port.
Note
Chrome has a setting option "Always use secure connections" and turning it on is required to reproduce this issue. In other words, turning it off is an alternative workaround at client side.
Chromium Edge has a slimilar switch but needs a registry modification: HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge
HttpsUpgradesEnabled
Miscellaneous references