Connection test occasionally causes TLS handshake failed errors #2081
Comments
How are you connecting to the Proxy in the container? |
FWIW I do this:
And then connect with psql from my host machine. This works without issue. |
@enocom - I have no issue connecting my host machine to the proxy; I have an intermittent issue with the proxy connecting to Cloud SQL. However, I have no issue with the proxy connecting to Cloud SQL when using v2.7.2. The only thing I noticed was a timeout error in v2.8.1. |
How often do you see this timeout error? Also I assume you're connecting to public IP? |
The timeout error has been happening ~50% of the time when running a workload that uses v2.8.1. |
Thanks, @mvhatch. In that case, let me hammer this some more to see if I can force out the same problem. Have you tried reproducing outside of docker? |
I haven't reproduced this yet, but I realize I've seen this same problem when people use the readiness probe. The key error is:
|
Idk if this is helpful, but I've also seen a timeout error in addition to the context deadline exceeded error:
|
perhaps it is a red herring, but do you know the reason the connection tests now take 30s in version |
Yes that is helpful. It implies the remote end of the socket isn't responding, whether for the initial connection or the TLS handshake. |
That's still unclear and entirely unexpected. |
FWIW I'd expect the connection test to be only used when you're uncertain if the network path is open. Otherwise, I wouldn't use it regularly. In the meantime, we'll sort out this bug. |
I have a similar issue when running in Dataflow. I am connecting using private IP. It doesn't happen very often, roughly 5% of the time, but it causes my Dataflow job to fail after spending an hour hung up on this.
|
@akshetpandey Are you using the If you're getting TLS errors otherwise, feel free to open a new issue. At a glance, I'd be curious to know what kind of CPU usage you see when that TLS handshake timeout occurs. Sometimes that can be a symptom of the Proxy being resource starved. |
It isn't related to that flag, although I should use it to work around the infinite loop I am facing because of this! I will create a new issue. |
I think this is a duplicate of #2224 and #2212, the first of the two being the more relevant description. We're going to cut a release next week with a fix. If I'm mistaken and this is still happening after the release next week, feel free to re-open. You can try this now by building against the latest cloudsqlconn (https://github.com/GoogleCloudPlatform/cloud-sql-go-connector/releases/tag/v1.10.1). |
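For anyone who wants to try the connector fix before the release, here is a rough sketch of rebuilding the Proxy against a newer connector. It assumes a local checkout of the Proxy source and an installed Go toolchain. The function name build_with_connector is hypothetical, and cloud.google.com/go/cloudsqlconn is taken to be the connector's Go module path.

```shell
# build_with_connector SRC_DIR VERSION
# Hypothetical helper: bumps the Go connector dependency inside a local
# checkout of the Proxy source and rebuilds the binary.
# The body runs in a subshell so the caller's working directory is unchanged.
build_with_connector() (
  src_dir=$1
  version=$2
  cd "$src_dir" || exit 1
  # Pin the connector module to the requested release tag.
  go get "cloud.google.com/go/cloudsqlconn@${version}" || exit 1
  go mod tidy || exit 1
  # Write the binary into the checkout directory.
  go build -o cloud-sql-proxy .
)

# Example (not run here):
#   build_with_connector ./cloud-sql-proxy v1.10.1
```

The resulting binary is an unofficial build, useful only for confirming whether the connector change resolves the timeout.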
Bug Description
gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0 and gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.1 take 30s to establish a connection when using --run-connection-test. This delay intermittently introduces proxy connection errors.
Stacktrace
Version
v2.8.1
Version
v2.7.2
Steps to reproduce?
Start a proxy container pointing to an instance, run the connection test as part of startup
docker run --rm gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0 <project>:<region>:<instance> --run-connection-test
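Because the failure is intermittent, a single run may not show it. A minimal sketch for repeating the command above and counting failed starts follows. It assumes GNU coreutils timeout is available, that the Proxy exits with a non-zero status when the connection test fails, and that it otherwise keeps running until stopped. The helper name count_failures and the INSTANCE_CONNECTION_NAME variable are illustrative, not part of the Proxy.

```shell
# count_failures RUNS LIMIT_SECONDS CMD [ARGS...]
# Runs CMD up to RUNS times, stopping each run after LIMIT_SECONDS, and
# prints how many runs exited with a real failure status.
count_failures() {
  runs=$1
  limit=$2
  shift 2
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    # Exit status 124 means timeout stopped a process that was still running,
    # which is treated here as a successful start rather than a failure.
    if [ "$status" -ne 0 ] && [ "$status" -ne 124 ]; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (not run here): 10 attempts, 60 seconds each, so the limit stays
# above the 30s the connection test is reported to take.
#   count_failures 10 60 docker run --rm \
#     gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.1 \
#     "$INSTANCE_CONNECTION_NAME" --run-connection-test
```

Running the same loop against the 2.7.2 image gives a baseline to compare against.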
Environment
Proxy version (./cloud-sql-proxy --version): 2.8.1+container
Proxy invocation command (for example, ./cloud-sql-proxy --port 5432 INSTANCE_CONNECTION_NAME):
Additional Details
No response