Queries failing due to `no hosts available in the pool`. #325

ouamer-dahmani · 2024-10-31T19:49:33Z

Hello,

I am encountering an issue where queries are not retried, even though a retry policy is configured when creating a new Cluster object.

Reads and writes work fine, but at some point some of them start failing with: gocql: no hosts available in the pool.
Digging into the code, I can see that the driver should indeed retry queries (I confirmed this by forcing a query execution error in the debugger).

I then added logging to the cluster:

cluster.Logger = logger
cluster.QueryObserver = logger
cluster.BatchObserver = logger
cluster.ConnectObserver = logger

The logger gets called for queries that succeed but never for those that fail. I wonder if that is because the queries are never run even once, since there are no hosts in the connection pool?

I sometimes see connection events before the failures (anywhere from a few milliseconds to a few minutes earlier), but that is not always the case, and they are not error logs either:
Connect: Dial Duration: 5.383348ms, Host: 10.173.92.242

I know that the network on my Kubernetes cluster is sometimes a bit flaky, but I assumed this would be handled gracefully by reconnections in the connection pool and retries on the queries.

I am running version v1.13.0 of the driver.
I see that the v1.14.x releases include changes around connections, but I am unsure whether they are related to the issue I am having, and I have held off on updating because I have not had time to test it.


dkropachev · 2024-10-31T20:29:02Z

Could you please provide your ClusterConfig, including the HostSelectionPolicy and retry policy?

ouamer-dahmani · 2024-10-31T20:36:55Z

Hello!

It is equivalent to the following. I used high values to see if it would help pass through the potential instability.

cluster := gocql.NewCluster(cfg.Hosts...)
cluster.Keyspace = cfg.Keyspace
cluster.Timeout = 5 * time.Second
cluster.RetryPolicy = &gocql.ExponentialBackoffRetryPolicy{
	Min:        500 * time.Millisecond,
	Max:        5 * time.Second,
	NumRetries: 5,
}
cluster.Consistency = gocql.LocalQuorum
cluster.Authenticator = cfg.Authenticator
cluster.PoolConfig.HostSelectionPolicy = gocql.RoundRobinHostPolicy()
cluster.DisableInitialHostLookup = false
cluster.DisableShardAwarePort = true

dkropachev · 2024-10-31T23:19:26Z

@ouamer-dahmani, what most likely happens is this:

Due to the unstable connection, the driver loses its connections to all nodes at some point.
When that happens, the executor does not even get to the retry policy. It just iterates over the hosts provided by RoundRobinHostPolicy to find one that has open connections and can be used to execute the query. Since it finds no such host, it ends up returning &Iter{err: ErrNoConnections}.

It works the same way on newer versions as well, so you can't fix it by upgrading the driver.
I would suggest manually retrying on this error until we fix the retry logic.

I am closing this issue in favor of #326.
But feel free to continue the discussion here if it is related to this case.

dkropachev closed this as completed Oct 31, 2024

dkropachev self-assigned this Oct 31, 2024
