Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query retry logic is not triggered in certain cases #326

Closed
dkropachev opened this issue Oct 31, 2024 · 4 comments · Fixed by #376
Closed

Query retry logic is not triggered in certain cases #326

dkropachev opened this issue Oct 31, 2024 · 4 comments · Fixed by #376
Assignees

Comments

@dkropachev
Copy link
Collaborator

Query executor does not involve retry policy in certain cases

gocql/query_executor.go

Lines 109 to 198 in fc1b783

func (q *queryExecutor) do(ctx context.Context, qry ExecutableQuery, hostIter NextHost) *Iter {
selectedHost := hostIter()
rt := qry.retryPolicy()
lwt_rt, use_lwt_rt := rt.(LWTRetryPolicy)
// We only want to apply LWT policy to LWT queries
use_lwt_rt = use_lwt_rt && qry.IsLWT()
var lastErr error
var iter *Iter
for selectedHost != nil {
host := selectedHost.Info()
if host == nil || !host.IsUp() {
selectedHost = hostIter()
continue
}
pool, ok := q.pool.getPool(host)
if !ok {
selectedHost = hostIter()
continue
}
conn := pool.Pick(selectedHost.Token(), qry)
if conn == nil {
selectedHost = hostIter()
continue
}
iter = q.attemptQuery(ctx, qry, conn)
iter.host = selectedHost.Info()
// Update host
switch iter.err {
case context.Canceled, context.DeadlineExceeded, ErrNotFound:
// those errors represents logical errors, they should not count
// toward removing a node from the pool
selectedHost.Mark(nil)
return iter
default:
selectedHost.Mark(iter.err)
}
// Exit if the query was successful
// or no retry policy defined
if iter.err == nil || rt == nil {
return iter
}
// or retry policy decides to not retry anymore
if use_lwt_rt {
if !lwt_rt.AttemptLWT(qry) {
return iter
}
} else {
if !rt.Attempt(qry) {
return iter
}
}
lastErr = iter.err
var retry_type RetryType
if use_lwt_rt {
retry_type = lwt_rt.GetRetryTypeLWT(iter.err)
} else {
retry_type = rt.GetRetryType(iter.err)
}
// If query is unsuccessful, check the error with RetryPolicy to retry
switch retry_type {
case Retry:
// retry on the same host
continue
case Rethrow, Ignore:
return iter
case RetryNextHost:
// retry on the next host
selectedHost = hostIter()
continue
default:
// Undefined? Return nil and error, this will panic in the requester
return &Iter{err: ErrUnknownRetryType}
}
}
if lastErr != nil {
return &Iter{err: lastErr}
}
return &Iter{err: ErrNoConnections}
}

These cases are

  1. Host is down
  2. Host does not have pool
  3. Host pool does not have connections

Probably it make sense to involve retry policy in such cases.
It will require policies to be smarter in regards of the GetRetryType results, in listed cases.
Also it will break API a bit in a sense that we will have to introduce new errors which will be returned instead of ErrNoConnections

@sylwiaszunejko
Copy link
Collaborator

sylwiaszunejko commented Dec 16, 2024

@dkropachev Currently in these cases you mentioned we don't use retry policy, but we retry on the next host always. It is true even if retry policy is not defined. I am not sure what correct behavior should be if one of those cases occur:

  1. Host is down
  2. Host does not have pool
  3. Host pool does not have connections

and rt == nil, should we then stick to the current behavior and retry on the next host or should we just exit?

@sylwiaszunejko sylwiaszunejko self-assigned this Dec 16, 2024
@sylwiaszunejko
Copy link
Collaborator

@dkropachev ping

@pdbossman
Copy link

I would think stopping the retry on next host would be a regression. It seems to me - the request was not really tried.

@sylwiaszunejko
Copy link
Collaborator

I would think stopping the retry on next host would be a regression. It seems to me - the request was not really tried.

Ok, so I guess if retry policy is not there we should retry on next host in those cases. Also I think the retry policies should be edited to make sure they always result in retry on next host for given scenarios

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants