Currently it's looking like we want a 6s timeout across the board, but we don't yet know why 65 data channels are created, nor why 72 timeouts are logged before the 6s timeout occurs.
Concrete changes to make
delete maxElapsedBackoffMs - it's irrelevant for our use case (double-check with Adam, since it's still referenced in his doc)
upgrade the Bigtable java lib from 1.12.1 to 1.18.1 and check that we see 2x retries before the 6s timeout (see the Slack thread with Adam / the testing doc). We know we need this because it resolves 2 or 3 retry-related bugs.
fine-tune all our *TimeoutMs and *TimeoutMillis settings (a hedged config sketch follows this list)
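For context, here is a minimal sketch of how the per-attempt RPC timeout, retry count, and overall budget relate in the Java Bigtable client family. This uses the google-cloud-bigtable veneer's gax RetrySettings rather than whatever Heroic's own config keys map to, and the 6s / 3-attempt values are illustrative assumptions taken from the discussion above, not agreed numbers; the older client's maxElapsedBackoffMs roughly corresponds to the total timeout here.

```java
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.BigtableDataSettings;
import org.threeten.bp.Duration;

public final class BigtableTimeoutSketch {
  public static BigtableDataClient create() throws Exception {
    // Hypothetical project/instance ids - placeholders only.
    BigtableDataSettings.Builder settings =
        BigtableDataSettings.newBuilder()
            .setProjectId("my-project")
            .setInstanceId("my-instance");

    // readRowsSettings() governs the read/query path; other RPCs have their own settings.
    settings
        .stubSettings()
        .readRowsSettings()
        .setRetrySettings(
            RetrySettings.newBuilder()
                // Per-attempt deadline: each try gets at most 2s.
                .setInitialRpcTimeout(Duration.ofSeconds(2))
                .setMaxRpcTimeout(Duration.ofSeconds(2))
                .setRpcTimeoutMultiplier(1.0)
                // Roughly "2x retries" = 3 attempts total (1 initial + 2 retries).
                .setMaxAttempts(3)
                // Overall budget across all attempts: the 6s timeout discussed above.
                .setTotalTimeout(Duration.ofSeconds(6))
                .build());

    return BigtableDataClient.create(settings.build());
  }
}
```

The point of the sketch is only to show which knob bounds what: the per-attempt timeout caps a single try, the attempt count caps how many retries fit, and the total timeout is the hard ceiling the caller sees.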
sming changed the title from "Implement changes from @peterk & Adam’s analysis of Bigtable timeouts & retries" to "Implement changes from @peterk & Adam’s analysis of Bigtable timeouts & retries (e.g. bump java Bigtable lib to v1.18.2)" on Jan 24, 2021.
sming changed the title from "Implement changes from @peterk & Adam’s analysis of Bigtable timeouts & retries (e.g. bump java Bigtable lib to v1.18.2)" to "Analyse Heroic and user's perspective when hitting a timeout. Then implement necessary changes." on Feb 9, 2021.
We finally know how API clients will experience the BT timeout:
1. they'll get a 200. Which is … interesting. And misleading, IMO.
2. they get no results - well, at least for the query I was running.
3. they get the following error message in the response body, in errors[0].error:
"Some fetches failed (100) or were cancelled (870), caused by Some fetches failed (100) or were cancelled (870)"
We (Prism) need to decide if the above is acceptable. @malish8632's argument is that this is the current behaviour and always has been, hence no changes are necessary.
My argument is that many more users who've never received a timeout before will now receive one, in the shape of a 200, which is super misleading because the query has, in effect, failed.
Hence the agreed-upon action is to phrase points 1-3 above as a "reminder" of how Heroic returns requests that time out.
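To make points 1-3 concrete, here is a hedged sketch of the kind of check a client would need, since a 200 alone doesn't mean the query fully succeeded. The /query/metrics path, request body, and host are placeholders and not Heroic's documented contract; only the "errors" array and the error text above come from the observation itself.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class HeroicResponseCheck {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  /** Returns true only if the response is a 200 AND carries no partial-failure errors. */
  static boolean isFullySuccessful(HttpResponse<String> response) throws Exception {
    if (response.statusCode() != 200) {
      return false;
    }
    // Even on a 200, a timed-out query surfaces as entries in the "errors" array,
    // e.g. errors[0].error = "Some fetches failed (100) or were cancelled (870), ...".
    JsonNode errors = MAPPER.readTree(response.body()).path("errors");
    return !errors.isArray() || errors.size() == 0;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint and query body - placeholders only.
    HttpRequest request =
        HttpRequest.newBuilder(URI.create("http://heroic.example.com/query/metrics"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"query\": \"average by host\"}"))
            .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    System.out.println("fully successful: " + isFullySuccessful(response));
  }
}
```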