Contact points randomization #1029
Comments
Also / similar but separate issue: all shards (but 0) are also contacted in the same order. This causes a small storm when a node comes back up. I understand we have to contact shard 0 first, but the rest should be in a random order.
Why do we have to contact shard 0 first?
Because we only know that nodes contain at least shard 0. No other shard's presence is guaranteed.
The driver does not choose a shard when establishing the first connection. Usually, the first connection should be established to the non-shard-aware port and Scylla will choose the shard that is least loaded with connections. After the first connection is made and the driver learns how many shards the node has, it will start connecting to all other shards at once (that's how it works right now).
Indeed.
True.
Yes. And it may cause (as we've seen in the past) a connection storm, since every driver that sees the new node come up will do the same. Ideally, the driver should randomize and pace the connections to all other shards.
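To make the "randomize and pace" idea concrete, here is a minimal sketch in Python. It is not the driver's code: `plan_shard_connections`, `base_delay` and the jitter range are hypothetical names and values chosen for illustration. It assumes the first connection has already landed on some shard and the shard count is known, then returns the remaining shards in random order, each with a jittered start delay.

```python
import random


def plan_shard_connections(nr_shards, first_shard, base_delay=0.05, rng=None):
    """Return (shard, delay_seconds) pairs for the shards not yet connected.

    Hypothetical helper: the remaining shards are visited in random order,
    and each connection attempt is scheduled a jittered interval after the
    previous one, so that many drivers do not hit the same shard at once.
    """
    if rng is None:
        rng = random.Random()

    # Every shard except the one the first connection already landed on.
    remaining = [s for s in range(nr_shards) if s != first_shard]
    rng.shuffle(remaining)

    plan = []
    delay = 0.0
    for shard in remaining:
        # Space attempts by 0.5x to 1.5x of base_delay (assumed values).
        delay += rng.uniform(0.5 * base_delay, 1.5 * base_delay)
        plan.append((shard, delay))
    return plan
```

A caller would sleep until each delay elapses (or schedule a timer) before opening the connection to the corresponding shard. The right pacing interval is an open question; the values here are placeholders.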
Problem
When the driver is given an ordered list of initial contact points, it attempts to open the control connection to the nodes in that order. If the cluster is operating normally, the first node always accepts the control connection and becomes burdened with it (it has to send events when they are triggered, and topology and schema metadata when queried) until either the connected driver disconnects or the node goes down. This imbalance should be avoided.
Solution
cpp-driver randomly shuffles the initial contact points list by default. This spreads control connections across nodes instead of concentrating them on the first one. We should do the same and, like cpp-driver, expose a config option to disable this behaviour (mainly useful for deterministic testing).
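A rough sketch of the proposed behaviour, in Python for brevity. The function name `order_contact_points` and the `shuffle` flag are hypothetical, not an existing API of this driver or of cpp-driver. Shuffling is on by default and can be switched off to keep the user-supplied order.

```python
import random


def order_contact_points(contact_points, shuffle=True, rng=None):
    """Return the order in which to try contact points for the control connection.

    Hypothetical helper: with shuffle=True (the proposed default) the list is
    randomly permuted; with shuffle=False the caller's order is preserved,
    which is mainly useful for deterministic tests.
    """
    ordered = list(contact_points)  # copy, so the caller's list is untouched
    if shuffle:
        if rng is None:
            rng = random.Random()
        rng.shuffle(ordered)
    return ordered


# Example addresses are placeholders.
nodes = ["10.0.0.1:9042", "10.0.0.2:9042", "10.0.0.3:9042"]
for address in order_contact_points(nodes):
    pass  # try to open the control connection to `address`, stop on success
```

Passing a seeded `rng` is another way to get reproducible ordering in tests without disabling the shuffle entirely.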