scylla::transport::topology ... Could not fetch metadata ... Protocol Error: system.peers or system.local has invalid column type #1044

Open
vponomaryov opened this issue Jul 29, 2024 · 37 comments

vponomaryov commented Jul 29, 2024

scylla rust driver version: 0.13.0
scylla version: 2024.2.0~rc1-20240715.9bbb8c1483d7

Using the latte stress tool we get the following errors:

2745.003        6351       25404       0.243     0.535     0.659     0.859     1.187     2.099     2.441     2.771     4.829
2750.005        6350       25398       0.260     0.536     0.659     0.857     1.141     1.888     2.410     2.820     3.697
2755.005        6350       25400       0.267     0.558     0.692     0.920     1.313     2.136     2.523     3.396    10.486
2024-07-28T08:53:36.013626Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.59:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.018797Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.159:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.025609Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.142:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.030338Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.125:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.035767Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.154:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.041411Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.121:9042" error=Protocol Error: system.peers or system.local has invalid column type
2760.001        6350       25400       0.275     0.575     0.712     0.940     1.351     2.169     2.460     2.793     6.042
2024-07-28T08:53:36.048046Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.131:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T08:53:36.055162Z  WARN scylla::transport::topology: Failed to establish control connection and fetch metadata on all known peers. Falling back to initial contact points.
...
2024-07-28T09:13:28.302729Z  WARN scylla::transport::topology: Failed to fetch metadata using current control connection control_connection_address="10.142.0.159:9042" error=Protocol Error: system.peers or system.local has invalid column type
2024-07-28T09:13:28.305729Z ERROR scylla::transport::topology: Could not fetch metadata error=Protocol Error: system.peers or system.local has invalid column type
3952.774        6342       25367       0.259     0.577     0.693     0.852     1.031     1.137     1.328     1.516     2.378

It was not observed with previous versions of Scylla.

CI job: https://jenkins.scylladb.com/job/enterprise-2024.2/job/longevity/job/longevity-gce-custom-d1-worklod2-hybrid-raid-test/2/
Argus: enterprise-2024.2/longevity/longevity-gce-custom-d1-worklod2-hybrid-raid-test#2

mykaul commented Jul 29, 2024

Dup of #1021 ?

@wprzytula added the "duplicate" label Jul 29, 2024
@vponomaryov
Author

Dup of #1021 ?

Probably yes.

Testing 2024.2, we hit it a lot: ~20k times in 1 hour.

@Lorak-mmk
Collaborator

Dup of #1021 ?

Actually a duplicate of #1023, closing as such.

@Lorak-mmk closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 29, 2024
@fruch reopened this Jul 29, 2024
fruch commented Jul 29, 2024

@Lorak-mmk

I'm reopening this one since it's a bit different from just a warning

2024-07-28T09:14:31.177595Z  WARN scylla::transport::topology: Initial metadata read failed, proceeding with metadata consisting only of the initial peer list and dummy tokens. This might result in suboptimal performance and schema information not being available. error=ProtocolError("system.peers or system.local has invalid column type")
info: Loading workload script /scylla-qa-internal/custom_d1/workload2/latte/custom_d1_workload2.rn...
2024-07-28T09:14:31.180326Z  WARN scylla::transport::load_balancing::default: Datacenter specified as the preferred one (us-east1-s1) does not exist!
info: Connecting to ["10.142.0.59", "10.142.0.121", "10.142.0.125", "10.142.0.131", "10.142.0.142", "10.142.0.154", "10.142.0.159", "10.142.0.160", "10.142.0.5"]... 
2024-07-28T09:14:31.180348Z  WARN scylla::transport::load_balancing::default: Datacenter specified as the preferred one (us-east1-s1) does not exist!
error: Cassandra error: Failed to execute query "SELECT cluster_name, release_version FROM system.local" with params []: Protocol Error: Empty query plan - driver bug!
2024-07-28T09:14:31.180354Z ERROR scylla::transport::load_balancing::plan: Load balancing policy returned an empty plan! The query cannot be executed. Routing info: RoutingInfo { consistency: LocalQuorum, serial_consistency: Some(LocalSerial), token: None, table: None, is_confirmed_lwt: false }

This is a multi-DC case, and we run into situations where the stress command fails straight away because of this condition.

We are trying to move more cases to tools using the Rust driver, and this kind of thing is holding us back.

Regardless of the issue with system.peers, the fact that it doesn't fall back to any node is definitely different from other drivers (at least from the Java behavior we know in c-s).

@wprzytula
Collaborator

Regardless of the issue with system.peers, the fact that it doesn't fall back to any node is definitely different from other drivers (at least from the Java behavior we know in c-s).

This seems to be caused by a strict rule in default load balancing policy that if datacenter failover is not permitted (and it may be forbidden due to various causes, such as a local consistency being used, or being manually turned off), the query will never be routed to any node that does not belong to the preferred datacenter. As the initial contact points have their DC unspecified, they are rejected by the load balancer and hence the query plan is empty.

@wprzytula
Collaborator

This issue indeed highlights an important problem: if the initial metadata fetch fails for any reason, then due to strict datacenter requirements the default load balancer may yield the driver inoperable.
@Lorak-mmk

@Lorak-mmk
Collaborator

This issue indeed highlights an important problem: if the initial metadata fetch fails for any reason, then due to strict datacenter requirements the default load balancer may yield the driver inoperable. @Lorak-mmk

How is that an issue? If user requested specific DC, and disabled failover, and we are unable to contact this DC, then I don't see what else should the driver do but fail.

wprzytula commented Jul 30, 2024

How is that an issue? If user requested specific DC, and disabled failover, and we are unable to contact this DC, then I don't see what else should the driver do but fail.

Note that we may be able to contact the DC and issue queries, but for some reason the initial metadata fetch might fail. Then we would never learn which contact points belong to the preferred DC, and thus we would end up with empty query plans.

@Lorak-mmk
Collaborator

If by "contact DC" you mean having network connectivity to DC, then sure, driver may have that, but it's not helpful.
Without fetching schema driver can't learn which peers are in which DC, and thus can't route queries to specified DC.
Again, I don't see what should the driver do. If we routed queries to random nodes despite preferred DC being set AND DC failover being disabled, it would be a bug.

@wprzytula
Collaborator

What we could do is:

  1. add a way to provide DC along with the address of a contact point (preferred by me; see the sketch below),
  2. add some minimal fallback kind of metadata refresh, e.g. only fetching peers from system.{local,peers}.
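
For illustration only, option 1 might look something like the sketch below; known_node_with_dc is a hypothetical name and does not exist in the current SessionBuilder API:

use scylla::SessionBuilder;

// Hypothetical sketch (no such method exists today): let the user declare which DC
// a contact point belongs to, so the default load balancing policy can classify it
// before the first metadata fetch succeeds. Addresses and DC name are illustrative.
let session = SessionBuilder::new()
    .known_node_with_dc("10.142.0.59:9042", "us-east1")
    .known_node_with_dc("10.142.0.159:9042", "us-east1")
    .build()
    .await?;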

@Lorak-mmk
Collaborator

What we could do is:

1. add a way to provide DC along with the address of a contact point (preferred by me),

Users would need to use this, and I suspect they won't. It also introduces more cases (what if the DC fetched from the schema is different from the provided one?).

2. add some minimal fallback kind of metadata refresh, e.g. only fetching peers from `system.{local,peers}`.

I like the idea, but note that it would not help with this issue - because fetching peers failed.
I think that instead of adding new things, we can make the fetch not fail if only one part of it fails (e.g. if schema fetching fails, but peers fetching succeeds, then we can return ClusterData with correct peers and an empty schema).
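
A rough sketch of that idea, with placeholder types (Peer, Keyspace, Metadata here are illustrative stand-ins, not the driver's real internals):

struct Peer;
struct Keyspace;
struct Metadata {
    peers: Vec<Peer>,
    keyspaces: Vec<Keyspace>,
}

async fn fetch_metadata(
    fetch_peers: impl std::future::Future<Output = Result<Vec<Peer>, String>>,
    fetch_schema: impl std::future::Future<Output = Result<Vec<Keyspace>, String>>,
) -> Result<Metadata, String> {
    // Peers are a hard requirement: without them nothing can be routed.
    let peers = fetch_peers.await?;
    // Schema is not: degrade to an empty schema instead of failing the whole refresh.
    let keyspaces = fetch_schema.await.unwrap_or_else(|err| {
        eprintln!("schema fetch failed, continuing with empty schema: {err}");
        Vec::new()
    });
    Ok(Metadata { peers, keyspaces })
}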

fruch commented Jul 30, 2024

This issue indeed highlights an important problem: if the initial metadata fetch fails for any reason, then due to strict datacenter requirements the default load balancer may yield the driver inoperable. @Lorak-mmk

How is that an issue? If user requested specific DC, and disabled failover, and we are unable to contact this DC, then I don't see what else should the driver do but fail.

sorry but the user said specifically to enable failover:

policy_builder = policy_builder.prefer_datacenter(dc.to_owned()).permit_dc_failover(true);

see

scylladb/latte@52db0cd#diff-d6346fd7d17270b1282142aeeda9c4bc2b7d8fd0f37b24a1c871a9257f0ed0aaR67

fruch commented Jul 30, 2024

also I think the missing part is that in a few seconds the data would be filled in by Scylla, i.e. on the next refresh.

as other drivers don't fail, and keep on reading the peers table until correct data appears there.

I think it's already agreed that the Rust driver is a bit too trigger-happy to fail on missing data in the peers tables compared to java/go/python.

wprzytula commented Jul 30, 2024

sorry but the user said specifically to enable failover:

policy_builder = policy_builder.prefer_datacenter(dc.to_owned()).permit_dc_failover(true);

see

scylladb/latte@52db0cd#diff-d6346fd7d17270b1282142aeeda9c4bc2b7d8fd0f37b24a1c871a9257f0ed0aaR67

OK, but failover can be also disabled by using local consistency. See code of the default load balancer:

fn is_datacenter_failover_possible(&self, routing_info: &ProcessedRoutingInfo) -> bool {
    self.preferences.datacenter().is_some()
        && self.permit_dc_failover
        && !routing_info.local_consistency
}

whereas local consistency is defined as:

let local_consistency = matches!(
    (query.consistency, query.serial_consistency),
    (Consistency::LocalQuorum, _)
        | (Consistency::LocalOne, _)
        | (_, Some(SerialConsistency::LocalSerial))
);

@Lorak-mmk
Collaborator

also I think the missing part is that in a few seconds the data would be filled in by Scylla, i.e. on the next refresh.

as other drivers don't fail, and keep on reading the peers table until correct data appears there.

I'm pretty sure that Rust Driver will try to re-read schema too.

I think it's already agreed that the Rust driver is a bit too trigger-happy to fail on missing data in the peers tables compared to java/go/python.

Wdym by fail? Until correct schema is read, queries will fail if DC is specified and failover disabled, that is true, and imo correct behavior. But the driver should keep trying to re-read the schema, and I think it does (doesn't it?)

fruch commented Jul 30, 2024

sorry but the user said specifically to enable failover:

policy_builder = policy_builder.prefer_datacenter(dc.to_owned()).permit_dc_failover(true);

see
scylladb/latte@52db0cd#diff-d6346fd7d17270b1282142aeeda9c4bc2b7d8fd0f37b24a1c871a9257f0ed0aaR67

OK, but failover can be also disabled by using local consistency. See code of the default load balancer:

fn is_datacenter_failover_possible(&self, routing_info: &ProcessedRoutingInfo) -> bool {
    self.preferences.datacenter().is_some()
        && self.permit_dc_failover
        && !routing_info.local_consistency
}

whereas local consistency is defined as:

let local_consistency = matches!(
    (query.consistency, query.serial_consistency),
    (Consistency::LocalQuorum, _)
        | (Consistency::LocalOne, _)
        | (_, Some(SerialConsistency::LocalSerial))
);

the user didn't do a single CQL statement yet at this point, it's just connecting? why does it need to specify which consistency it is going to use at this point? maybe I'm going to use both on the same connection?

I'm confused, and not sure what exactly we should do in the latte code to enable DC failover?

fruch commented Jul 30, 2024

also I think the missing part is that in a few seconds the data would be filled in by Scylla, i.e. on the next refresh.
as other drivers don't fail, and keep on reading the peers table until correct data appears there.

I'm pretty sure that Rust Driver will try to re-read schema too.

I think it's already agreed that the Rust driver is a bit too trigger-happy to fail on missing data in the peers tables compared to java/go/python.

Wdym by fail? Until correct schema is read, queries will fail if DC is specified and failover disabled, that is true, and imo correct behavior. But the driver should keep trying to re-read the schema, and I think it does (doesn't it?)

again, in this case we don't have queries failing, we are failing to establish the connection at all; the latte command didn't get to do a single CQL command, it failed and bailed out.

fruch commented Jul 30, 2024

also I think the missing part is that in a few seconds the data would be filled in by Scylla, i.e. on the next refresh.
as other drivers don't fail, and keep on reading the peers table until correct data appears there.

I'm pretty sure that Rust Driver will try to re-read schema too.

I think it's already agreed that the Rust driver is a bit too trigger-happy to fail on missing data in the peers tables compared to java/go/python.

Wdym by fail? Until correct schema is read, queries will fail if DC is specified and failover disabled, that is true, and imo correct behavior. But the driver should keep trying to re-read the schema, and I think it does (doesn't it?)

see that log latte-l1-c0-e4704b37-178d-494e-9eb1-34ffc51cc6d7.log in:
https://cloudius-jenkins-test.s3.amazonaws.com/9df8d6e9-6092-4090-81e7-3290a9545d77/20240728_092446/loader-set-9df8d6e9.tar.gz

@wprzytula
Collaborator

the user didn't do a single CQL statement yet at this point, it's just connecting?

This is false. Looking at the logs:

error: Cassandra error: Failed to execute query "SELECT cluster_name, release_version FROM system.local" with params []: Protocol Error: Empty query plan - driver bug!

we clearly see that the driver issued a query and failed.

Moreover, the following log line shows that the query was executed with a local consistency (LocalQuorum):

2024-07-28T09:14:31.180354Z ERROR scylla::transport::load_balancing::plan: Load balancing policy returned an empty plan! The query cannot be executed. Routing info: RoutingInfo { consistency: LocalQuorum, serial_consistency: Some(LocalSerial), token: None, table: None, is_confirmed_lwt: false }

which explains why the driver rejected the initial contact points as coordinators for the query - it could not determine that they belong to the local datacenter, which is required for queries that specify a local consistency.

@wprzytula
Collaborator

I'm confused, and not sure what exactly we should do in the latte code to enable DC failover?

Use non-local consistency. E.g. Quorum instead of LocalQuorum.
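
For latte's one-off informational query, that could look roughly like this (a sketch against driver 0.13, assuming a session is already built):

use scylla::query::Query;
use scylla::statement::Consistency;

// Sketch: give this single query a non-local consistency so the default load
// balancing policy is allowed to pick nodes outside the preferred DC for it.
let mut stmt = Query::new("SELECT cluster_name, release_version FROM system.local");
stmt.set_consistency(Consistency::Quorum); // instead of LocalQuorum
let result = session.query(stmt, ()).await?;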

fruch commented Jul 30, 2024

I'm confused, and not sure what exactly we should do in the latte code to enable DC failover?

Use non-local consistency. E.g. Quorum instead of LocalQuorum.

but we do want to use LocalQuorum, and still have a fallback in case of errors.
none of the other drivers makes this assumption.

and as a user, it's really not clear, especially when I explicitly define I do want fallback.

wprzytula commented Jul 30, 2024

but we do want to use LocalQuorum, and still have a fallback in case of errors.
none of the other drivers makes this assumption.

and as a user, it's really not clear, especially when I explicitly define I do want fallback.

What is the point of using LocalQuorum and expecting the driver to perform datacenter failover? Local consistencies only make sense when a node from the local datacenter becomes the coordinator. Else, an error seems to be the most reasonable option.

fruch commented Jul 30, 2024

but we do want to use LocalQuorum, and still have a fallback in case of errors.
none of the other drivers makes this assumption.
and as a user, it's really not clear, especially when I explicitly define I do want fallback.

What is the point of using LocalQuorum and expecting the driver to perform datacenter failover? Local consistencies only make sense when a node from the local datacenter becomes the coordinator. Else, an error seems to be the most reasonable option.

again, we run longevity tests, which we expect to be running for multiple hours, and we do want to be using LocalQuorum, since that is how customers use it when they have multi-DC setups,

we don't mind that in case of failure with a DC it would fall back to the other DCs; on the other hand, we don't want to be stopped or blocked because of an issue with Scylla or the driver.

if there are configuration options and safeguards, as a user I want to be able to configure them,

telling me that in this case there's a consistency mode I can't use, I don't think this is an answer we could give a paying customer.

vponomaryov commented Jul 30, 2024

@fruch

The SELECT cluster_name, release_version FROM system.local query is what latte does before the main load.
It is a single, separate CQL query used to get info for use in a couple of places, such as the log report name.

We can definitely use CL=Quorum there without any problems.
Because it has no relation to the main load we run afterwards.

Moreover, we can even completely remove it in our fork if we need to.

@wprzytula thanks for spotting the problematic point in the latte tool.

@fruch it is very easy to fix; it will be part of the next latte-fork tag.

@wprzytula
Collaborator

telling me that in this case there's a consistency mode I can't use, I don't think this is an answer we could give a paying customer.

Well, if a customer sets a particular consistency mode, we are obliged to either satisfy it or return an error.
If a user expects local consistency, we mustn't complete the query by having a remote node as the coordinator.
This is not about having a paying or non-paying customer; it's about having a customer whose consistency guarantees are fulfilled or broken.

@roydahan
Collaborator

Maybe the error handling here could be improved?

@wprzytula
Collaborator

Maybe the error handling here could be improved?

Definitely. The user should not see a "driver bug" message in a situation that is not a driver bug.

@Lorak-mmk
Collaborator

but we do want to use LocalQuorum, and still have a fallback in case of errors.
none of the other drivers makes this assumption.
and as a user, it's really not clear, especially when I explicitly define I do want fallback.

What is the point of using LocalQuorum and expecting the driver to perform datacenter failover? Local consistencies only make sense when a node from the local datacenter becomes the coordinator. Else, an error seems to be the most reasonable option.

As I mentioned in the private message, Scylla documentation does not impose this requirement, and does not define LOCAL this way.
From Scylla docs about LOCAL_QUORUM:

Same as QUORUM, but confined to the same datacenter as the coordinator.

This is a strictly server-side definition and does not state any driver-side requirement - because consistency level is something to be handled server-side, not driver-side.
I also don't think any other driver imposes such restrictions.
If such a restriction should be present, we should have a good reason for it, document it, and implement it in other drivers as well.
But I suspect that the restriction should not be there and we should perform failover if the user requested so.

Not sure who to ask here. @nyh ? @avikivity ?

@Lorak-mmk
Collaborator

Or maybe @kbr-scylla or @piodul ?

fruch commented Aug 1, 2024

Also the docs of the Java driver 4.x explain the failover option quite well:
https://docs.datastax.com/en/developer/java-driver/4.15/manual/core/load_balancing/index.html#cross-datacenter-failover

I think we need to fix the Scylla regression, because it can cause issues for some of the drivers.
And we should still try to supply the same option across drivers to handle such a situation (Scylla bugs do happen from time to time...).

in the current situation, a user would be forced to implement some retries on the application end, or stop using LOCAL_*, which won't necessarily match their needs

@wprzytula
Collaborator

in the current situation, a user would be forced to implement some retries on the application end, or stop using LOCAL_*, which won't necessarily match their needs

A fix that prevents metadata refresh failure on an invalid peer entry has been merged. The next release will contain the fix.

@wprzytula self-assigned this Aug 1, 2024
@kbr-scylla

Or maybe @kbr-scylla or @piodul ?

I think it is both true that:

  • LOCAL_QUORUM is a server-side thing: if the driver sends a query to a node in dc X, then that node becomes the coordinator and will expect quorum replies from replicas in dc X.
  • but it should also be possible to configure the driver to connect only to a given dc, if the user so desires, but I think that's orthogonal to LOCAL_QUORUM

The driver should probably not send user queries until it learns topology (location in dc and rack) information, if those queries are configured to reach only the "local" dc (whatever the user specified as "local" dc in driver config). Up to that point, queries should probably be queued up.

And it doesn't matter what CL those queries are using. If the user specifies that the driver must only contact dc X, then all queries, even those using CL=quorum or CL=one or whatever, should go only to dc X.

@Lorak-mmk
Collaborator

@kbr-scylla I think the main question here is slightly different.
If a user enabled DC failover (so is OK with requests being sent to a different DC if they can't be sent to the preferred one), should the driver ignore this setting for requests with LOCAL* CL and only send them to the preferred DC?
Because this is the current behavior of the Rust Driver and I'm not sure it is correct.

The argument for forcing CL=LOCAL* requests to always go to local DC is that we can achieve consistency (as defined by https://opensource.docs.scylladb.com/stable/cql/consistency-calculator.html for read CL and write CL = QUORUM - not sure what are the exact semantics, that reads will see all previous writes I think?) with regards to all clients connected to this DC. If we route the request outside of this DC then this guarantee no longer holds.

But there are counterarguments:

  • not every user necessarily needs this particular guarantee, some may be OK with routing to different DCs
  • If we change this behavior then the user can still force those queries to the local DC - by making a new execution profile and disabling DC failover on it (see the sketch below). It is not, however, possible with the current version to allow DC failover for CL=LOCAL* requests.
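
Roughly, that per-statement opt-out could look like this (a sketch against driver 0.13; the DC name and query are illustrative):

use scylla::query::Query;
use scylla::transport::load_balancing::DefaultPolicy;
use scylla::transport::ExecutionProfile;

// Sketch: a profile that keeps selected statements pinned to the preferred DC,
// even if the session-wide default policy permits DC failover.
let local_only = ExecutionProfile::builder()
    .load_balancing_policy(
        DefaultPolicy::builder()
            .prefer_datacenter("us-east1".to_string())
            .permit_dc_failover(false)
            .build(),
    )
    .build()
    .into_handle();

let mut stmt = Query::new("SELECT v FROM ks.t WHERE pk = ?");
stmt.set_execution_profile_handle(Some(local_only));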

@kbr-scylla

The argument for forcing CL=LOCAL* requests to always go to local DC is that we can achieve consistency (as defined by https://opensource.docs.scylladb.com/stable/cql/consistency-calculator.html for read CL and write CL = QUORUM - not sure what are the exact semantics, that reads will see all previous writes I think?) with regards to all clients connected to this DC. If we route the request outside of this DC then this guarantee no longer holds.

Is "local" and "preferred" the same thing? You're using a bunch of terminology that I'm not sure I understand.

So let's assume there is a "local" DC and there is a "remote" DC configured by the driver.

The user should be able to configure all their queries e.g. to some chosen keyspace, with CL=LOCAL_QUORUM, to go to the "remote" DC, right? Then the consistency guarantees will be provided, and everything should work. But it sounds like with this restriction you have, this is impossible: the user's desire to send only to "remote" DC conflicts with enforcing all LOCAL_* queries to the "local" DC.

@Lorak-mmk
Collaborator

The argument for forcing CL=LOCAL* requests to always go to local DC is that we can achieve consistency (as defined by https://opensource.docs.scylladb.com/stable/cql/consistency-calculator.html for read CL and write CL = QUORUM - not sure what are the exact semantics, that reads will see all previous writes I think?) with regards to all clients connected to this DC. If we route the request outside of this DC then this guarantee no longer holds.

Is "local" and "preferred" the same thing? You're using a bunch of terminology that I'm not sure I understand.

Ok, let me clear that up (you probably know most of that, but just to be sure we are on the same page).
All the drivers have Load Balancing Policies (LBPs) - pieces of code that take a query and produce a list of nodes (+ optionally shards in case of our drivers) to send this query to - let's call this a Query Execution Plan (or Plan). Why a list and not a single node? Because drivers have retry and speculative execution mechanisms that make use of subsequent elements.
Token Awareness (and Shard Awareness in our drivers) is implemented in LBPs - a Token Aware LBP will put replicas for a given query at the beginning of the plan.
It's similar for DC Awareness (and lately - Rack Awareness). A user can configure an LBP that is DC Aware by setting a "local DC" - usually it will be the DC that the driver is running in, or the closest DC. Then such a policy will prioritize (put at the beginning of the Plan) the nodes that are in the "local DC".
There is one more piece of configuration for DC / Rack Aware LBPs: "dc failover". If DISABLED, then the Plan will only consist of nodes from the "local DC". If ENABLED, then nodes from the "local DC" will be the first elements of the Plan, followed by nodes from other ("remote") DCs.
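
Concretely, in the Rust driver those knobs are wired up roughly like this (a sketch against driver 0.13; the DC name and address are illustrative):

use scylla::transport::load_balancing::DefaultPolicy;
use scylla::transport::ExecutionProfile;
use scylla::SessionBuilder;

// DC-aware policy: nodes from "us-east1" come first in every Plan; with
// permit_dc_failover(true) remote-DC nodes are appended after them, with
// false they are never used.
let policy = DefaultPolicy::builder()
    .prefer_datacenter("us-east1".to_string())
    .permit_dc_failover(true)
    .build();
let profile = ExecutionProfile::builder()
    .load_balancing_policy(policy)
    .build();
let session = SessionBuilder::new()
    .known_node("10.142.0.59:9042")
    .default_execution_profile_handle(profile.into_handle())
    .build()
    .await?;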

The question is: if a query has CL=LOCAL*, should "dc failover" setting be ignored and treated as DISABLED?

So let's assume there is a "local" DC and there is a "remote" DC configured by the driver.

The user should be able to configure all their queries e.g. to some chosen keyspace, with CL=LOCAL_QUORUM, to go to the "remote" DC, right? Then the consistency guarantees will be provided, and everything should work. But it sounds like with this restriction you have, this is impossible: the user's desire to send only to "remote" DC conflicts with enforcing all LOCAL_* queries to the "local" DC.

@kbr-scylla

So the dilemma is between giving a choice to the user, or taking it away.
They already have a way to disable "dc failover" if they don't like it. We don't have to force them to have it disabled when using CL=LOCAL*.

So my vote is to give them the choice. If dc failover is enabled -- the user chose failover -- so do it as they requested.

BTW what does the java driver do?

@Lorak-mmk
Collaborator

BTW what does the java driver do?

I think no other driver forces dc failover to be disabled for CL=LOCAL* queries.
