diff --git a/docs/dev/raft-in-scylla.md b/docs/dev/raft-in-scylla.md index 68e590c63faf..69fd25752be6 100644 --- a/docs/dev/raft-in-scylla.md +++ b/docs/dev/raft-in-scylla.md @@ -11,10 +11,11 @@ operations. This is why the Raft library in raft/ has the only dependency In order to use the library, the client (Scylla server) needs to provide implementations for three key interfaces: - - persistence - to persist Raft state - - rpc - to exchange messages with instances of the library + +- persistence - to persist Raft state +- rpc - to exchange messages with instances of the library on other machines - - the client state machine - to execute commands once +- the client state machine - to execute commands once they were replicated and committed on a majority of nodes. Depending on the application (data, topology, or schema) Scylla can use @@ -34,13 +35,14 @@ with Raft peers of the group on other nodes. For example, to persist the changes to schema and topology, Scylla can (and does) use node-local system tables. There are three such tables: - - `raft`, the main table which stores the Raft log for each + +- `raft`, the main table which stores the Raft log for each group. The table partition key is group id, so each log forms its own partition. Since the table is local, this works fine with many groups. - - `raft_snapshots`, a supporting table storing the so-called +- `raft_snapshots`, a supporting table storing the so-called snapshot descriptors, - - `raft_config`, a normalized part of raft +- `raft_config`, a normalized part of raft `raft_snapshots`, storing the cluster configuration at the time of taking the snapshot. May be out of date with the real cluster configuration, e.g. when configuration @@ -135,7 +137,8 @@ in the same order on every node. However, while Raft guarantees that if an operation succeeds, it's stored in the logs on the majority of nodes, it doesn't provide an API for strictly ordered schema changes out of the box, for two reasons: - - Raft only stores a log of the operations. The operations + +- Raft only stores a log of the operations. The operations themselves - the schema changes - are applied to the client state machine once they are committed to the log. In order to construct a new operation it's necessary to read the @@ -156,7 +159,7 @@ schema changes out of the box, for two reasons: yet, and append identical commands for creating the table to the Raft log. The second command will fail to apply to the client state machine. - - if a leader or network fails when committing an operation to +- if a leader or network fails when committing an operation to Raft log, the client has no way of knowing its status. E.g. a network can time out, and establishing whether or not the majority of Raft nodes store the command in its log or have applied it