Send empties during sync based on previous timestamp #220

somtochiama · 2024-06-10T23:47:45Z

This pull request updates the sync process between nodes so that it also sends previously cleared versions that the client node might have already received. Each time a node stores an empty version, it also stores a timestamp, which it can use to determine which versions have been cleared or updated since the last sync.
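Here is a minimal sketch of the mechanism described above. It is not the PR's implementation. Every cleared ("empty") version range is logged with the timestamp at which it was cleared, and a sync only sends ranges cleared after the timestamp the peer last saw. `EmptyLog` and `cleared_since` are hypothetical names. Timestamps are simplified to `u64`, whereas the real code uses its own `Timestamp` type.

```rust
use std::ops::RangeInclusive;

/// Hypothetical in-memory stand-in for the bookkeeping of cleared versions.
/// Each entry is (cleared version range, timestamp at which it was cleared).
#[derive(Default)]
struct EmptyLog {
    entries: Vec<(RangeInclusive<u64>, u64)>,
}

impl EmptyLog {
    /// Record that `versions` were cleared at timestamp `ts` (u64 is a simplification).
    fn record(&mut self, versions: RangeInclusive<u64>, ts: u64) {
        self.entries.push((versions, ts));
    }

    /// Ranges cleared strictly after `since`. `None` means the peer has
    /// never synced with us, so every known empty range is returned.
    fn cleared_since(&self, since: Option<u64>) -> Vec<RangeInclusive<u64>> {
        self.entries
            .iter()
            .filter(|(_, ts)| since.map_or(true, |s| *ts > s))
            .map(|(range, _)| range.clone())
            .collect()
    }
}
```

For example, after recording ranges cleared at 100, 250 and 400, a peer that last saw 250 would only be sent the range cleared at 400.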

jeromegn

I've added inline comments, but in short: only the source node should dictate the cleared timestamps.

The TRIGGER should be changed back to only clear the current site ID's versions and ignore the rest.

So when a node sends a Sync request to another node with the "last cleared ts", it won't send its own timestamp; it will send the last one it has seen coming from that node.

crates/corro-types/src/agent.rs

crates/corrosion/src/command/consul/sync.rs

jeromegn · 2024-06-11T13:06:38Z

crates/corro-types/src/change.rs

@@ -345,20 +354,21 @@ pub fn store_empty_changeset(

    // println!("inserting: {new_ranges:?}");

+    let ts = Timestamp::from(agent.clock().new_timestamp());


I don't think we ever want to generate our own timestamp for empties unless we're the node that initially created the change. How else would nodes know which empties they are missing? The last cleared timestamp could be more recent than the last one that was sent from another node.
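One way to encode this rule is shown below as a hypothetical sketch, not the PR's code. A node mints a timestamp from its own clock only for empties it originated. For everything else it keeps the timestamp supplied by the originating node, and it rejects empties that arrive without one. `ActorId`, `Clock`, `Empties`, and `store_empty` are illustrative names, and timestamps are simplified to `u64`.

```rust
use std::collections::HashMap;

type ActorId = u32; // hypothetical stand-in for a real actor/site ID

/// Hypothetical monotonic clock standing in for the agent's clock.
struct Clock {
    now: u64,
}

impl Clock {
    fn new_timestamp(&mut self) -> u64 {
        self.now += 1;
        self.now
    }
}

/// Cleared ranges per originating actor: (start, end, cleared ts).
#[derive(Default)]
struct Empties {
    by_actor: HashMap<ActorId, Vec<(u64, u64, u64)>>,
}

/// Store an empty range. Only the originating node stamps it with its own
/// clock; every other node keeps the origin's timestamp so that a later sync
/// with the origin compares timestamps from the same clock.
fn store_empty(
    empties: &mut Empties,
    clock: &mut Clock,
    self_id: ActorId,
    origin: ActorId,
    range: (u64, u64),
    origin_ts: Option<u64>,
) -> Result<u64, String> {
    let ts = if origin == self_id {
        clock.new_timestamp()
    } else {
        origin_ts.ok_or_else(|| {
            format!("empty from actor {origin} is missing its original timestamp")
        })?
    };
    empties
        .by_actor
        .entry(origin)
        .or_default()
        .push((range.0, range.1, ts));
    Ok(ts)
}
```

In this sketch, rejecting an empty that lacks its origin timestamp is a deliberate choice, so that a locally generated timestamp can never stand in for the source's.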

crates/corro-types/src/broadcast.rs

jeromegn

Added a few comments, but the big possible problem I see here is: losing the information that we are missing some cleared versions.

If we keep updating the "last cleared ts" on every change we apply, then we don't know whether we might've missed a cleared-versions message in between.

I think this whole mechanism only works if we store that information only when we synchronize with another node, and we store that node's cleared timestamp specifically. That way, we know for sure we've seen all versions as of the last time we synced with them.

Furthermore, the timestamps stored in bookkeeping help a bit because they give us a way to reduce synchronization bandwidth and processing load in the future. But these timestamps need to keep the original cleared timestamp; they can't be created from the current actor's clock.

In summary:

- If we keep updating the sync table with timestamps as we get them from any node, then we'll miss updates.
- We should only update the last cleared ts when we sync, and only for the node we're syncing with.
- We should keep the original timestamp of the cleared version.
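The three points above can be sketched as follows. The sketch makes several assumptions: timestamps are `u64`, each cleared version is tracked individually rather than as a range, and `SyncState`, `on_broadcast_empty`, `request_ts`, and `on_sync_complete` are hypothetical names. Broadcasts record empties but never move the per-peer watermark. Only a completed sync does that, and it stores that peer's own last cleared timestamp.

```rust
use std::collections::HashMap;

type ActorId = u32; // hypothetical stand-in for a real actor ID

#[derive(Default)]
struct SyncState {
    /// Last cleared timestamp reported by each peer during a completed sync.
    last_cleared_from: HashMap<ActorId, u64>,
    /// Known cleared versions: (actor, version) -> cleared timestamp.
    known_empties: HashMap<(ActorId, u64), u64>,
}

impl SyncState {
    /// A cleared version arrived via broadcast: record it, but do NOT advance
    /// the watermark, because earlier empties may have been missed.
    fn on_broadcast_empty(&mut self, actor: ActorId, version: u64, ts: u64) {
        self.known_empties.insert((actor, version), ts);
    }

    /// Timestamp to put in a sync request to `peer`: the last one we saw
    /// from that peer, never our own clock.
    fn request_ts(&self, peer: ActorId) -> Option<u64> {
        self.last_cleared_from.get(&peer).copied()
    }

    /// A sync with `peer` completed: apply the (version, ts) empties it sent,
    /// then move the watermark to the peer's own last cleared timestamp.
    fn on_sync_complete(&mut self, peer: ActorId, empties: &[(u64, u64)], peer_last_cleared: u64) {
        for &(version, ts) in empties {
            self.known_empties.insert((peer, version), ts);
        }
        self.last_cleared_from.insert(peer, peer_last_cleared);
    }
}
```

Under this policy, a broadcast for a later cleared version cannot mask an earlier one that was dropped. The next sync request still asks the peer for everything after the old watermark.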

crates/corro-agent/src/api/peer.rs

jeromegn · 2024-06-14T12:37:44Z

crates/corro-types/src/agent.rs

+        CREATE TEMP TABLE _variables (name TEXT PRIMARY KEY, var TEXT);
+        INSERT INTO _variables VALUES ('current_ts', strftime('%Y-%m-%dT%H:%M:%S.000000000Z', CURRENT_TIMESTAMP));
+
+        UPDATE __corro_bookkeeping SET ts = (SELECT var FROM _variables WHERE name = 'current_ts') WHERE ts IS NULL;


Interesting! Is that faster than just using strftime directly instead of a temporary table?

Also: should this only update the current node? The current node is the source of truth for its own clears.

> Interesting! Is that faster than just using strftime directly instead of a temporary table?

I don't think so. I'm using a temp table because I want to put the same value in __corro_sync_state for this node. I guess it's a big deal if they are slightly different.

crates/corro-types/src/change.rs

crates/corro-agent/src/agent/util.rs

jeromegn · 2024-06-14T13:13:12Z

crates/corro-agent/src/agent/util.rs

+            .blocking_write("process_multiple_changes(update_cleared_ts)");
+        let mut snap = booked_writer.snapshot();
+        if let Some(ts) = last_cleared {
+            snap.update_cleared_ts(&tx, ts)


🤔 isn't that redundant?

The one above is for the received empties. This one is for the node's own empties (since we check for the node's overwritten versions and clear them in this function).

crates/corro-agent/src/api/peer.rs

crates/corro-agent/src/agent/util.rs

crates/corro-agent/src/agent/handlers.rs

jeromegn · 2024-06-19T15:23:47Z

crates/corro-agent/src/agent/handlers.rs

+        if process {
+            let mut retain_keys = Vec::new();
+            for (actor, changes) in &buf {
+                match process_emptyset(agent.clone(), bookie.clone(), *actor, changes.clone()).await


Since you're awaiting without spawning, you can pass references instead of cloning.
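Below is a self-contained illustration of that suggestion. It is not the PR's code. The future is awaited in place rather than handed to `tokio::spawn`, which requires `'static` futures, so it may borrow from the caller and `agent` and `changes` can be passed by reference. `Agent`, `Change`, and this `process_emptyset` signature are hypothetical stand-ins. A tiny no-op-waker executor replaces tokio so the example runs with only `std`.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct Agent { name: String }  // hypothetical stand-in for the real agent
struct Change { version: u64 } // hypothetical stand-in for a change

/// Borrowing version: no clones needed because the caller awaits it directly.
async fn process_emptyset(agent: &Agent, actor: u32, changes: &[Change]) -> usize {
    let _ = (&agent.name, actor); // read-only access to the borrowed inputs
    changes.iter().filter(|c| c.version > 0).count()
}

/// Mirrors the loop in the diff: iterate over a borrowed buffer, await in place.
async fn process_all(agent: &Agent, buf: &[(u32, Vec<Change>)]) -> usize {
    let mut total = 0;
    for (actor, changes) in buf {
        total += process_emptyset(agent, *actor, changes).await;
    }
    total
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker { noop_raw_waker() }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Minimal busy-polling executor; adequate only for futures that never
/// wait on real I/O, as in this demo.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because nothing is moved into the future, `agent` and `buf` remain usable after `block_on(process_all(&agent, &buf))` returns.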

somtochiama · 2024-07-04T19:09:32Z

superseded by #223

somtochiama changed the title ~~send empties during sync~~ Send empties during sync based on previous timestamp Jun 10, 2024

jeromegn reviewed Jun 11, 2024


somtochiama requested a review from jeromegn June 13, 2024 21:06

jeromegn reviewed Jun 14, 2024


somtochiama requested a review from jeromegn June 14, 2024 18:29

jeromegn reviewed Jun 19, 2024


somtochiama requested a review from jeromegn June 19, 2024 16:57

send empties during sync

5e80e0d

somtochiama force-pushed the sync-cleared-versions branch from 4e73de0 to 5e80e0d June 19, 2024 20:12

somtochiama closed this Jul 4, 2024
