Why write the replication log to two places? #5116

hashtagchris · 2024-07-04T23:52:29Z

Why does the CouchDB Replication Protocol call for the replicator to write the same replication log to the source and target, and later compare them for common ancestry?

This requires the source database to accept writes. What are the benefits relative to having the replicator just write to the target db, or the local filesystem?

Answered by nickva

nickva · 2024-07-05T03:48:59Z

Collaborator

The reason is to ensure that if the source is replaced (deleted and recreated, and possibly populated with new data), the replication job can detect that. Otherwise, it might keep checkpointing and move on, assuming the source data is all replicated to the target when in fact it no longer is.

However, as of a few years ago we started adding an instance_start_time field to the db info response: GET http://$server_url/$dbname [1]. It was always 0 before, but now it's a usable db creation Unix timestamp. That value would allow detecting whether a source database has likely been recreated: we'd save it in the checkpoint on the target, then verify it when we restart the replication job or attempt to checkpoint. That would make it possible to avoid writes to the source db, since we'd have a reasonable way to detect whether that db was replaced.

[1] #3901
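
A minimal sketch of the check described above, in Python. It assumes the db info response and the target checkpoint have already been fetched and parsed as JSON. The function name and the `source_instance_start_time` checkpoint field are hypothetical, used here only for illustration; they are not part of the current replication log format.

```python
def source_was_recreated(db_info: dict, checkpoint: dict) -> bool:
    """Return True if the source looks like it was deleted and recreated.

    `db_info` is the parsed JSON body of GET /$dbname on the source.
    `checkpoint` is the checkpoint document saved on the target; the
    `source_instance_start_time` key is a hypothetical field name.
    """
    # Normalize to strings so "0" and 0 are treated the same way.
    current = str(db_info.get("instance_start_time", "0"))
    saved = str(checkpoint.get("source_instance_start_time", "0"))
    if current == "0" or saved == "0":
        # A value of 0 carries no creation time, so nothing can be concluded.
        return False
    return current != saved
```

A replicator following this idea would call the function when a job restarts and before each checkpoint, and fall back to a full replication when it returns True.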

3 replies

hashtagchris Jul 6, 2024
Author

Thanks. I did some quick testing and verified that I do miss some changes following the restore of an older backup for the source database.

# create a "source" database and add some documents (beforeBackup1, beforeBackup2)

# stop CouchDB and backup data
% cp -r couchdb couchdb.bkup

# start CouchDB and add some more documents to the source database (inbetween1, inbetween2)

# query for changes, as if we're starting a new replication session
% curl 'http://127.0.0.1:5984/source/_changes'
{"results":[
{"seq":"1-g1AAAAB5eJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuRAY-iPBYgydAApP6D1GYwJzLmAgXYLQ1MDJLMjbHpywIA5RImOw","id":"beforeBackup2","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]},
{"seq":"2-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZExFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFpNMD8","id":"beforeBackup1","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]},
{"seq":"3-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZExFyjAbmphamScnIJNAx5j8liAJEMDkPoPNY0JbJqlgYlBkrkxNn1ZAFpvMEA","id":"inbetween2","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]},
{"seq":"4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE","id":"inbetween1","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]}
],
"last_seq":"4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE","pending":0}

# stop CouchDB and restore backups
% rm -rf couchdb
% cp -r couchdb.bkup couchdb

# restart CouchDB and add some more documents to the source database (afterRestore1, afterRestore2)

# query for new changes, like we're resuming replication
# ⚠️ we don't get afterRestore1 or afterRestore2 back
% curl 'http://127.0.0.1:5984/source/_changes?since=4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE'
{"results":[

],
"last_seq":"4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE","pending":0}

# add another document to the source database (afterRestore3)

# query for new changes again
# ⚠️ we get incomplete results - only one document added after the DB was restored
% curl 'http://127.0.0.1:5984/source/_changes?since=4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE'
{"results":[
{"seq":"6-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZElFyjAbmphamScnIJNAx5j8liAJEMDkPoPNY0JbJqlgYlBkrkxNn1ZAFu_MEM","id":"afterRestore3","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]}
],
"last_seq":"6-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZElFyjAbmphamScnIJNAx5j8liAJEMDkPoPNY0JbJqlgYlBkrkxNn1ZAFu_MEM","pending":0}

hashtagchris Jul 6, 2024
Author

Also, I don't see instance_start_time change if I simply back up and restore CouchDB's database_dir. So I can see how writing the replication log to the source detects the source db being recreated from an old backup.

nickva Jul 6, 2024
Collaborator

Hmm, it looks like something went wrong adding the afterRestore1 and afterRestore2 docs, since querying for changes after adding them shows 4-g1AAAACbeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEpFyjAbmphamScnIJNAx5j8liAJEMDkPqPYpqlgYlBkrkxNn1ZAFrfMEE as the last seq. Those updates should have bumped the last update sequence.

> Also I don't see instance_start_time change if i simply backup and restore CouchDB's database_dir. So I can see how writing the replication log to the source detects the source db being recreated from an old backup.

That's because instance_start_time is the database creation time stored in the _dbs metadata database. It's saved as a field in the _dbs/$dbname document. If that _dbs db and doc were backed up and restored, it will show the same instance_start_time.

Another effect of backup and restore vs recreating the database from scratch is that backup+restore keeps the _local/$uuid replicator checkpoint documents as well. The replicator will then find the last common history between the target and source checkpoints and resume from there.
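
As a rough illustration of "find the last common history", here is a simplified Python sketch. It uses the `session_id`, `source_last_seq`, `history`, and `recorded_seq` fields described for replication logs in the CouchDB Replication Protocol. The function name is hypothetical, the history is assumed to be ordered newest first, and the real replicator's comparison is more involved than this.

```python
from typing import Optional


def find_common_seq(source_log: dict, target_log: dict) -> Optional[str]:
    """Return the sequence to resume from, or None to start from scratch."""
    # Fast path: both sides recorded the same latest session.
    session = source_log.get("session_id")
    if session and session == target_log.get("session_id"):
        return source_log.get("source_last_seq")

    # Otherwise look for the most recent session present in both histories.
    # Assumption: history entries are ordered newest first.
    target_sessions = {h["session_id"] for h in target_log.get("history", [])}
    for entry in source_log.get("history", []):
        if entry["session_id"] in target_sessions:
            return entry["recorded_seq"]

    # No shared session: the logs have no common ancestry.
    return None
```

With a source restored from an older backup, the source log would only contain the sessions recorded before the backup, so the search would land on an earlier shared session rather than the target's latest one.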

rnewson · 2024-07-05T08:30:37Z

Collaborator

instance_start_time's original implementation was the timestamp of when we created the couch_db_* processes for a given database, and it increased if the db was closed/reopened (by LRU pressure, say). The replicator would then restart from the last checkpoint as a precaution. It is not a safe replacement for the ancestry check, since the ancestry check does not rely on wall-clock time.

0 replies

nickva · 2024-07-05T15:28:05Z

Collaborator

yeah, originally in 1.x it was the start time of the couch_db_updater process. In 2.x and 3.x versions it is either hard-coded at 0 or the unix timestamp of the db creation time.

For better or for worse, the replicator already uses it to detect database re-creation events, but only within a single session. To allow checkpointing on the target only, it would have to be extended to apply between replicator sessions, so that the source's instance_start_time would be persisted in the target's checkpoints, and gated by feature detection that it's a recent 3.x release and has a non-0 instance_start_time.
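
The feature-detection gate could look roughly like the Python sketch below. The function name and the two mode strings are hypothetical. It only inspects the instance_start_time value in the source's db info response and treats 0 as "not supported"; it does not check the server version.

```python
def choose_checkpoint_mode(source_db_info: dict) -> str:
    """Pick a checkpointing strategy for a replication job (illustrative only)."""
    # Normalize to a string so "0" and 0 are handled the same way.
    start_time = str(source_db_info.get("instance_start_time", "0"))
    if start_time != "0":
        # A real creation timestamp is available, so the target checkpoint
        # could carry it and writes to the source could be skipped.
        return "target_only"
    # No usable value: keep writing the replication log to both sides.
    return "source_and_target"
```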

0 replies

rnewson · 2024-07-05T16:04:30Z

Collaborator

fairly sure we added a special role you can grant on the source to only allow writing of checkpoints and not docs generally?

0 replies

nickva · 2024-07-05T16:23:10Z

Collaborator

I think that's a Cloudant special; there is a Checkpointer role there: https://cloud.ibm.com/docs/Cloudant?topic=Cloudant-managing-access-for-cloudant. It's essentially read + write of _local docs.

0 replies
