Why write the replication log to two places? #5116
-
Why does the CouchDB Replication Protocol call for the replicator to write the same replication log to the source and target, and later compare them for common ancestry? This requires the source database to accept writes. What are the benefits relative to having the replicator just write to the target db, or the local filesystem?
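For anyone not familiar with the "compare for common ancestry" step, here is a minimal sketch of the idea. The field names (`session_id`, `source_last_seq`, `history`, `recorded_seq`) follow the replication log format as described in the CouchDB Replication Protocol documentation; the function name and the sample logs are made up for illustration, and this is not the replicator's actual code.

```python
from typing import Any, Optional


def find_common_checkpoint(
    source_log: Optional[dict], target_log: Optional[dict]
) -> Optional[Any]:
    """Return the sequence to resume from, or None if no common ancestry exists.

    Both arguments are the replication log documents read from the source
    and target (None if the document is missing on that side).
    """
    if not source_log or not target_log:
        # One side has no log, e.g. the source was deleted and recreated.
        return None

    session_id = source_log.get("session_id")
    if session_id is not None and session_id == target_log.get("session_id"):
        # Both sides agree on the latest session.
        return source_log.get("source_last_seq")

    # Otherwise look for the most recent session both histories share.
    # History entries are assumed to be ordered newest first.
    target_sessions = {h["session_id"] for h in target_log.get("history", [])}
    for entry in source_log.get("history", []):
        if entry["session_id"] in target_sessions:
            return entry["recorded_seq"]

    return None


# Hypothetical sample logs: the source recorded session "s3", the target did not.
source_log = {
    "session_id": "s3",
    "source_last_seq": 30,
    "history": [
        {"session_id": "s3", "recorded_seq": 30},
        {"session_id": "s2", "recorded_seq": 20},
    ],
}
target_log = {
    "session_id": "s2",
    "source_last_seq": 20,
    "history": [
        {"session_id": "s2", "recorded_seq": 20},
        {"session_id": "s1", "recorded_seq": 10},
    ],
}
```

With these sample logs the last session both sides know about is `s2`, so replication would resume from sequence 20. If the source had been recreated, its log would be missing and the function returns `None`, meaning replication starts from the beginning.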
Replies: 5 comments 3 replies
-
The reason is to ensure that if the source is replaced (deleted and recreated, and possibly populated with new data), the replication job is able to detect that. Otherwise, it might continue checkpointing and just move on, assuming the source data is all replicated to the target, when in fact it no longer is. However, as of a few years ago we started adding an `instance_start_time` field to the db info response [1] #3901
-
Yeah, originally in 1.x `instance_start_time` was the start time of the couch_db_updater process. In 2.x and 3.x versions it is either hard-coded at 0 or the Unix timestamp of the db creation time. For better or for worse, the replicator already uses it to detect database re-creation events, but only within a single session. To allow checkpointing on the target only, it would have to be extended to apply between replicator sessions, so that the source's
-
Fairly sure we added a special role you can grant on the source to only allow writing of checkpoints and not docs generally?
-
I think that's a Cloudant-special, there is a
The reason is to ensure that if the source is replaced (deleted and recreated, and possibly populated with new data), the replication job is able to detect that. Otherwise, it might continue checkpointing and just move on, assuming the source data is all replicated to the target, when in fact it no longer is.
However, as of a few years ago we started adding an `instance_start_time` field in the db info response, `GET http://$server_url/$dbname` [1]. It was always 0 before, but now it's a usable db creation Unix timestamp value. That value would allow detecting if a source database has likely been recreated: we'd save that in the checkpoint on the target, and then verify if we restart the r…
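The check described above could look roughly like the sketch below. To be clear about assumptions: `source_instance_start_time` is a hypothetical checkpoint field name invented for this example, the function name is made up, and the value is coerced to a string because the exact JSON type may differ between versions. It is only an illustration of the idea, not how the replicator does it today.

```python
from typing import Optional


def source_looks_recreated(checkpoint: dict, db_info: dict) -> Optional[bool]:
    """Compare the start time saved in a target-side checkpoint with the
    value currently reported by the source's db info response.

    Returns True if the source was likely recreated, False if it looks like
    the same database, and None if either value is 0 (unusable).
    """
    # "source_instance_start_time" is a hypothetical field for this sketch.
    saved = str(checkpoint.get("source_instance_start_time", "0"))
    current = str(db_info.get("instance_start_time", "0"))

    if saved == "0" or current == "0":
        # Older versions hard-code 0, so nothing can be concluded.
        return None

    return saved != current


# Hypothetical values: the source now reports a newer creation timestamp.
checkpoint = {"source_instance_start_time": "1700000000"}
db_info_same = {"instance_start_time": "1700000000"}
db_info_new = {"instance_start_time": "1712345678"}
db_info_old_version = {"instance_start_time": "0"}
```

Here `source_looks_recreated(checkpoint, db_info_new)` reports a likely re-creation, while the `"0"` case returns `None` so the caller can fall back to the existing two-sided log comparison.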