-
Is your feature request related to a problem? Please describe.
Hi, I was in the process of implementing embedded Debezium using the Postgres connector and RabbitMQ Streams, using a publishingId based on the WAL positions provided by Debezium for deduplication. Unfortunately, with PostgreSQL WAL files there is no single long value that is both increasing and ordered. They do provide the so-called LSN (log sequence number), but it is not guaranteed to be ordered, since you might have concurrent transactions. If you want something that is increasing and ordered, you need to combine both the last commit LSN and the current processing LSN. The former provides an increasing and ordered number across transactions, and the latter provides an increasing number across the multiple statements of a transaction. So, in general, what needs to be done is to combine both longs into a uint128_t in order to have an increasing and ordered deduplication ID. More context is available at the following links:

Describe the solution you'd like
I was wondering if there are discussions in place to lift the long restriction on publishingId and let the consumer pass anything. If my understanding is correct, publishingId as of now is just saved in the stream, and it's up to the consumer to retrieve it and compare it in order to decide whether to deduplicate or not. If this is correct, then we might as well allow it to be anything the consumer requires.

Describe alternatives you've considered
No response

Additional context
No response
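A minimal Python sketch of the combined key described in the request, assuming both LSNs are available as unsigned 64-bit integers. The names `parse_lsn`, `dedup_key`, and `key_bytes` are hypothetical helpers, not part of Debezium or any RabbitMQ client. Shifting the commit LSN into the high 64 bits makes the resulting integer order first by commit LSN and then by the processing LSN within a transaction; the result is wider than 64 bits, which is exactly why it does not fit in the current publishingId.

```python
# Hypothetical helpers (not a Debezium or RabbitMQ API): build a 128-bit,
# ordered deduplication key from a commit LSN and a processing LSN.
U64_MAX = (1 << 64) - 1


def parse_lsn(text: str) -> int:
    """Parse a PostgreSQL LSN in its textual 'X/Y' form into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def dedup_key(commit_lsn: int, processing_lsn: int) -> int:
    """Order by commit LSN first, then by processing LSN inside the transaction."""
    for lsn in (commit_lsn, processing_lsn):
        if not 0 <= lsn <= U64_MAX:
            raise ValueError(f"LSN out of unsigned 64-bit range: {lsn}")
    return (commit_lsn << 64) | processing_lsn


def key_bytes(key: int) -> bytes:
    """16-byte big-endian encoding; byte-wise comparison matches numeric order."""
    return key.to_bytes(16, "big")


if __name__ == "__main__":
    # Transaction A commits before transaction B, so A's change sorts first
    # even though B's change LSN is numerically smaller (concurrent transactions).
    a1 = dedup_key(parse_lsn("0/3000100"), parse_lsn("0/2000050"))
    b1 = dedup_key(parse_lsn("0/3000200"), parse_lsn("0/2000010"))
    print(a1 < b1)                        # True
    print(key_bytes(a1) < key_bytes(b1))  # True
    print(b1 > U64_MAX)                   # True: too wide for a uint64 publishingId
```

The fixed-width big-endian byte form is one way such a key could be carried if the protocol accepted a byte[] instead of a uint64, since comparing the bytes preserves the numeric order.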
-
@zenios, thank you for the context. We move the majority of issues to discussions, at least at first, because they lack context. But this is a fairly specific change request, and you did explain what you are doing and why. Thank you. My primary concern is that this would be a breaking protocol change, so we can likely only ship it in 4.0, if …
-
Protocol-wise, indeed it would be. I am sure, though, that the clients shipped so far can be converted in a compatible way to support both the old and the new way of working. For informational purposes, is there a specific reason publishingId was designed as a uint64 instead of a byte[]?
-
@zenios, according to other members of the team, this was a conscious protocol design decision. Incrementing positive integers are naturally unique, easy for humans to reason about, and, most importantly, very efficient to compare. I don't have an alternative to offer for Debezium, but we would like to keep the protocol this way.
-
No, the stream maintains the publishing-id and will not write any messages for a given publisher if it sends an ID that is lower than or equal to the last publishing-id seen for that publisher name. Consumers do not need to do anything additional to deduplicate. In fact, the publishing-id is not exposed to consumers (unless you add it to the message metadata yourself).
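To make the rule in this reply concrete, here is a toy in-memory Python model of per-publisher deduplication. The class `PublisherDedup` and its `publish` method are hypothetical and only mirror the behaviour stated above (drop any message whose publishing-id is lower than or equal to the last one seen for that publisher name); this is not how the RabbitMQ broker is implemented, and the boolean return value is purely illustrative.

```python
# Toy model (not RabbitMQ's implementation) of stream-side deduplication
# keyed by publisher name.
from typing import Dict, List


class PublisherDedup:
    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}  # publisher name -> last accepted id
        self.log: List[bytes] = []            # what consumers would read

    def publish(self, publisher_name: str, publishing_id: int, body: bytes) -> bool:
        """Append body unless publishing_id <= last id seen for this publisher."""
        last = self._last_seen.get(publisher_name)
        if last is not None and publishing_id <= last:
            return False  # duplicate or stale: not written to the stream
        self._last_seen[publisher_name] = publishing_id
        self.log.append(body)  # only the body is stored; the id is not exposed
        return True


if __name__ == "__main__":
    stream = PublisherDedup()
    stream.publish("debezium-app", 1, b"insert row 1")
    stream.publish("debezium-app", 2, b"update row 1")
    stream.publish("debezium-app", 2, b"update row 1")  # resend after a restart
    print(stream.log)  # [b'insert row 1', b'update row 1']
```

Each check is a single integer comparison against one stored value per publisher name, which is the efficiency argument made earlier in the thread for keeping a fixed-width, incrementing ID.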