
[v24.1.x] Fix timequery returning wrong offset after trim-prefix which could lead to stuck consumers #18281

Conversation

vbotbuildovich (Collaborator)
Backport of PR #18112

Encapsulates common operations on offset intervals. For now, although it
is named bounded, the maximum offset can still be set to
`model::offset::max()`. I will likely change this in the future, as it
requires changing quite a few call sites, most likely only tests.

This little data structure tries to be very lightweight and impose
minimal overhead on basic interval operations like intersection or
inclusion. It is also quite hard to use incorrectly, thanks to the
checked construction variant and debug assertions.

Later, we might want to refactor things like log_reader to use this
instead of min and max offsets like they do today. Once that is done,
the checked variant needs to be called only once at the kafka layer. For
everyone else it becomes a ~0 cost abstraction.

(cherry picked from commit f13bfa6)
The Kafka ListOffsets request allows querying offsets by timestamp. The
result of such a request is the first Kafka offset in the log whose
timestamp is equal to or greater than the requested timestamp, or -1 if
no such record can be found.

The implementation we have today assumes that the start of the physical
log matches the start of the log as it is seen by external users/kafka
clients.

However, this is not always true, in particular when [trim-prefix][1]
(prefix truncation) is used. There are two sources of problems:

  1) trim-prefix is applied synchronously at the cluster layer, where it
     changes the visibility of the log from the client's point of view,
     but asynchronously at the consensus log / physical log /
     disk_log_impl class and cloud storage.

  2) trim-prefix is executed with an offset pointing at a record in the
     middle of a batch.

As a result, in these scenarios, if a client sends a Kafka Fetch request
with the offset it received, it gets an OffsetOutOfRange error.

This commit changes how such queries are implemented at the lower levels
of the system: the information about the visible start and end of the
log is carried together with the timestamp, and at the lower levels we
use it to limit the search to that range.

Although this commit does not change the implementation of the tiered
storage timequery, it does fix the trim-prefix problem there too in the
general case, because of the check and "bump" added in
redpanda-data#11579.

Tiered Storage timequeries have some additional problems which I plan to
address in redpanda-data#18097.

[1]: https://docs.redpanda.com/current/reference/rpk/rpk-topic/rpk-topic-trim-prefix/

(cherry picked from commit 76a1ea2)
The previous code contained a bug that was masked by the retry logic in
replicated_partition:

    const auto kafka_start_override = _partition->kafka_start_offset_override();
    if (
      !kafka_start_override.has_value()
      || kafka_start_override.value() <= res.value().offset) {
        // The start override doesn't affect the result of the timequery.
        co_return res;
    }
    vlog(
      klog.debug,
      "{} timequery result {} clamped by start override, fetching result at "
      "start {}",
      ntp(),
      res->offset,
      kafka_start_override.value());

    ...

(cherry picked from commit f9ed5ca)
It is not easy to test that this is right, so I am not going to for now.

(cherry picked from commit 8f2de96)
@vbotbuildovich vbotbuildovich added this to the v24.1.x-next milestone May 7, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label May 7, 2024
@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels May 7, 2024
@piyushredpanda (Contributor)
Known failures: #15312 and #14898

@piyushredpanda piyushredpanda merged commit 6d1636e into redpanda-data:v24.1.x May 9, 2024
16 of 19 checks passed
@piyushredpanda piyushredpanda modified the milestones: v24.1.x-next, v24.1.3 May 23, 2024