
[v24.1.x] Fix timequery returning wrong offset after trim-prefix which could lead to stuck consumers #18281

Conversation

vbotbuildovich (Collaborator)
Backport of PR #18112

Encapsulates common operations on offset intervals. For now, although it
is named bounded, the maximum offset can still be set to
`model::offset::max()`. I will likely change this in the future, as it
requires changing quite a few call sites, most likely only tests.

This little data structure tries to be very lightweight and impose
minimal overhead on basic interval operations like intersection or
inclusion. It is also quite hard to use incorrectly, thanks to the
checked construction variant and debug assertions.

Later, we might want to refactor things like log_reader to use this
instead of min and max offsets like they do today. Once that is done,
the checked variant needs to be called only once at the kafka layer. For
everyone else it becomes a ~0 cost abstraction.

(cherry picked from commit f13bfa6)
The Kafka ListOffsets request allows querying offsets by timestamp. The
result of such a request is the first Kafka offset in the log whose
timestamp is equal to or greater than the requested timestamp, or -1 if
no such record can be found.

The implementation we have today assumes that the start of the physical
log matches the start of the log as it is seen by external users/kafka
clients.

However, this is not always true, in particular when [trim-prefix][1]
(prefix truncation) is used. There are two sources of problems:

  1) trim-prefix is applied synchronously at the cluster layer, where it
     changes the visibility of the log from the client's point of view,
     but asynchronously at the consensus log / physical log /
     disk_log_impl class and cloud storage.

  2) trim-prefix is executed with an offset pointing at a record in the
     middle of a batch.

As a result, in these scenarios, if a client sends a Kafka Fetch request
with the offset it received, it gets an OffsetOutOfRange error.

This commit changes how such queries are implemented at the lower levels
of the system: the information about the visible start and end of the
log is carried together with the timestamp, and at the lower levels we
use it to limit the search to that range.

Although this commit does not change the implementation of the tiered
storage timequery, it does fix the trim-prefix problem there too in the
general case, because of the check and "bump" added in
redpanda-data#11579.

Tiered Storage timequeries have some additional problems which I plan to
address in redpanda-data#18097.

[1]: https://docs.redpanda.com/current/reference/rpk/rpk-topic/rpk-topic-trim-prefix/

(cherry picked from commit 76a1ea2)
The previous code contained a bug that was masked by the retry logic in
replicated_partition:

    const auto kafka_start_override = _partition->kafka_start_offset_override();
    if (
      !kafka_start_override.has_value()
      || kafka_start_override.value() <= res.value().offset) {
        // The start override doesn't affect the result of the timequery.
        co_return res;
    }
    vlog(
      klog.debug,
      "{} timequery result {} clamped by start override, fetching result at "
      "start {}",
      ntp(),
      res->offset,
      kafka_start_override.value());

    ...

(cherry picked from commit f9ed5ca)
It is not easy to test that this is right, so I am not going to for now.

(cherry picked from commit 8f2de96)
@vbotbuildovich vbotbuildovich added this to the v24.1.x-next milestone May 7, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label May 7, 2024
@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels May 7, 2024
@piyushredpanda (Contributor)
Known failures: #15312 and #14898

@piyushredpanda piyushredpanda merged commit 6d1636e into redpanda-data:v24.1.x May 9, 2024
16 of 19 checks passed
@piyushredpanda piyushredpanda modified the milestones: v24.1.x-next, v24.1.3 May 23, 2024