This repository has been archived by the owner on Oct 17, 2018. It is now read-only.

Acknowledgement based mechanism to mark metric timestamps as completed #147

Open
xichen2020 opened this issue Jul 14, 2018 · 0 comments

@xichen2020 (Contributor)

cc @cw9

I mentioned this issue to you a while back. Currently, the aggregator marks a timestamp as "flushed" and persists that timestamp in KV as soon as the metrics with that timestamp have been either flushed to the backends (m3msg ingesters, indexers, etc.) or written to the TCP connection to other aggregation servers as forwarded metrics. Without acknowledgements, however, there is no reliable way to know whether the metrics have actually reached the receiving end. Marking tiles as completed can therefore be premature: followers may discard their metrics too early, which can cause data loss during server deployments.

With the integration of m3msg into m3aggregator, this should be an achievable goal. When a timestamp is flushed, it should not be marked as completed until every metric associated with that timestamp has been acked by the receiver (or dropped locally because the buffer is full). Only then can we mark the metrics as written with confidence.
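To make the proposed rule concrete, here is a minimal sketch of per-timestamp ack tracking. Everything in it is hypothetical and not part of the m3aggregator or m3msg APIs: the `ackTracker` type and its `OnFlush`, `OnResolved`, and `Completed` methods are illustrative names. It assumes timestamps are flushed in ascending order, each exactly once, and that `OnResolved` is called once per metric when it is either acked or dropped locally. The "completed" watermark only advances past a timestamp once that timestamp and every earlier flushed timestamp have no outstanding metrics.

```go
package main

import (
	"fmt"
	"sync"
)

// ackTracker is a hypothetical helper (not an m3aggregator API). It counts,
// per flushed timestamp, the metrics still awaiting an ack or a local drop.
type ackTracker struct {
	mu        sync.Mutex
	order     []int64       // flushed timestamps, assumed ascending
	pending   map[int64]int // timestamp -> outstanding metric count
	completed int64         // highest timestamp safe to persist to KV (0 = none)
}

func newAckTracker() *ackTracker {
	return &ackTracker{pending: make(map[int64]int)}
}

// OnFlush records that n metrics with timestamp ts were handed off for delivery.
func (t *ackTracker) OnFlush(ts int64, n int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.pending[ts]; !ok {
		t.order = append(t.order, ts)
	}
	t.pending[ts] += n
	t.advanceLocked() // a flush with n == 0 completes immediately
}

// OnResolved is called once per metric that was acked by the receiver or
// dropped locally because the buffer was full.
func (t *ackTracker) OnResolved(ts int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.pending[ts] > 0 {
		t.pending[ts]--
	}
	t.advanceLocked()
}

// advanceLocked moves the watermark forward over the oldest timestamps whose
// metrics are all resolved; it stops at the first timestamp still pending.
func (t *ackTracker) advanceLocked() {
	for len(t.order) > 0 && t.pending[t.order[0]] == 0 {
		t.completed = t.order[0]
		delete(t.pending, t.order[0])
		t.order = t.order[1:]
	}
}

// Completed returns the highest timestamp that may be marked completed in KV.
func (t *ackTracker) Completed() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.completed
}

func main() {
	t := newAckTracker()
	t.OnFlush(10, 2)
	t.OnFlush(20, 1)
	t.OnResolved(20)           // ts=20 acked, but ts=10 is still outstanding
	fmt.Println(t.Completed()) // 0
	t.OnResolved(10)           // e.g. acked by the receiver
	t.OnResolved(10)           // e.g. dropped locally (buffer full)
	fmt.Println(t.Completed()) // 20
}
```

The watermark is deliberately monotonic and gap-free: an acked later timestamp does not complete on its own while an earlier one is still in flight, so followers would never be told to discard data that has not been confirmed.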

In the short term, one workaround to mitigate the issue for forwarded metrics would be for the follower to use lastFlushedNanos - maxSingleDelay as the target timestamp for discarding its metrics. This works for forwarded metrics because they would be rejected after maxSingleDelay anyway. Nonetheless, this is certainly not ideal, and m3msg-based acks would be a much cleaner solution.
