Issues with increased write latency at a certain time during the day #8665
Unanswered
lukas-unity
asked this question in
Help and support
Replies: 0 comments
Hey! For the last couple of weeks we've been having issues ingesting samples from a couple of our tenants. It happens every day during the same time window, from around 06:00 AM PT until 11:00 AM PT.
During this window we don't see anything too suspicious in the Mimir Overview dashboard. The only thing that stands out is the per-tenant sample rate from:
sum(rate(cortex_distributor_received_samples_total{namespace="kronus"}[$__rate_interval])) by (user)
which becomes uneven.
The majority of our tenants seem to be fine; the issue affects a couple of large tenants and a couple of smaller ones. Interestingly, we have two clusters with a similar metric load, but only one of them has this problem.
On the Prometheus side, WAL delay / write latency increases significantly, and the remote-write shards go up to the configured maximum.
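To make the Prometheus-side symptoms concrete, queries along these lines should show them. This is only a sketch that assumes the standard Prometheus remote-write metric names; the `job="prometheus"` selector is a placeholder, and the exact panels and labels in any given setup may differ.

```promql
# Remote-write lag in seconds: newest sample appended to the WAL/TSDB
# minus the newest sample successfully sent by each remote-write queue.
# job="prometheus" is a placeholder selector, adjust to your labels.
max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job="prometheus"}[5m])
  - ignoring(remote_name, url) group_right
max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job="prometheus"}[5m])

# Shard saturation per queue: a value pinned at 1 means the queue is
# running at its configured max_shards.
prometheus_remote_storage_shards{job="prometheus"}
  / prometheus_remote_storage_shards_max{job="prometheus"}
```

If the lag grows while the shard ratio sits at 1 during the affected window, the remote-write queue is not draining fast enough, which matches the behaviour described above.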
We're running Mimir 2.12, and our tenants' Prometheus instances are on 2.48-2.53.
Any tips on where else we could look? We've already tried upgrading Mimir (we were on 2.11 previously), increasing the distributor and ingester counts, and adding more resources, but so far no luck.