Concept for monitoring OpenShift 4 #20
Comments
RFC: @srueg @tobru @bliemli @madchr1st
Generally I really like it and I think it's pretty much what we want!
Why? Some other questions:
Regarding these questions, also see https://docs.openshift.com/container-platform/4.5/release_notes/ocp-4-5-release-notes.html#ocp-4-5-monitor-your-own-services-tp.
This is a technology preview, so we should not use it.
The Prometheus Operator of the Cluster Monitoring only takes care of its own namespace. We would have to bring our own operator. Our own operator would then again only take care of the namespace we place our Prometheus in, for the same reasons the Cluster Monitoring operator is limited to only one namespace. We can do so by using OLM or by bringing in the operator by other means (e.g. kube-prometheus).
Yes, Cluster Monitoring only watches a limited set of namespaces. I need to check which ones. We would need to do the same. Preferably we do this by labelling namespaces (need to check if that is possible).
IMHO it is possible to configure which namespaces a Prometheus Operator watches for ServiceMonitor resources using the serviceMonitorNamespaceSelector field of the Prometheus resource.
We will for sure configure persistent storage (the default is emptyDir). Disk size and memory requests/limits need to be defined on a per-cluster level based on actual usage. The default retention time is 10 days. We might want to change that, but I do not think so.
That will work for sure. If possible, I would prefer a label-based approach.
A specific label will have to be applied on the ServiceMonitor and PrometheusRule resources.
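To illustrate the label-based approach, a minimal sketch of the Prometheus resource for our second set of instances, selecting only labelled ServiceMonitor and PrometheusRule resources across all namespaces. The namespace, resource name and the label key/value are placeholders, not anything decided here:

```yaml
# Sketch only: Prometheus resource for the additional instances.
# "additional-monitoring" and the label "monitoring.example.com/instance: additional"
# are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: additional
  namespace: additional-monitoring
spec:
  replicas: 2
  # watch all namespaces, but only pick up resources carrying our label
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      monitoring.example.com/instance: additional
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      monitoring.example.com/instance: additional
```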
OpenShift 4 includes a cluster monitoring stack based on Prometheus. This ticket aims to answer the question of how we make use of it.
Motivation
The documentation about Configuring the monitoring stack lists quite a lot of things that cannot be configured. Among them: additional ServiceMonitors, additional or changed alerting rules, and changes to the Grafana instance.
We know from experience with OpenShift 3.11 that some tweaking will be required at some point. This includes adding ServiceMonitors for things not (yet) covered by Cluster Monitoring, adding new rules to cover additional failure scenarios, and altering rules that are noisy and/or not actionable.
Goals
Enable us to monitor services not covered by Cluster Monitoring, add alert rules for failure scenarios not yet covered, replace noisy or non-actionable rules, and build custom dashboards.
Non-Goals
Answer the question of where alerts are sent and thus how they are acted upon.
Design Proposal
Based on all those restrictions, one could conclude to omit Cluster Monitoring entirely and do everything on our own. This would give full control over everything. But Cluster Monitoring is a fundamental part of an OpenShift 4 setup and will always be present; it is required for certain things to work properly. Doing it all again would be a huge waste of resources, both in terms of management/engineering and in terms of compute and storage.
For that reason we will make use of Cluster Monitoring as much as possible. We will operate a second pair of Prometheus instances in parallel to the Cluster Monitoring ones. That second pair of instances only takes care of the things we cannot do with Cluster Monitoring.
Those additional Prometheus instances will get the needed metrics from Cluster Monitoring. Targets are only scraped directly when Cluster Monitoring is not already doing so. Alerts will be sent to the Alertmanager instances of Cluster Monitoring.
User Stories
Noisy and/or non-actionable alert rule
The Configuring the monitoring stack documentation explicitly prohibits changing the existing alert rules. From our experience with OpenShift 3.11, we had cases where we needed to do so: a rule that just produced noise, was not actionable, and/or did not cover some edge cases.
The OpenShift 4 monitoring is based on kube-prometheus. We also have experience with this, as we use it for non-OpenShift Kubernetes clusters, and we already had to tweak some of those rules. See CPUThrottlingHigh false positives for an example.
For those cases we can make use of Alertmanager: with the routing configuration, we route those troublesome alerts to the void. The second set of Prometheus instances then evaluates a replacement alert rule.
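A minimal sketch of such a route, assuming the alert to silence is CPUThrottlingHigh and that we can adjust the Alertmanager configuration of Cluster Monitoring (the alertmanager-main secret); the receiver names are placeholders:

```yaml
# Sketch only: route a noisy alert to a receiver without any
# notification integrations, effectively dropping it.
route:
  receiver: default
  routes:
    - match:
        alertname: CPUThrottlingHigh
      receiver: "null"
receivers:
  - name: default
    # ...actual notification integrations...
  - name: "null"   # no integrations: alerts routed here go to the void
```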
Service not monitored
The Configuring the monitoring stack documentation explicitly prohibits the creation of additional ServiceMonitors within Cluster Monitoring. Instead, we will use our second set of Prometheus instances to scrape metrics from those services. Rules based on those metrics will also be evaluated there.
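As a sketch, a ServiceMonitor for such a service could look as follows; the application name, namespace and port are placeholders, and the label assumes the label-based selection discussed in the comments above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: example-app
  labels:
    # must match the serviceMonitorSelector of our second Prometheus
    monitoring.example.com/instance: additional
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics
      interval: 30s
```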
Failure scenario not covered by existing alert rules
The Configuring the monitoring stack documentation explicitly prohibits the creation of additional alert rules.
Additional alert rules will be configured and evaluated on our second set of Prometheus instances. The metrics will come from Cluster Monitoring and/or from directly scraped targets.
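A sketch of such an additional rule as a PrometheusRule resource; the rule itself is only an illustration, and the label again assumes the label-based selection from the comments above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: additional-rules
  namespace: additional-monitoring
  labels:
    monitoring.example.com/instance: additional
spec:
  groups:
    - name: example.rules
      rules:
        - alert: ExampleAppDown
          expr: up{job="example-app"} == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```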
Custom Dashboards
The Configuring the monitoring stack documentation explicitly prohibits changing the Grafana instance. In order to have custom dashboards, we can operate our own Grafana instance which uses our second set of Prometheus instances as its data source.
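A sketch of the corresponding Grafana datasource provisioning, assuming our second Prometheus is reachable in-cluster under a service name like the one below (URL and names are placeholders):

```yaml
# datasources.yaml, mounted into /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: additional-prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-additional.additional-monitoring.svc:9090
    isDefault: true
```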
Implementation Details/Notes/Constraints
Our own pair of Prometheus instances will use remote read to query metrics from Cluster Monitoring. This does not create additional replicas of the metrics; no additional storage is needed except for the additionally scraped targets. Remote read is also efficient in memory usage (see Remote Read Meets Streaming).
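A sketch of how the remote read could be wired up in the Prometheus resource of our second instances (same resource as sketched in the comments above, showing only the remoteRead part). Reaching the Cluster Monitoring Prometheus goes through an OAuth proxy, so the URL, port, token and CA paths below are assumptions that still need to be verified:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: additional
  namespace: additional-monitoring
spec:
  remoteRead:
    - url: https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/read
      readRecent: true
      # authentication details are an assumption and need verification
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
```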
Risks and Mitigations
This setup works around all the configuration restrictions of Cluster Monitoring. It also does so with no, or only a minimal, resource overhead.
The remote read is an additional source of failure that is usually not present and has to be accounted for.
See Remote Read Meets Streaming for an in depth discussion on the subject.
Drawbacks
This setup is specific to OpenShift 4. It cannot be applied to non-OpenShift 4 setups, or not without major changes.
Alternatives
Remote write
The OpenShift 4 documentation does not mention it, but in the source we see that remote write targets can be configured. Prometheus itself does not provide a receive endpoint; instead, Thanos Receiver could be used.
Thanos Receiver writes the received metrics in the same format as Prometheus does. It is possible to point a Prometheus instance to the same data directory and thus "import" it into Prometheus. While this works technically, it is probably not safe for production. Instead, remote read or Thanos Ruler must be used.
So this is less an alternative than a complement to achieve long-term storage.
Cluster Monitoring will be configured to write metrics into a Thanos Receiver. The receiver then stores those metrics in S3. Via Thanos Querier, those metrics will then again be made available to Prometheus using remote read, and also to Grafana.
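A sketch of what the Cluster Monitoring side of this could look like. The remoteWrite field under prometheusK8s is not covered by the 4.5 documentation (only visible in the source, as noted above), and the receiver URL is a placeholder:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        - url: http://thanos-receive.additional-monitoring.svc:19291/api/v1/receive
```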
Federation
Federation allows a Prometheus server to scrape selected time series from another Prometheus server. The key word here is selected.
It is possible to use the federation endpoint to scrape all metrics. This has several downsides: every federation scrape transfers the current sample of every series in one large request, it puts additional load on the Cluster Monitoring instances, and all metrics are stored a second time.
Federation is meant to build aggregated views in a hierarchical architecture. It is not built to bring most, if not all, metrics from one Prometheus instance to another.
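For completeness, a sketch of how federation is intended to be used, scraping only selected (typically pre-aggregated) series; the job name, target and match[] selectors are placeholders, and authentication against the OAuth proxy is omitted:

```yaml
scrape_configs:
  - job_name: federate-cluster-monitoring
    honor_labels: true
    metrics_path: /federate
    scheme: https
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # only pre-aggregated recording rules
    static_configs:
      - targets:
          - prometheus-k8s.openshift-monitoring.svc:9091
```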
References