Add kubernetes-mixin runbooks #11

Closed
nvtkaszpir wants to merge 38 commits

Contributor

@nvtkaszpir nvtkaszpir commented Feb 17, 2022

fixes #8 and a lot of other dead urls.

todo:
fix references between files.

@netlify

netlify bot commented Feb 17, 2022

✔️ Deploy Preview for distracted-northcutt-e0bccc ready!

🔨 Explore the source changes: c9a73f4

🔍 Inspect the deploy log: https://app.netlify.com/sites/distracted-northcutt-e0bccc/deploys/620fbc637e7ed50008f25a71

😎 Browse the preview: https://deploy-preview-11--distracted-northcutt-e0bccc.netlify.app

@nvtkaszpir nvtkaszpir changed the title Create kubedeploymentreplicasmismatch Create kube deployment replicas mismatch Feb 17, 2022
@nvtkaszpir nvtkaszpir changed the title Create kube deployment replicas mismatch Add some Kube* runbooks Feb 17, 2022
@nvtkaszpir nvtkaszpir marked this pull request as draft February 17, 2022 15:54
@nvtkaszpir nvtkaszpir changed the title Add some Kube* runbooks Add kubernetes-mixin runbooks Feb 17, 2022
@nvtkaszpir nvtkaszpir marked this pull request as ready for review February 17, 2022 19:36
Some are with TODO, though.
@nvtkaszpir
Contributor Author

I guess certain sections should be extracted and added to a general section, especially Pod debugging.

@nvtkaszpir nvtkaszpir marked this pull request as draft February 17, 2022 21:53
@nvtkaszpir
Contributor Author

@paulfantom ping

@@ -1,5 +1,5 @@
 ---
-title: Alertmanager ConfigInconsistent
+title: Alertmanager Config Inconsistent
Member

WDYT about keeping it consistent with alert name in prometheus and removing spaces instead of adding them?

Contributor Author

Yeah, I was wondering about it:

  • with spaces - the page title with spaces appears on the left side of the web page and is easier to read
  • without spaces - some alert names are really long and may look ugly there.

On the other hand, the first header in the page is left as is.

Also, I will check whether searching for alerts from the search bar works with names both with and without spaces.
I would rather keep it with spaces and hide the name without spaces somewhere :)

Contributor Author

Removing the entries without spaces makes them impossible to find by the short alert name, which is a bit problematic.

Will have to look into the Hugo options or something.


## Meaning

Given container in the pod is throttled to avoid excessive CPU usage.
Member

@paulfantom paulfantom Feb 18, 2022

We should clearly state that this alert is purely informative and that users shouldn't increase CPU limits unless the application is behaving erratically (i.e. another alert is firing). For this reason, the alert is inhibited by default in kube-prometheus and is sent only if another alert in the same namespace is firing.
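
For readers unfamiliar with the mechanism, here is a minimal sketch of an Alertmanager inhibition rule along those lines. It is an illustration only, not the actual kube-prometheus configuration: the `InfoInhibitor` source alert name, the matcher values, and the assumption that CPUThrottlingHigh carries `severity: info` are all hypothetical choices made for this example.

```yaml
# Illustrative sketch only; alert names and matchers are assumptions,
# not copied from kube-prometheus.
inhibit_rules:
  # While the (assumed) InfoInhibitor alert fires in a namespace,
  # mute info-level alerts, such as CPUThrottlingHigh, in that namespace.
  - source_matchers:
      - alertname = InfoInhibitor
    target_matchers:
      - severity = info
    equal:
      - namespace
```

In this sketch the source alert would need to be defined so that it fires only while no higher-severity alert is active in the namespace; once such an alert fires, the inhibition lifts and the informational alert is delivered.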

Member

Maybe we should link to kubernetes-monitoring/kubernetes-mixin#108 for more curious folks?

Contributor Author

@nvtkaszpir nvtkaszpir Feb 18, 2022

fixed in e30e62a

Contributor Author

Note that there will be a lot of such entries, unfortunately.

Member

@paulfantom paulfantom left a comment

Overall great work!!! 🎉
I've reviewed only up to the KubePersistentVolumeErrors runbook and will resume in the next few days.

A few generic nits:

  • Since "Service degradation or unavailability." is very vague I would like to refrain from using it as an Impact and it would be good to specify what is the direct consequence. I've put a few suggestions on what I mean by it.
  • When linking to some concept, like "APIServer aggregation", let's maybe put those links between <details></details> in the "Meaning" section. WDYT?
  • If we create issues for TODO sections, we can increase the visibility of what needs to be done.

content/runbooks/kubernetes/CPUThrottlingHigh.md (outdated, resolved)
content/runbooks/kubernetes/CPUThrottlingHigh.md (outdated, resolved)

content/runbooks/kubernetes/KubeAPIDown.md (outdated, resolved)
content/runbooks/kubernetes/KubeCPUOvercommit.md (outdated, resolved)
content/runbooks/kubernetes/KubeMemQuotaOvercommit.md (outdated, resolved)
content/runbooks/kubernetes/KubeCPUQuotaOvercommit.md (outdated, resolved)
content/runbooks/kubernetes/KubePersistentVolumeErrors.md (outdated, resolved)
content/runbooks/kubernetes/KubePersistentVolumeErrors.md (outdated, resolved)
nvtkaszpir and others added 24 commits February 18, 2022 16:03
@nvtkaszpir
Contributor Author

After a discussion on Slack, it is better to close this PR and split it into smaller ones.

@nvtkaszpir nvtkaszpir closed this Feb 18, 2022
@SennaSemakula

@nvtkaszpir Do we currently have any PRs open to fix the dead links? I'm still seeing dead links for https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontainerwaiting and many other alerts.

@nvtkaszpir
Contributor Author

yeah, AFAIR it was not merged yet

@SennaSemakula

Hi @paulfantom is there any update on this?

junotx pushed a commit to junotx/runbooks that referenced this pull request Feb 23, 2023

Successfully merging this pull request may close these issues.

Missing docs for kubernetes/kubehpamaxedout
3 participants