-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option for minimal reboot period #904
Open
leonnicolas
wants to merge
5
commits into
kubereboot:main
Choose a base branch
from
leonnicolas:min-reboot-period
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 2 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
12e3818
Add option for minimal reboot period
leonnicolas 3fb674f
Add suggestions from code review
leonnicolas 234cee6
Log the next allowed reboot time
leonnicolas a4d71f6
Merge branch 'main' into min-reboot-period
leonnicolas c4e56b9
Use blockers
leonnicolas File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -100,3 +100,4 @@ spec: | |
# - --metrics-host="" | ||
# - --metrics-port=8080 | ||
# - --concurrency=1 | ||
# - --min-reboot-period=336h |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to log the next time that a reboot will be allowed. So maybe we convert lastSuccessfulRebootWithinMinRebootPeriod() into a "getNextAllowedRebootTime()" func, and then do something like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example confuses me. Why add
minRebootPeriod
to the value returned bygetNextAllowedRebootTime
?Either way returning the next allowed reboot time (without adding the minRebootPeriod) or returning the last successful reboot time (and adding minRebootPeriod) sound good to me.
But what should we do if we don't know the last reboot time? Right now we just return "you should reboot" when there is no last successful reboot time. Would the getNextAllowedRebootTime just return
&time.Time{}
if we don't know about the last successful reboot?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, copy/paste error in example (updated!).
In the instance where we aren't able to determine the last reboot time we can probably return time.Now(), and then update the statement to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you ok with returning and error explicitly? Like
So we don't have to return time.Now() when we don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example we could do
if we don't care about logging the duration (in how much time can we reboot), we should use
time.Now().Before(t)
instead, I guess. I am really bad with subtracting times in my head.I see that it would work without returning an error. It would even take less code, but the users would never see the warning about a missing annotation or anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with an error, but in the case of the annotation not being found, that's not actually an error, as this is a new feature and plenty of users won't configure kured to use it. So silently ignoring that and returning time.Now() seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would only check the annotation if
minRebootPeriod
is configured to be larger than 0.Right, if you suddenly turn on the feature you would get one "wrong" error log entry before the first reboot. After the first reboot with the
--minRebootPeriod
flag, a missing annotation would actually be a real error.If we don't care so much about logging that error, I will totally agree to returning
time.Now()
if something unexpected happens.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify why this is so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You run cured without the
--min-reboot-period
flag. Kured will not annotate the node with theweave.works/kured-last-successful-reboot
flag. Now you want to use the "new" feature, edit your manifest and roll out the new kured daemon set. Normally none of the pods will hold the lock fromkured/cmd/kured/main.go
Line 666 in d0bdc11
weave.works/kured-last-successful-reboot
annotation to its nodes. We wouldn't want that anyways because then restarting/rollilng out kured daemon set would falsify the last successful reboot time. Since the feature is enabled now, kured would check for theweave.works/kured-last-successful-reboot
annotation, but fail since it is not there yet. So if we log an error when the annotation is missing, we would see one error log for each tick until the first reboot. That the annotation is missing is expected before the first reboot. However, after the first reboot the missing annotation would be an unexpected error.Only after the first reboot with the
--min-reboot-period
flag, this feature would actually work.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is right here. Should we never log an error on missing annotations to avoid confusion before the first reboot to the cost of silencing real errors after the first reboot? Or the other way around? (I added a commit that would log the errors, happy to change it though)