[svc health check] allow configurable interval between checks. #5890

Conversation

@jeremymv2 jeremymv2 commented Nov 29, 2018

🔩 Description

Resolves #5466

This change lets users specify a health check interval at service load time. The value is stored as health_check_interval_seconds in the Spec file for the service, for example:

$ sudo cat /hab/sup/default/specs/elasticsearch.spec
ident = "core/elasticsearch"
group = "default"
bldr_url = "https://bldr.habitat.sh"
channel = "stable"
topology = "standalone"
update_strategy = "none"
binds = []
binding_mode = "strict"
desired_state = "up"
health_check_interval_seconds = 35

Features of the PR:

  • adds --health-check-interval to hab svc load
  • validation enforces a u32 value between 1 and 86400 seconds
  • A Service or census change triggers every Service to re-run its Health Check within 30s, regardless of whether its configured interval is greater than 30s. The idea is to get an updated Health Check status from dependent services within a reasonable time frame when an upstream pkg is updated.
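
To make the second and third bullets concrete, here is a minimal sketch of the range validation and the 30s cap. It is not the code in this PR: the names validate_interval, next_check_delay and CHANGE_RECHECK_CAP are hypothetical, and only the 1 to 86400 bounds and the 30s figure come from the description above.

```rust
use std::time::Duration;

// Bounds from the PR description: 1 second up to 86400 seconds (one day).
const MIN_INTERVAL_SECS: u32 = 1;
const MAX_INTERVAL_SECS: u32 = 86_400;

// Longest wait after a Service or census change (30s per the description).
const CHANGE_RECHECK_CAP: Duration = Duration::from_secs(30);

/// Hypothetical validator for a `--health-check-interval` style argument.
/// Returns the interval in seconds, or a message describing the problem.
fn validate_interval(raw: &str) -> Result<u32, String> {
    let secs: u32 = raw
        .trim()
        .parse()
        .map_err(|e| format!("'{}' is not a valid u32: {}", raw, e))?;
    if (MIN_INTERVAL_SECS..=MAX_INTERVAL_SECS).contains(&secs) {
        Ok(secs)
    } else {
        Err(format!(
            "interval must be between {} and {} seconds, got {}",
            MIN_INTERVAL_SECS, MAX_INTERVAL_SECS, secs
        ))
    }
}

/// Hypothetical helper: how long to wait before the next health check.
/// When a change was detected, the wait is capped at 30s; a configured
/// interval shorter than the cap is left alone.
fn next_check_delay(configured: Duration, change_detected: bool) -> Duration {
    if change_detected {
        configured.min(CHANGE_RECHECK_CAP)
    } else {
        configured
    }
}
```

With these definitions, a service configured for 120s would be re-checked within 30s of a change, while one configured for 10s keeps its 10s cadence.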

Note: I will follow up with test and doc updates, but I wanted to open this up for discussion before moving forward.

Related #5326
Related #5584

@christophermaier @mwrock @baumanj @raskchanky What are your thoughts on this implementation as a preliminary step to adding more usefulness to our Service Health Check feature?

Signed-off-by: Jeremy J. Miller [email protected]

✅ Checklist

  • Necessary tests added/updated?
  • Necessary docs added/updated?
  • Code actually executed?
  • Vetting performed (unit tests, lint, etc.)?

@thesentinels
Contributor

Thanks for the pull request! Here is what will happen next:

  1. Your PR will be reviewed by the maintainers
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

@baumanj
Contributor

baumanj commented Nov 29, 2018

I'm planning to look at this tomorrow and hope to have feedback for you by Monday.

@jeremymv2 jeremymv2 force-pushed the jeremymv2/configurable_healthcheck_interval branch from ac1627e to c1a50e8 Compare November 29, 2018 18:38
@jeremymv2
Contributor Author

Looks like the Appveyor error was a Bldr API service unavailable.

@jeremymv2 jeremymv2 changed the title WIP [svc health check] allow configurable interval between checks. WIP [svc health check] allow configurable interval between checks. Nov 30, 2018
@baumanj
Contributor

baumanj commented Nov 30, 2018

I re-started the AppVeyor job

Contributor

@baumanj baumanj left a comment

I haven't finished reading everything (will continue on Monday), but wanted to give you some early feedback. As usual, this is great work. The biggest thing so far that I'd like to see changed is all the u32s and health_check_interval_seconds to Durations and health_check_interval. There'll be some work to make Duration a type we can reference in the .proto files, but I think we can make that happen and it will be worthwhile for all the type safety that it buys.
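
As a rough illustration of the Duration suggestion above, a newtype around std::time::Duration could look like the sketch below. The type name, the 30s default, and the trait choices are assumptions made for this example; this is not the type from the PR or from core.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;
use std::time::Duration;

/// Hypothetical newtype: callers get a Duration instead of a bare u32,
/// so seconds cannot be confused with some other integer by accident.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HealthCheckInterval(Duration);

impl Default for HealthCheckInterval {
    // Assumption for this sketch: default to 30 seconds.
    fn default() -> Self {
        HealthCheckInterval(Duration::from_secs(30))
    }
}

impl FromStr for HealthCheckInterval {
    type Err = ParseIntError;

    // Parse a whole number of seconds, e.g. the value given on the CLI.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let secs: u32 = s.trim().parse()?;
        Ok(HealthCheckInterval(Duration::from_secs(u64::from(secs))))
    }
}

impl From<HealthCheckInterval> for Duration {
    fn from(interval: HealthCheckInterval) -> Self {
        interval.0
    }
}

impl fmt::Display for HealthCheckInterval {
    // Render as whole seconds with an "s" suffix.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}s", self.0.as_secs())
    }
}
```

The seconds-based parsing lives in one place (FromStr), and everything downstream works with a Duration, which is the type safety the comment above is after.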

components/hab/src/cli.rs (outdated, resolved)
components/hab/src/cli.rs (outdated, resolved)
components/hab/src/cli.rs (outdated, resolved)
components/hab/src/cli.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
Contributor

@christophermaier christophermaier left a comment

@jeremymv2 Looking good so far, but I have some suggestions that can hopefully simplify things further. Let me know what you think.

Thanks!

components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/doc/api.raml (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
@jeremymv2
Contributor Author

jeremymv2 commented Dec 12, 2018

@cm @baumanj Thank you for the very helpful feedback. I believe I have addressed most of the items, but I'd still like to have some discussion around the finer details of this implementation.

HealthCheckInterval is now a custom core::Service type in habitat-sh/core#95, so testing this PR requires pulling down the branch for that change in core and modifying https://github.com/habitat-sh/habitat/blob/master/Cargo.toml#L26-L32 locally to build. The core PR will also need to be merged first for us to get a ✅ build here.

Before I move further with refining this PR and testing edge cases, I'd like to get your feedback again on this intermediary point with the new commits I've just added to ensure we're all in agreement. Thank you.

Contributor

@baumanj baumanj left a comment

Overall looks great. I'd like to simplify a few things and there may be one logic error.

components/sup-protocol/src/types.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (resolved)
components/sup/src/manager/service/spec.rs (resolved)
components/sup-protocol/protocols/types.proto (resolved)
@jeremymv2
Contributor Author

jeremymv2 commented Dec 17, 2018

@christophermaier @baumanj some spare cycles popped up yesterday so I made some more progress. Please take a look at your convenience. As always, thank you!!

@jeremymv2 jeremymv2 force-pushed the jeremymv2/configurable_healthcheck_interval branch from f038c8a to 3789a76 Compare December 17, 2018 19:28
Contributor

@baumanj baumanj left a comment

The feedback changes look good; I think there's some core logic we can tighten up (or I'm failing to understand the complexity of the use case correctly).

components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
components/sup/src/manager/service/mod.rs (outdated, resolved)
@jeremymv2 jeremymv2 force-pushed the jeremymv2/configurable_healthcheck_interval branch from 3789a76 to bc3f565 Compare December 18, 2018 17:41
Contributor

@baumanj baumanj left a comment

This looks great! I just had one small suggestion for nicer logging if you're so inclined, but I'm happy to approve this.

I'll plan to merge once our current release process is done.

components/sup/src/manager/service/mod.rs (resolved)
@jeremymv2
Contributor Author

jeremymv2 commented Dec 20, 2018

Let's kick off a rebuild now that the PR to core has merged. Could somebody with permissions help out by re-running it, please 😄?

@jeremymv2
Contributor Author

Once this merges, I will follow up with a doc update.

@baumanj
Contributor

baumanj commented Dec 20, 2018

Hey @jeremymv2, now that habitat-sh/core#95 is merged, I'm updating the habitat repo to include it: #6009.

Once that merges, can you rebase this so that the CI passes? Then I can merge this one.

@baumanj
Contributor

baumanj commented Dec 20, 2018

#6009 is merged now

Allows a custom health check interval per service.

Signed-off-by: Jeremy J. Miller <[email protected]>
@jeremymv2 jeremymv2 force-pushed the jeremymv2/configurable_healthcheck_interval branch from 0b33755 to 015d8d8 Compare December 20, 2018 19:42
@jeremymv2
Contributor Author

I've rebased; however, I think we need to attempt a restart of the AppVeyor tests.

@baumanj
Contributor

baumanj commented Dec 20, 2018

I've restarted the AppVeyor stuff. We should really get you access to do that yourself, @jeremymv2. You've certainly made a lot of valuable contributions.

@baumanj
Contributor

baumanj commented Dec 21, 2018

/me spins the AppVeyor wheel of fortune again

@baumanj
Contributor

baumanj commented Dec 21, 2018

AppVeyor timed out again. It was the packaging check, not the unit tests, so I'll force merge if it doesn't finish after one more try.

@baumanj baumanj merged commit da144ad into habitat-sh:master Dec 21, 2018
chef-ci added a commit that referenced this pull request Dec 21, 2018
Obvious fix; these changes are the result of automation not creative thinking.
@christophermaier christophermaier added Type:Feature PRs that add a new feature and removed X-feature labels Jul 24, 2020
Labels
Type:Feature PRs that add a new feature
Development

Successfully merging this pull request may close these issues.

Provide config (and updated doc) for health check cadence
5 participants