Supervisor Automated Testing #4729

Closed
christophermaier opened this issue Mar 9, 2018 · 4 comments

christophermaier commented Mar 9, 2018

We need a robust test suite to exercise the Supervisor. We used to have one, but rapid evolution in the Supervisor itself made it difficult to maintain. As the Supervisor nears a stable 1.0 status, it's time to buckle down and set up a comprehensive suite.

We should be doing out-of-process testing here; the tests should treat the Supervisor as a black box and know nothing about its implementation details. Performing the tests from outside the Supervisor (as opposed to extensively mocked tests from within the Supervisor code itself) is the best way to achieve truly meaningful tests.

Current thinking is that we'll create a small app / framework to set up and coordinate these tests. Containers seem like a useful approach to pursue, since they should make setting up multi-Supervisor rings easier.

Whenever possible, we should verify our testing expectations by probing the actual behavior of the Supervisor / the services in question. Verifying things like filesystem state can be useful in some cases, but it does not provide a complete picture; asserting that a redis.spec file was written to disk is useless if a Redis service isn't running at the end of your test case. In any case, filesystem state can be seen as an implementation detail, and as such, should be used minimally, if at all.
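As a rough illustration of probing behavior rather than filesystem state, a test could wait for the service to actually accept connections. This is only a sketch: `port_is_open` and `wait_until` are hypothetical helper names, not part of any existing Habitat tooling, and it assumes the service under test listens on a TCP port reachable from the test process.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: the service is not usable.
        return False


def wait_until(predicate, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One final check so a success right at the deadline is not missed.
    return bool(predicate())
```

A test case for the Redis example could then assert `wait_until(lambda: port_is_open("127.0.0.1", 6379))` (6379 being Redis's default port) instead of checking that a spec file exists.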

To help with this, it will be useful to create one or more "probe" services to use in these tests. This will be a simple application in a Habitat package, constructed in such a way that it can be easily probed by our tests to verify various Supervisor operations. A small HTTP server would be ideal, since that will provide an easy interface for external processes (i.e., our testing framework) to use to verify expectations. Did a configuration file get updated properly based on that configuration rumor we just sent? Hit the service's HTTP /config endpoint and verify that it changed in the right way. These probe services should have a full complement of hooks, and be set up in such a way that all relevant lifecycle changes of the service can be introspected from outside. We may be able to get away with creating a single probe service, or we may need to create several, depending on how generally we can design it.
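To make the probe idea concrete, here is a minimal sketch of such a service and the matching check from the test side. It is written in Python purely for illustration; the JSON response shape, the function names `make_probe_server` and `fetch_config`, and the idea of serving the rendered config file verbatim are all assumptions, not a settled design. A real probe would be packaged as a Habitat service and would also expose hook and lifecycle state.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_probe_server(config_path: Path, port: int = 0) -> ThreadingHTTPServer:
    """Build a server whose /config endpoint returns the current config file."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/config":
                self.send_error(404)
                return
            try:
                # Re-read on every request so config updates are visible.
                body = config_path.read_text()
            except FileNotFoundError:
                self.send_error(503, "config not rendered yet")
                return
            payload = json.dumps({"config": body}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, fmt, *args):
            pass  # keep test output quiet

    # port=0 asks the OS for a free port; read it from server_address.
    return ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)


def fetch_config(host: str, port: int) -> str:
    """Test-side helper: return the config text the probe currently sees."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/config")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"probe returned HTTP {resp.status}")
        return json.loads(resp.read())["config"]
    finally:
        conn.close()
```

A test would start the probe under a Supervisor, apply a configuration change, and then poll `fetch_config` until the expected value appears or a timeout expires.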

Our test framework should have enough primitive operations to completely and concisely exercise the Supervisor. For instance, it should be easy to start and stop a Supervisor. It should be easy to simulate networking issues between nodes (e.g., introducing lag between nodes, dropping a certain percentage of packets between nodes, completely severing network connectivity between nodes, etc.). These network manipulation primitives will be useful for testing rumor propagation, leader election and failover, ring stabilization following netsplits, and more. Having a way to simulate a Builder instance that the test Supervisors can talk to would be useful for testing service and Supervisor update strategies, something that is extremely difficult to do currently.
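One possible shape for the network primitives, assuming each Supervisor runs in a Linux container that has `iproute2` and `iptables` installed and was started with the `NET_ADMIN` capability. The function names are hypothetical; the sketch only builds the command lines (`tc ... netem` for lag and packet loss, `iptables ... DROP` for severing a peer) so the framework can decide how to execute them.

```python
from typing import List


def degrade(dev: str, delay_ms: int = 0, loss_percent: float = 0.0) -> List[str]:
    """tc/netem command adding latency and/or packet loss on an interface."""
    if delay_ms <= 0 and loss_percent <= 0:
        raise ValueError("specify delay_ms and/or loss_percent")
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_percent > 0:
        cmd += ["loss", f"{loss_percent:g}%"]
    return cmd


def clear_degradation(dev: str) -> List[str]:
    """Remove the root qdisc installed by degrade()."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


def sever(peer_ip: str) -> List[List[str]]:
    """iptables commands dropping all traffic to and from one peer."""
    return [
        ["iptables", "-A", "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def heal(peer_ip: str) -> List[List[str]]:
    """Delete the rules added by sever()."""
    return [
        ["iptables", "-D", "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", "-D", "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def in_container(container: str, cmd: List[str]) -> List[str]:
    """Wrap a command so it runs inside a named Docker container."""
    return ["docker", "exec", container, *cmd]
```

Each returned list can be passed to `subprocess.run(..., check=True)`. Keeping command construction separate from execution means the primitives themselves can be unit tested without root privileges.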

We are agnostic as to how such a testing framework is concretely implemented. Build it based on Cucumber + Aruba, use delmo, construct something from scratch; anything is fair game. It should adhere to the broad principles stated above, though.

While the central focus here is thorough testing of the Supervisor's behavior under a variety of conditions, we should also take the opportunity to extend testing to everything else the hab binary is capable of doing, from basic operations like generating a new key to creating a new exported artifact. See #4642 for more.

Aha! Link: https://chef.aha.io/features/APPDL-40

baumanj commented Mar 9, 2018

This is great! Thanks, @christophermaier

prasek changed the title from "Testing the Supervisor" to "Supervisor Automated Testing" on Dec 6, 2018

stale bot commented Apr 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.

stale bot added the Stale label on Apr 3, 2020

stale bot commented May 8, 2021

This issue has been automatically closed after being stale for 400 days. We still value your input and contribution. Please re-open the issue if desired and leave a comment with details.

3 participants