Supervisor Automated Testing #4729

christophermaier · 2018-03-09T13:56:15Z

We need a robust test suite to exercise the Supervisor. We used to have one, but rapid evolution in the Supervisor itself made it difficult to maintain. As the Supervisor nears a stable 1.0 status, it's time to buckle down and set up a comprehensive suite.

We should be doing out-of-process testing here; the tests should treat the Supervisor as a black box, independent of its implementation details. Exercising the Supervisor from the outside (as opposed to writing extensively mocked tests within the Supervisor code itself) is the best way to achieve truly meaningful tests.

Current thinking is that we'll create a small app / framework to set up and coordinate these tests. Containers seem like a useful implementation approach to pursue, and should make it easier to set up multi-Supervisor rings.
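One possible shape for the container-based setup is a small "plan" object that builds the `docker` and `hab sup run --peer` command lines for an N-node ring, where every node after the first joins the ring by peering with the first. This is only a sketch: the `RingPlan` class, the node names, and the image name are hypothetical, and the image is assumed to have `hab` on its `PATH`. Keeping the plan as plain data means it can be inspected (and asserted on) before anything is executed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RingPlan:
    """Hypothetical plan for an N-Supervisor ring on one Docker network."""
    network: str
    image: str  # placeholder image assumed to contain the `hab` binary
    size: int
    nodes: List[str] = field(init=False)

    def __post_init__(self) -> None:
        if self.size < 1:
            raise ValueError("a ring needs at least one Supervisor")
        self.nodes = [f"sup-{i}" for i in range(1, self.size + 1)]

    def network_command(self) -> List[str]:
        # A user-defined bridge network lets containers resolve each other by name.
        return ["docker", "network", "create", self.network]

    def run_command(self, node: str) -> List[str]:
        cmd = [
            "docker", "run", "--detach",
            "--name", node, "--hostname", node,
            "--network", self.network,
            # NET_ADMIN allows the harness to run tc/iptables inside the container later.
            "--cap-add", "NET_ADMIN",
            self.image, "hab", "sup", "run",
        ]
        bootstrap = self.nodes[0]
        if node != bootstrap:
            # Every other Supervisor joins the ring by gossiping with the first node.
            cmd += ["--peer", bootstrap]
        return cmd

    def stop_command(self, node: str) -> List[str]:
        return ["docker", "stop", node]

    def commands(self) -> List[List[str]]:
        """Everything needed to bring the ring up, in order."""
        return [self.network_command()] + [self.run_command(n) for n in self.nodes]
```

Running each command with `subprocess.run(cmd, check=True)` would bring the ring up, and `stop_command` doubles as the "stop a Supervisor" primitive.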

Whenever possible, we should verify our testing expectations by probing the actual behavior of the Supervisor / the services in question. Verifying things like filesystem state can be useful in some cases, but it does not provide a complete picture; asserting that a redis.spec file was written to disk is useless if a Redis service isn't running at the end of your test case. In any case, filesystem state can be seen as an implementation detail, and as such, should be used minimally, if at all.
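Because gossip, config rendering, and service restarts all happen asynchronously, a check against live behavior usually has to retry until the system converges or a deadline passes. A minimal polling helper is sketched below; the name `eventually` and its defaults are assumptions, not an existing API. It treats any exception from the probe (for example, "connection refused" while a service is still starting) as "not yet"; a real suite might narrow that.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def eventually(probe: Callable[[], T], check: Callable[[T], bool],
               timeout: float = 30.0, interval: float = 0.5) -> T:
    """Poll `probe` until `check(result)` is true or `timeout` seconds elapse.

    Returns the first passing observation. On timeout, raises AssertionError
    that includes the last observation (a value or an exception) for debugging.
    """
    deadline = time.monotonic() + timeout
    last: object = None
    while True:
        try:
            result = probe()
            if check(result):
                return result
            last = result
        except Exception as exc:  # service not reachable yet; keep polling
            last = exc
        if time.monotonic() >= deadline:
            raise AssertionError(
                f"condition not met within {timeout}s; last observation: {last!r}")
        time.sleep(interval)
```

In the Redis example, `probe` would query the running service itself (for instance, by sending it a `PING`) rather than checking whether `redis.spec` exists on disk.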

To help with this, it will be useful to create one or more "probe" services to use in these tests. A probe service will be a simple application in a Habitat package, constructed so that our tests can easily probe it to verify various Supervisor operations. A small HTTP server would be ideal, since it provides an easy interface that external processes (i.e., our testing framework) can use to verify expectations. Did a configuration file get updated properly based on that configuration rumor we just sent? Hit the service's HTTP /config endpoint and verify that it changed in the right way. These probe services should have a full complement of hooks, and be set up so that all relevant lifecycle changes of the service can be introspected from outside. We may be able to get away with creating a single probe service, or we may need to create several, depending on how generally we can design it.
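A probe service along these lines could be as small as the standard-library sketch below. It serves the rendered config at `/config` and a JSON list of lifecycle events at `/events`. Assumptions, all hypothetical: the config path would point at a file the Supervisor renders (e.g. under the service's `/hab/svc/<name>/config/` directory), and each hook (`init`, `run`, `reload`, and so on) is assumed to append its own name as a line to an events log that the probe reads.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class ProbeState:
    """Files the probe exposes (hypothetical layout)."""

    def __init__(self, config_file: Path, events_file: Path) -> None:
        self.config_file = config_file  # rendered config, as the service sees it
        self.events_file = events_file  # one lifecycle event per line, written by hooks


def make_handler(state: ProbeState):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/config":
                self._send(200, "text/plain", self._read(state.config_file))
            elif self.path == "/events":
                events = [ln for ln in self._read(state.events_file).splitlines() if ln]
                self._send(200, "application/json", json.dumps(events))
            else:
                self._send(404, "text/plain", "not found")

        @staticmethod
        def _read(path: Path) -> str:
            # A missing file just means nothing has been written yet.
            return path.read_text() if path.exists() else ""

        def _send(self, status: int, content_type: str, body: str) -> None:
            data = body.encode()
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args) -> None:
            pass  # keep test output quiet

    return ProbeHandler


def start_probe(state: ProbeState, host: str = "127.0.0.1",
                port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 asks the OS for a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A test would apply a configuration rumor, then poll `/config` until the new value appears, and check `/events` to confirm that the expected hooks (for example, `reload`) actually ran.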

Our test framework should have enough primitive operations to completely and concisely exercise the Supervisor. For instance, it should be easy to start and stop a Supervisor. It should be easy to simulate networking issues between nodes (e.g., introducing lag between nodes, dropping a certain percentage of packets between nodes, completely severing network connectivity between nodes, etc.). These network manipulation primitives will be useful for testing rumor propagation, leader election and failover, ring stabilization following netsplits, and more. Having a way to simulate a Builder instance that the test Supervisors can talk to would be useful for testing service and Supervisor update strategies, something that is extremely difficult to do currently.
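On Linux containers, one common way to implement the network primitives is `tc` with the `netem` qdisc for lag and packet loss, and `iptables` DROP rules for severing connectivity. The sketch below only builds the argv lists to run inside a node's container (which needs `NET_ADMIN`); the `Latency`/`PacketLoss`/`Sever` types and helper names are hypothetical. Note that a root `netem` qdisc on `eth0` delays or drops egress to *all* peers of that node; targeting a single peer would need a classful qdisc plus filters, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Latency:
    ms: int           # delay added to every packet leaving the interface


@dataclass(frozen=True)
class PacketLoss:
    percent: float    # share of outgoing packets to drop, 0-100


@dataclass(frozen=True)
class Sever:
    peer_ip: str      # drop all traffic to and from this peer


Fault = Union[Latency, PacketLoss, Sever]


def apply_commands(fault: Fault, iface: str = "eth0") -> List[List[str]]:
    """argv lists that inject `fault`. Only one root qdisc can exist at a time,
    so apply one tc-based fault at a time (or clear it first)."""
    if isinstance(fault, Latency):
        return [["tc", "qdisc", "add", "dev", iface, "root", "netem",
                 "delay", f"{fault.ms}ms"]]
    if isinstance(fault, PacketLoss):
        if not 0 <= fault.percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return [["tc", "qdisc", "add", "dev", iface, "root", "netem",
                 "loss", f"{fault.percent:g}%"]]
    if isinstance(fault, Sever):
        return [["iptables", "-A", "INPUT", "-s", fault.peer_ip, "-j", "DROP"],
                ["iptables", "-A", "OUTPUT", "-d", fault.peer_ip, "-j", "DROP"]]
    raise TypeError(f"unknown fault: {fault!r}")


def clear_commands(fault: Fault, iface: str = "eth0") -> List[List[str]]:
    """argv lists that undo apply_commands(fault)."""
    if isinstance(fault, (Latency, PacketLoss)):
        return [["tc", "qdisc", "del", "dev", iface, "root"]]
    if isinstance(fault, Sever):
        return [["iptables", "-D", "INPUT", "-s", fault.peer_ip, "-j", "DROP"],
                ["iptables", "-D", "OUTPUT", "-d", fault.peer_ip, "-j", "DROP"]]
    raise TypeError(f"unknown fault: {fault!r}")


def in_container(container: str, argv: List[str]) -> List[str]:
    """Wrap a command so it runs inside the named container."""
    return ["docker", "exec", container] + argv
```

Severing is applied on both sides of a link (`INPUT` and `OUTPUT`) so the two nodes cannot reach each other in either direction, which is the shape of a netsplit that ring-stabilization tests need.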

We are agnostic as to how such a testing framework is concretely implemented. Build it based on Cucumber + Aruba, use delmo, construct something from scratch; anything is fair game. It should adhere to the broad principles stated above, though.

While a central focus of this is thorough testing of the Supervisor's behavior under a variety of conditions, we should also take the opportunity to extend testing to everything else the hab binary can do, from basic operations like generating a new key to creating a new exported artifact. See #4642 for more.

Aha! Link: https://chef.aha.io/features/APPDL-40


christophermaier · 2018-03-09T13:58:36Z

cc: @elliott-davis @baumanj @raskchanky @reset @fnichol

baumanj · 2018-03-09T14:58:16Z

This is great! Thanks, @christophermaier

stale · 2020-04-03T00:10:36Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.

stale · 2021-05-08T04:46:37Z

This issue has been automatically closed after being stale for 400 days. We still value your input and contribution. Please re-open the issue if desired and leave a comment with details.

christophermaier added the Epic label Mar 9, 2018

christophermaier added V-sup and removed Epic labels Mar 12, 2018

christophermaier mentioned this issue Mar 12, 2018: Add QuickCheck Tests #1847 (Closed)

christophermaier added Epic and removed E-epic labels Jun 28, 2018

prasek changed the title ~~Testing the Supervisor~~ Supervisor Automated Testing Dec 6, 2018

stale bot added the Stale label Apr 3, 2020

stale bot closed this as completed May 8, 2021

trevorghess removed Stale V-sup labels May 24, 2021

christophermaier commented Mar 9, 2018 (edited by trevorghess)