
Use more reliable CPU measurements #76

Open
howardjohn opened this issue Dec 3, 2021 · 0 comments

Current situation

Currently, CPU measurements are done by looking at the peak CPU usage over a period of time. I would argue that this is the wrong metric to look at.

Consider a case where we need to process 20 'things'. There are many different schedules on which this could be done. For example:

(chart: two CPU usage timelines processing the same work over the same period, one as a brief 'spike' and one as a steady lower line)

In this case, both processes are doing the same amount of work and using the same amount of total CPU. However, when looking at peak CPU, the 'spike' line will be reported as using 10x the CPU of the other.
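To make this concrete, here is a minimal sketch (the numbers are illustrative, not taken from the benchmark) comparing the two schedules:

```python
# Two hypothetical processes handle the same 20 units of work over a
# 10-second window: one in a single burst, one spread evenly.
spike = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cores used per second: all work up front
steady = [1] * 10                         # same total work, spread evenly

for name, samples in [("spike", spike), ("steady", steady)]:
    peak = max(samples)        # what the benchmark reports today
    total = sum(samples)       # core-seconds actually consumed
    avg = total / len(samples)
    print(f"{name}: peak={peak} cores, total={total} core-seconds, avg={avg:.1f} cores")

# spike: peak=10 cores, total=10 core-seconds, avg=1.0 cores
# steady: peak=1 cores, total=10 core-seconds, avg=1.0 cores
```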

The 'spike' process could easily have set a CPU limit (`resources.limits.cpu: 10m`) on its pod definition (or done application-level throttling) and gotten the same measured behavior. However, applications typically will not do this (or at least not this aggressively), as the spiking behavior is actually desired: if the node has sufficient CPU available, why intentionally slow things down?

In the case of the data plane, throttling would very likely have a latency impact, which would balance this out a bit, but not entirely. For example, if there are large configuration changes at the start or end of the test, the data plane CPU may briefly spike to process them, leading to a high reported max CPU despite lower CPU usage during the other 99% of the test.

On the control plane side, this is even more skewed, as the speed of the control plane is not measured at all in this test. The test could be 'gamed' by simply setting an absurdly low CPU limit: because the test does not benchmark the speed of configuration propagation or other control plane operations, this would show up strictly as an improvement.
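One possible alternative, sketched below under the assumption that a cumulative CPU counter (e.g. cAdvisor's `container_cpu_usage_seconds_total`) can be sampled at the start and end of a run: report total CPU-seconds divided by wall time, so brief spikes contribute in proportion to their actual cost rather than dominating the result.

```python
# A sketch of the alternative measurement: average cores over the whole run,
# derived from a cumulative CPU counter (in seconds) sampled at the start
# and end of the test, e.g. cAdvisor's container_cpu_usage_seconds_total.
def average_cpu_cores(counter_start_s: float, counter_end_s: float,
                      wall_time_s: float) -> float:
    """Total CPU-seconds consumed divided by wall time.

    A brief spike (e.g. processing a large config change at test startup)
    barely moves this number, whereas it would dominate a max-over-time metric.
    """
    return (counter_end_s - counter_start_s) / wall_time_s

# 30 CPU-seconds consumed over a 300-second test => 0.1 cores on average,
# regardless of whether that CPU was spent smoothly or in one spike.
print(average_cpu_cores(0.0, 30.0, 300.0))  # 0.1
```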

Note that I think this mostly applies to CPU. For memory, peak is probably a reasonable metric, and memory usage is much less likely to be spiky.

Impact

The benchmark does not align with real-world expectations.
