
Use more reliable CPU measurements #76

Open
howardjohn opened this issue Dec 3, 2021 · 0 comments

Current situation

Currently, CPU measurements are done by looking at the peak CPU usage over a period of time. I would argue that this is the wrong metric to look at.

Consider a case where we need to process 20 'things'. There are many different schedules on which this could be done. For example:

(chart: two CPU usage timelines processing the same work over the same period, one as a brief 'spike' and one as a steady lower line)

In this case, both processes are doing the same amount of work and using the same amount of total CPU. However, when looking at peak CPU, the 'spike' line will be reported as using 10x the CPU of the other.
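To make this concrete, here is a minimal sketch (the numbers are illustrative, not taken from the benchmark) comparing the two schedules:

```python
# Two hypothetical processes handle the same 20 units of work over a
# 10-second window: one in a single burst, one spread evenly.
spike = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cores used per second: all work up front
steady = [1] * 10                         # same total work, spread evenly

for name, samples in [("spike", spike), ("steady", steady)]:
    peak = max(samples)        # what the benchmark reports today
    total = sum(samples)       # core-seconds actually consumed
    avg = total / len(samples)
    print(f"{name}: peak={peak} cores, total={total} core-seconds, avg={avg:.1f} cores")

# spike: peak=10 cores, total=10 core-seconds, avg=1.0 cores
# steady: peak=1 cores, total=10 core-seconds, avg=1.0 cores
```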

The 'spike' process could easily have set a CPU limit (`resources.limits.cpu: 10m`) on its pod definition (or done application-level throttling) and gotten the same measured behavior. However, applications typically will not do this (or at least not this aggressively), as the spiking behavior is actually desired: if the node has sufficient CPU available, why intentionally slow things down?

In the case of the data plane, throttling would very likely have a latency impact, which would balance this out a bit, but not entirely. For example, if there are large configuration changes at the start or end of the test, the data plane CPU may briefly spike to process them, leading to a high reported max CPU despite lower CPU usage during the other 99% of the test.

On the control plane side, this is even more skewed, as the speed of the control plane is not measured at all in this test. The test could be 'gamed' by simply setting an absurdly low CPU limit: because the test does not benchmark the speed of configuration propagation or other control plane operations, this would show up strictly as an improvement.
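One possible alternative, sketched below under the assumption that a cumulative CPU counter (e.g. cAdvisor's `container_cpu_usage_seconds_total`) can be sampled at the start and end of a run: report total CPU-seconds divided by wall time, so brief spikes contribute in proportion to their actual cost rather than dominating the result.

```python
# A sketch of the alternative measurement: average cores over the whole run,
# derived from a cumulative CPU counter (in seconds) sampled at the start
# and end of the test, e.g. cAdvisor's container_cpu_usage_seconds_total.
def average_cpu_cores(counter_start_s: float, counter_end_s: float,
                      wall_time_s: float) -> float:
    """Total CPU-seconds consumed divided by wall time.

    A brief spike (e.g. processing a large config change at test startup)
    barely moves this number, whereas it would dominate a max-over-time metric.
    """
    return (counter_end_s - counter_start_s) / wall_time_s

# 30 CPU-seconds consumed over a 300-second test => 0.1 cores on average,
# regardless of whether that CPU was spent smoothly or in one spike.
print(average_cpu_cores(0.0, 30.0, 300.0))  # 0.1
```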

Note that I think this mostly applies to CPU. For memory, peak is probably a reasonable metric, and memory usage is much less likely to be spiky.

Impact

The benchmark does not align with real-world expectations.
