CPU Leak #247

Open
aaronkvanmeerten opened this issue Nov 27, 2023 · 8 comments
Labels
SDK Issue pertains to the SDK itself and not specific to any service

Comments

@aaronkvanmeerten

When using oci-sdk versions greater than ^1.5.2, including the latest, we see a slow but steady increase in CPU utilization, which eventually grows without bound and consumes all available CPU on the instance. Reverting to ^1.5.2 fixes the issue for us. This occurs across multiple projects that use oci-sdk and can be attributed directly to the oci-sdk version, since an otherwise identical build with the older ^1.5.2 does not exhibit the leak. A user of the jitsi-autoscaler (which uses oci-sdk) reported this leak to us and ran a profile; we have confirmed the behavior ourselves but have not done our own profiling.

The user's profile shows that the system is overwhelmed by timers, in case that helps you debug:

46492.8 ms  43.13 % | 97726.0 ms  90.67 % | (anonymous) status.js:82 |
46492.8 ms  43.13 % | 97726.0 ms  90.67 % | ........listOnTimeout internal/timers.js:502 |
46492.8 ms  43.13 % | 97726.0 ms  90.67 % | ...............processTimers internal/timers.js:482 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | (anonymous) status.js:96 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | ........(anonymous) status.js:94 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | ...............get stats status.js:93 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | ......................(anonymous) status.js:82 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | .............................listOnTimeout internal/timers.js:502 |
43874.9 ms  40.71 % | 43874.9 ms  40.71 % | ..................................processTimers internal/timers.js:482 |
4877.3 ms  4.53 % | 5978.8 ms  5.55 % | (anonymous) status.js:124 |
4877.3 ms  4.53 % | 5978.8 ms  5.55 % | .......get stats status.js:93 |
4877.3 ms  4.53 % | 5978.8 ms  5.55 % | ...............(anonymous) status.js:82 |
4877.3 ms  4.53 % | 5978.8 ms  5.55 % | ......................listOnTimeout internal/timers.js:502 |
4877.3 ms  4.53 % | 5978.8 ms  5.55 % | .............................processTimers internal/timers.js:502 |

@vpeltola

vpeltola commented Jan 11, 2024

Running into the same issue. It can be observed in any long-running service, such as a microservice, that uses oci-sdk. In my case CPU usage increased by about 0.1% every 10 minutes, seemingly indefinitely, as measured by "ps -p <pid> -o %cpu,%mem" on a 2-CPU host. This growth happens even though the service makes no OCI calls.

Does the SDK perhaps start a timer that runs constantly in the background?

Before this I was calling the OCI APIs directly, and the server used close to 0% CPU. After the switch to oci-sdk, CPU always reaches 100% after 2 days or so. I have just removed oci-sdk from the service, and the CPU usage pattern returned to normal: generally 0.x% and not increasing.

oci-sdk version used: 2.7.0.3
node.js: 18.18.2

@jyotisaini jyotisaini added the SDK Issue pertains to the SDK itself and not specific to any service label Jan 11, 2024
@jyotisaini
Contributor

Thanks for reporting this, @aaronkvanmeerten. We are working on the fix internally and will update here once it's fixed.

@vpeltola

vpeltola commented Jan 18, 2024

Setting the environment variable OCI_SDK_DEFAULT_CIRCUITBREAKER_ENABLED=false is a workaround to avoid this issue for now.
Docs: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/typescriptsdkconcepts.htm#typescriptsdkconcepts_topic_Retry_Circuit_Breakers
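
A minimal sketch of applying this workaround from code rather than from the shell. It assumes the SDK reads the variable from `process.env` no later than client construction; the thread does not say exactly when it is read. The helper name `disableDefaultCircuitBreaker` is hypothetical.

```typescript
// Hypothetical helper: sets the environment variable named in this thread so that
// SDK clients created afterwards skip the default circuit breaker.
// Assumption: the SDK reads the variable when it is loaded or when a client is built,
// so this has to run before either of those happens.
function disableDefaultCircuitBreaker(env: Record<string, string | undefined>): void {
  env["OCI_SDK_DEFAULT_CIRCUITBREAKER_ENABLED"] = "false";
}

disableDefaultCircuitBreaker(process.env);
// Load the SDK only after this point, for example with require("oci-sdk")
// or a dynamic import().
```

In ES modules, static `import` statements are evaluated before the module body, so setting the variable in the shell or service definition that launches the process is the more robust option.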

@JoshuaWR
Member

Hi @vpeltola, this issue seems to be caused by circuit breakers not shutting down after they are no longer needed. The most recent release of the SDK includes a method in each client that the user can call to shut down these circuit breakers as needed. Please see this example. Let us know if this seems to fix your issue, thanks!
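
To illustrate the failure mode described here, below is a toy model, not the SDK's code. Each hypothetical `ToyBreaker` starts a repeating timer when constructed, so every breaker that is created and then dropped without a shutdown leaves one more timer firing indefinitely. The class, function, and counter names are assumptions made for illustration.

```typescript
// Toy model only: this is NOT the oci-sdk implementation.
// Each breaker owns a repeating timer, similar in spirit to a periodic stats tick.
let activeTimers = 0;

class ToyBreaker {
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(tickMs: number = 1000) {
    this.timer = setInterval(() => {
      // periodic bookkeeping would run here
    }, tickMs);
    activeTimers++;
  }

  // Stops the timer; idempotent, so calling it twice is harmless.
  shutdown(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
      activeTimers--;
    }
  }
}

// Simulates code that keeps building fresh breakers without releasing old ones.
function createBreakers(count: number): ToyBreaker[] {
  const breakers: ToyBreaker[] = [];
  for (let i = 0; i < count; i++) {
    breakers.push(new ToyBreaker());
  }
  return breakers;
}

const leaked = createBreakers(100);
console.log(activeTimers); // 100 timers are now scheduled until shut down

leaked.forEach((b) => b.shutdown());
console.log(activeTimers); // 0
```

In this model the timer work grows with the number of live breakers, which would match a profile dominated by `processTimers`, though the thread does not confirm that this is the exact mechanism inside the SDK.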

@vpeltola

Hmm, the solution shouldn't be to shut down something (circuit breakers) that I didn't start in the first place. If they were started automatically without the user's knowledge, they should also shut down automatically. And if/while they are running, they should not leak memory and use progressively more CPU. I think there is still a bug that needs fixing.

@aaronkvanmeerten
Author

> Hmm, the solution shouldn't be to shut down something (circuit breakers) that I didn't start in the first place. If they were started automatically without the user's knowledge, they should also shut down automatically. And if/while they are running, they should not leak memory and use progressively more CPU. I think there is still a bug that needs fixing.

I agree completely with the sentiment. No other library I have ever used has required me to run extra code to shut down pieces in order to not leak CPU. Something is clearly wrong in this library. Especially because it did not happen before a certain version, I believe it must be some kind of bug that needs fixing.

@JoshuaWR
Member

Hi @vpeltola @aaronkvanmeerten, thank you for your feedback.
For all OCI SDKs, including TypeScript, we've decided to use circuit breakers in our clients by default. This helps prevent overloading OCI services during partial service outages, to improve availability.
There isn't an easy way for us to tell when the user is done with a TypeScript SDK client they've created. A client can be created, used to make a call, and then not needed. Or, it may need to be left open perpetually to be used for a number of API calls over time. Since we don't have a good way to know when a user is done with the client ourselves, we think it's best if the user closes the client themselves (by calling .shutdownCircuitBreaker()) when they know they're done using it.
To address @aaronkvanmeerten's comment specifically, manually closing a client that has been manually created is not uncommon across libraries. For example, streaming libraries such as grpc for TS ask users to create a streaming object, use it, and then manually close it with stream.cancel() to ensure it doesn't continue to use resources.
If you have suggestions as to how we could better handle circuit breakers in the SDK, we welcome them, as we do recognize this is an extra step the user has to take. Thanks!
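
The pattern described above, closing a client once the caller knows it is done, can be sketched with a `try`/`finally` wrapper. The interface `HasCircuitBreaker`, the helper `withClient`, and the stand-in `FakeClient` are hypothetical; only the method name `shutdownCircuitBreaker()` comes from this thread, and its exact signature on real clients is an assumption here.

```typescript
// Hypothetical minimal shape; real oci-sdk clients expose far more than this.
interface HasCircuitBreaker {
  shutdownCircuitBreaker(): void;
}

// Runs `work` with a freshly created client and always shuts the circuit
// breaker down afterwards, even if `work` throws or rejects.
async function withClient<C extends HasCircuitBreaker, T>(
  create: () => C,
  work: (client: C) => Promise<T>
): Promise<T> {
  const client = create();
  try {
    return await work(client);
  } finally {
    client.shutdownCircuitBreaker();
  }
}

// Stand-in client so the wrapper can be demonstrated without the SDK installed.
class FakeClient implements HasCircuitBreaker {
  shutdownCalls = 0;

  async listThings(): Promise<string[]> {
    return ["a", "b"];
  }

  shutdownCircuitBreaker(): void {
    this.shutdownCalls++;
  }
}

const fake = new FakeClient();
withClient(() => fake, (c) => c.listThings()).then((items) => {
  console.log(items.length, fake.shutdownCalls); // prints "2 1"
});
```

The `finally` block is what guarantees cleanup on the error path, so a failed call does not leave a breaker running.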

@JoshuaWR
Member

JoshuaWR commented Apr 9, 2024

As part of the latest TypeScript release, .close() has been added to each client to further address this issue, and to more closely resemble the behavior of clients from the Java SDK. In addition, this method's use is now shown in each of the TypeScript examples that use clients.


4 participants