Hi, I've been running a particular model in Kubernetes using Cog. Whenever the workload is high (4-5 predictions in the queue), the Cog model stops without reporting a reason. We initially suspected a memory issue, but on further investigation we found that plenty of memory was still available, so memory is unlikely to be the cause.
It would be great if you could offer any hypotheses about this issue; I look forward to following up on them.
Here's an example of the log. Keep in mind that we have multiple replicas running and the logs shown come from every pod.
Note: There are no `cog.server.runner` exception logs at all, just a plain shutdown by the Cog HTTP server.
tontan2545 changed the title from "COG Container suddenly stopping without explicit reason" to "Container suddenly stopping without explicit reason" on May 8, 2024