Plunk API Fails Periodically - Self Hosted #114
Comments
Not sure if it's related, but here are the logs from the Postgres DB: 2024-10-16T14:40:47.565962634Z 2024-10-16 14:40:47.565 UTC [884] FATAL: role "postgres" does not exist
Just a bit more context:
Not sure if it's possible, @driaug, but could an API health route be added in the meantime? It would be very useful: if the container crashes, we could use it to trigger a restart. At the moment I am using:
Previously I was only checking http://127.0.0.1:3000, but that gives a false positive when only the dashboard is still running.
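To illustrate the point about probing the API rather than the dashboard, here is a minimal sketch of a health probe. The URL in the usage comment is hypothetical (the thread does not give the real route), and `isApiHealthy` is an illustrative helper name, not something Plunk ships. It treats any non-2xx response, connection error, or timeout as unhealthy.

```typescript
// Minimal shape of the fetch function we rely on, so a fake can be injected.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean }>;

// Returns true only when the API itself answers with a 2xx status in time.
// Connection errors and timeouts are reported as "unhealthy", never thrown.
async function isApiHealthy(
  url: string,
  timeoutMs = 3000,
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false;
  }
}

// Usage (hypothetical URL, adjust to the real health route):
//   const healthy = await isApiHealthy("http://127.0.0.1:3000/api/health");
//   process.exit(healthy ? 0 : 1); // non-zero exit marks the container unhealthy
```

Pointing the container healthcheck at an API route instead of the root URL avoids the false positive described above, since the probe fails as soon as the API stops answering.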
`PLUNK_API_URI` is a placeholder for `NEXT_PUBLIC_API_URI` inside the Dockerfile. `API_URI` variable could be an internal URI like `http://plunk:3000` while `NEXT_PUBLIC_API_URI` has to be public. Separating internal and public URI variables can help with performance by avoiding network overhead for server requests. This commit also solves the issue useplunk#114.
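As a rough illustration of the split described in this commit message, the sketch below picks the internal URI for server-side requests and the public one for the browser. `resolveApiUri` is a hypothetical helper, and the fallback to the public URI when `API_URI` is unset is an assumption, not something the commit message states.

```typescript
type Env = Record<string, string | undefined>;

// Server-side code prefers the internal URI (e.g. http://plunk:3000);
// browser code must always use the public one.
function resolveApiUri(env: Env, isServer: boolean): string {
  const publicUri = env.NEXT_PUBLIC_API_URI;
  const internalUri = env.API_URI;
  // Assumption: fall back to the public URI if no internal one is configured.
  const chosen = isServer ? (internalUri ?? publicUri) : publicUri;
  if (!chosen) {
    throw new Error("NEXT_PUBLIC_API_URI is not set");
  }
  return chosen;
}
```

With `API_URI=http://plunk:3000`, server requests stay on the internal network instead of going out through the public hostname and back in.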
I second adding a healthcheck route. I'm using CapRover to deploy; here's my captain-definition/one-click-app file for reference. I think the issue stems from IPv6. I added the env var
Edit: I can verify that the Node options above fixed this issue, please test. @ejscheepers @driaug
@ardasevinc does the new healthcheck route work for you? It should be available in the latest version.
I think I can't change the healthcheck in CapRover after the app has been deployed. Will test later. As far as this issue is concerned, I have solved it for my case with the
I'm also getting random errors with the 2 flags set in the NODE_OPTIONS env var:
Seems to be related to https://r1ch.net/blog/node-v20-aggregateeerror-etimedout-happy-eyeballs
That's odd. Have you tried recreating the container, maybe with a forced rebuild? I'm self-hosting Plunk via CapRover and
This solution lasted 2-3 weeks. The issue seems to depend on the number of requests: when the Plunk API is used frequently, the time until it fails gets shorter. I did not get the ECONNREFUSED or ETIMEDOUT errors this time, though, so the last option I tried seems to have fixed those. Will investigate further.
Related: https://undici.nodejs.org/#/?id=network-address-family-autoselection
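For context on the address-family autoselection linked above, here is a sketch of how its defaults can be adjusted from code using Node's built-in `node:dns` and `node:net` modules. This is not necessarily what anyone in this thread used; the 1000 ms value and the `preferIpv4` helper name are illustrative assumptions.

```typescript
import * as dns from "node:dns";
import * as net from "node:net";

// Assumption: IPv6 is unreachable from the container, so try IPv4 results
// first and give each connection attempt longer than Node's 250 ms default
// before it moves on to the next address.
function preferIpv4(attemptTimeoutMs = 1000): void {
  dns.setDefaultResultOrder("ipv4first");
  net.setDefaultAutoSelectFamilyAttemptTimeout(attemptTimeoutMs);
}

preferIpv4();
```

These settings are process-wide, so they need to run before the first outgoing request is made.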
I've been trying another fix for ~2 weeks. It seems to be working for now, no crashes. The fix is:
I haven't had any crashes or errors in the logs since I implemented these fixes.
Every now and again the API fails and does not restart.
Server Logs:
node:internal/deps/undici/undici:13185
2024-10-16T04:29:01.190454265Z Error.captureStackTrace(err);
2024-10-16T04:29:01.190459825Z ^
2024-10-16T04:29:01.190463705Z
2024-10-16T04:29:01.190467185Z TypeError: fetch failed
2024-10-16T04:29:01.190470865Z at node:internal/deps/undici/undici:13185:13
2024-10-16T04:29:01.190474745Z at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
2024-10-16T04:29:01.190478825Z [cause]: AggregateError [ETIMEDOUT]:
2024-10-16T04:29:01.190482625Z at internalConnectMultiple (node:net:1122:18)
2024-10-16T04:29:01.190486185Z at internalConnectMultiple (node:net:1190:5)
2024-10-16T04:29:01.190489785Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190493465Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190498985Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190502665Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190506065Z [errors]: [
2024-10-16T04:29:01.190509545Z Error: connect ETIMEDOUT 188.114.97.3:443
2024-10-16T04:29:01.190513105Z at createConnectionError (node:net:1652:14)
2024-10-16T04:29:01.190516705Z at Timeout.internalConnectMultipleTimeout (node:net:1711:38)
2024-10-16T04:29:01.190520425Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190524025Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190527745Z errno: -110,
2024-10-16T04:29:01.190531145Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190534585Z syscall: 'connect',
2024-10-16T04:29:01.190538065Z address: '188.114.97.3',
2024-10-16T04:29:01.190542545Z port: 443
2024-10-16T04:29:01.190545865Z },
2024-10-16T04:29:01.190549545Z Error: connect ENETUNREACH 2a06:98c1:3121::3:443 - Local (:::0)
2024-10-16T04:29:01.190553745Z at internalConnectMultiple (node:net:1186:16)
2024-10-16T04:29:01.190558345Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190580945Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190585225Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190589025Z errno: -101,
2024-10-16T04:29:01.190594705Z code: 'ENETUNREACH',
2024-10-16T04:29:01.190598105Z syscall: 'connect',
2024-10-16T04:29:01.190601545Z address: '2a06:98c1:3121::3',
2024-10-16T04:29:01.190605065Z port: 443
2024-10-16T04:29:01.190608985Z },
2024-10-16T04:29:01.190612345Z Error: connect ETIMEDOUT 188.114.96.3:443
2024-10-16T04:29:01.190616065Z at createConnectionError (node:net:1652:14)
2024-10-16T04:29:01.190619665Z at Timeout.internalConnectMultipleTimeout (node:net:1711:38)
2024-10-16T04:29:01.190623585Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190627105Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190630745Z errno: -110,
2024-10-16T04:29:01.190634105Z code: 'ETIMEDOUT',
2024-10-16T04:29:01.190637505Z syscall: 'connect',
2024-10-16T04:29:01.190640905Z address: '188.114.96.3',
2024-10-16T04:29:01.190644305Z port: 443
2024-10-16T04:29:01.190647745Z },
2024-10-16T04:29:01.190651065Z Error: connect ENETUNREACH 2a06:98c1:3120::3:443 - Local (:::0)
2024-10-16T04:29:01.190655825Z at internalConnectMultiple (node:net:1186:16)
2024-10-16T04:29:01.190659665Z at Timeout.internalConnectMultipleTimeout (node:net:1716:5)
2024-10-16T04:29:01.190663545Z at listOnTimeout (node:internal/timers:596:11)
2024-10-16T04:29:01.190667145Z at process.processTimers (node:internal/timers:529:7) {
2024-10-16T04:29:01.190670745Z errno: -101,
2024-10-16T04:29:01.190674145Z code: 'ENETUNREACH',
2024-10-16T04:29:01.190677785Z syscall: 'connect',
2024-10-16T04:29:01.190681225Z address: '2a06:98c1:3120::3',
2024-10-16T04:29:01.190684665Z port: 443
2024-10-16T04:29:01.190687986Z }
2024-10-16T04:29:01.190691346Z ]
2024-10-16T04:29:01.190695146Z }
2024-10-16T04:29:01.190698506Z }
2024-10-16T04:29:01.190701826Z
2024-10-16T04:29:01.190705146Z Node.js v22.9.0
If I restart the container, it starts working again.
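The trace above is a `TypeError: fetch failed` whose `cause` is an `AggregateError` holding one error per attempted address. As a sketch of how a caller could catch that failure and log a compact summary instead of letting it go unhandled, something like the following might help. `summarizeFetchFailure` is a hypothetical helper, not part of Plunk or undici.

```typescript
interface ConnectErrorInfo {
  code?: string;
  address?: string;
  port?: number;
}

// Flattens the nested errors of a failed fetch into "CODE address:port" lines.
// Handles both an aggregate cause (with .errors) and a single plain cause.
function summarizeFetchFailure(err: unknown): string[] {
  const cause = (err as { cause?: unknown } | null)?.cause;
  const nested = (cause as { errors?: unknown } | null | undefined)?.errors;
  const list: unknown[] = Array.isArray(nested)
    ? nested
    : cause !== undefined
      ? [cause]
      : [];
  return list.map((e) => {
    const { code, address, port } = (e ?? {}) as ConnectErrorInfo;
    return `${code ?? "UNKNOWN"} ${address ?? "?"}:${port ?? "?"}`;
  });
}

// Usage (the URL is a placeholder):
//   try {
//     await fetch("https://example.com");
//   } catch (err) {
//     console.error("fetch failed:", summarizeFetchFailure(err));
//   }
```

Applied to the log above, the summary would show the IPv4 attempts timing out and the IPv6 attempts failing with ENETUNREACH, which is the pattern that points toward address-family selection.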