Liveness and readiness probes prevent the pod from starting #104

Open
kolesnikovae opened this issue Feb 8, 2023 · 0 comments

kolesnikovae commented Feb 8, 2023

Currently, the liveness and readiness probes are configured with initialDelaySeconds set to 30s, which is already a fairly high value. However, after a container crash the Pyroscope server may need even longer to recover its storage (the duration is hard to estimate, but a minute or two is what we may expect), so the liveness probe can fail and restart the container before it finishes starting.

A proper solution would be to separate implementations of the readiness and liveness checks:

  • the liveness check starts serving requests at the very beginning of server initialisation (before any other component)
  • the readiness check starts serving requests only once all components have finished initialisation
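
The split described above could look roughly like the following. This is only a minimal sketch, not Pyroscope's actual code: the `/livez` and `/readyz` paths, the `health` type, and the `markReady` hook are all hypothetical names chosen for illustration (the chart currently points both probes at `/healthz`).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health records whether every server component has finished initialising.
// (Hypothetical type, not part of Pyroscope.)
type health struct {
	ready atomic.Bool
}

// handleLive reports that the process is up. It never looks at readiness,
// so it can be registered before any other component starts.
func (h *health) handleLive(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// handleReady returns 503 until markReady has been called.
func (h *health) handleReady(w http.ResponseWriter, _ *http.Request) {
	if !h.ready.Load() {
		http.Error(w, "initialising", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// markReady would be called once storage recovery and all components are done.
func (h *health) markReady() { h.ready.Store(true) }

// newHealthMux registers the two hypothetical endpoints.
func newHealthMux(h *health) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/livez", h.handleLive)
	mux.HandleFunc("/readyz", h.handleReady)
	return mux
}

// probe issues an in-process GET and returns the response status code.
func probe(handler http.Handler, path string) int {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := &health{}
	mux := newHealthMux(h)
	fmt.Println(probe(mux, "/livez"), probe(mux, "/readyz")) // during initialisation
	h.markReady()
	fmt.Println(probe(mux, "/livez"), probe(mux, "/readyz")) // after initialisation
}
```

During initialisation the liveness endpoint answers 200 while readiness answers 503; after `markReady` both answer 200. With such a split, the liveness probe would no longer need a long initial delay to cover storage recovery.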

Increasing the default initialDelaySeconds further for the readiness probe may be undesirable, because it would introduce a noticeable unconditional delay between server start and the moment the server actually starts serving requests.

As a workaround, I think we could adjust the default settings so that the pod has at least 90s to finish initialisation, while the server is not prevented from handling requests if it completes initialisation sooner:

readinessProbe:
  enabled: true
  httpGet:
    path: /healthz
    port: 4040
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 30
  failureThreshold: 10
  successThreshold: 1

# Although the initial delay is 60 seconds, the pod will still be restarted
# if it crashes after initialisation (this is the only realistic reason why
# the probe may fail).
#
# Note that livenessProbe does not wait for readinessProbe to succeed. 
livenessProbe:
  enabled: true
  httpGet:
    path: /healthz
    port: 4040
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 30
  failureThreshold: 3
  successThreshold: 1

The current config:

readinessProbe:
  # -- Enable Pyroscope server readiness
  enabled: true
  httpGet:
    # -- Pyroscope server readiness check path
    path: /healthz
    # -- Pyroscope server readiness check port
    port: 4040
  # -- Pyroscope server readiness initial delay in seconds
  initialDelaySeconds: 30
  # -- Pyroscope server readiness check frequency in seconds
  periodSeconds: 5
  # -- Pyroscope server readiness check request timeout
  timeoutSeconds: 30
  # -- Pyroscope server readiness check failure threshold count
  failureThreshold: 3
  # -- Pyroscope server readiness check success threshold count
  successThreshold: 1

livenessProbe:
  # -- Enable Pyroscope server liveness
  enabled: true
  httpGet:
    # -- Pyroscope server liveness check path
    path: /healthz
    # -- Pyroscope server liveness check port
    port: 4040
  # -- Pyroscope server liveness check initial delay in seconds
  initialDelaySeconds: 30
  # -- Pyroscope server liveness check frequency in seconds
  periodSeconds: 15
  # -- Pyroscope server liveness check request timeout
  timeoutSeconds: 30
  # -- Pyroscope server liveness check failure threshold
  failureThreshold: 3
  # -- Pyroscope server liveness check success threshold
  successThreshold: 1