K8S Operator: Possible permissions issue? UI cannot reach system under monitoring #2144

Open
SeanKilleen opened this issue Jan 17, 2024 · 3 comments · May be fixed by #2145

@SeanKilleen

First off -- @sungam3r thank you for all the work you've put in and continue to put in on maintaining this. I know you're spread thin these days and I don't have enough familiarity to dig in yet as a contributor but I'm keeping it in mind now that I'm becoming an active user. I'll continue to look into this issue actively.


I followed the directions to set up the K8S operator and UI using the HealthCheck CRD. The operator is deployed in cluster mode.

✅ I see in the operator's logs that the system under monitoring is discovered, using its cluster IP. The annotations for port and endpoint are correctly discovered (:8080/healthz). The endpoint for monitoring is pushed correctly to the UI.

Logs from the operator showing success:

 [18:08:07 INF] [PushService] Namespace observability - Sending Type: Added - Service [RedactedAppName] with uri : http://[RedactedClusterIP]:8080/healthz to ui endpoint: http://100.106.236.107:80
[18:08:07 INF] Start processing HTTP request POST http://100.106.236.107/healthchecks/push?key=8709dabc-ca13-4a61-9367-6b0f0b8958b3
[18:08:07 INF] Sending HTTP request POST http://100.106.236.107/healthchecks/push?key=8709dabc-ca13-4a61-9367-6b0f0b8958b3
[18:08:07 INF] Received HTTP response headers after 4.2877ms - 200
[18:08:07 INF] End processing HTTP request after 4.5692ms - 200
[18:08:07 INF] [PushService] Notification result for [RedactedAppName] - status code: OK

✅ When port forwarding the system under monitoring to my local machine, I can get to the health checks via :8080/healthz. So I know they're accessible from the app at that URL.
❌ Despite this, the health check UI fails to retrieve the health check:

 GetHealthReport threw an exception when trying to get report from http://[RedactedClusterIP]:8080/healthz configured with name [RedactedAppName].
System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.

I'm thinking there may be an issue with the service account's permissions when operating in cluster mode. I'll post my Kubernetes definitions shortly and double-check that they match the docs (I'm using Terraform, so I'll make sure nothing got lost in translation there).
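
RBAC governs calls to the Kubernetes API rather than pod-to-pod traffic, so it may help to first repeat the UI's request from a pod in the UI's namespace and see whether the connection is reset there too. The sketch below is only an illustration and not part of HealthChecks: `probe` is a hypothetical helper, and it assumes Python is available in the pod it runs from.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Issue a GET and report whether the endpoint answered at the HTTP level."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just with an error status (4xx/5xx).
        return f"reachable: HTTP {exc.code}"
    except OSError as exc:
        # Covers refused or reset connections, DNS failures, and timeouts
        # (urllib.error.URLError is itself a subclass of OSError).
        return f"unreachable: {exc}"
```

A result starting with `unreachable` from inside the cluster would point at networking (for example a network policy or sidecar) rather than at the service account.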

Environment:

  • .NET Core version: 8.x
  • Healthchecks version: latest (8)
  • Operative system: Kubernetes deployment (system under monitoring is in Linux containers)

@SeanKilleen changed the title from "K8S Operator: Possible permissions issue? UI cannot reach" to "K8S Operator: Possible permissions issue? UI cannot reach system under monitoring" on Jan 17, 2024

@SeanKilleen
Author

SeanKilleen commented Jan 17, 2024

Namespace definition -- appears to match the documented definition, except for the name (the namespace already existed):

kind: Namespace
apiVersion: v1
metadata:
  name: observability
  uid: e71dfd05-928d-45b9-9a50-1beaaad0b4ef
  resourceVersion: '71973636'
  creationTimestamp: '2023-10-19T23:46:54Z'
  labels:
    app.kubernetes.io/part-of: healthchecks-operator
    kubernetes.io/metadata.name: observability
spec:
  finalizers:
    - kubernetes
status:
  phase: Active

Service account: matches definition, with the exception of namespace name:

kind: ServiceAccount
apiVersion: v1
metadata:
  name: healthchecks-admin
  namespace: observability
  uid: 61d4339f-259f-40b3-93df-ff32ab1f88f0
  resourceVersion: '71811869'
  creationTimestamp: '2024-01-17T14:03:49Z'
automountServiceAccountToken: true

Cluster Role -- matches definition:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthchecks-admin
  uid: dd38820c-1fd1-45ce-8360-bbfd6adb5497
  resourceVersion: '71811870'
  creationTimestamp: '2024-01-17T14:03:49Z'
rules:
  - verbs:
      - '*'
    apiGroups:
      - ''
    resources:
      - services
      - pods
      - deployments
      - secrets
      - configmaps
  - verbs:
      - '*'
    apiGroups:
      - apps
    resources:
      - deployments
  - verbs:
      - '*'
    apiGroups:
      - aspnetcore.ui
    resources:
      - '*'

Same for the Cluster Role Binding (definition):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthchecks-admin
  uid: 1b4721ac-f995-4ce6-bb21-06c279249393
  resourceVersion: '71813591'
  creationTimestamp: '2024-01-17T14:06:49Z'
subjects:
  - kind: ServiceAccount
    name: healthchecks-admin
    namespace: observability
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: healthchecks-admin

Same with the operator deployment (reference). I omitted managedFields and status for brevity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthchecks-ui-k8s-operator
  namespace: observability
  uid: 7e96d4ee-d6c2-4a1e-b867-20b440c27d3b
  resourceVersion: '71974348'
  generation: 1
  creationTimestamp: '2024-01-17T14:17:34Z'
  annotations:
    deployment.kubernetes.io/revision: '1'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: healthchecks-ui-k8s-operator
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: healthchecks-ui-k8s-operator
    spec:
      containers:
        - name: healthchecks-ui-k8s-operator
          image: xabarilcoding/healthchecksui-k8s-operator:latest
          resources:
            limits:
              cpu: 500m
              memory: 300Mi
            requests:
              cpu: 300m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: healthchecks-admin
      serviceAccount: healthchecks-admin
      automountServiceAccountToken: true
      shareProcessNamespace: false
      securityContext: {}
      schedulerName: default-scheduler
      enableServiceLinks: true
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600

I'll check the CRD resource deployment and send another update.

@SeanKilleen
Author

SeanKilleen commented Jan 17, 2024

And here's the HealthCheck resource, which appears to match the docs in spirit:

apiVersion: aspnetcore.ui/v1
kind: HealthCheck
metadata:
  creationTimestamp: '2024-01-17T14:55:22Z'
  generation: 1
  name: healthchecks-ui
  namespace: observability
  resourceVersion: '71841262'
  uid: d67be564-06df-4e5e-8e17-ee6f68e78489
spec:
  name: healthchecks-ui
  scope: Cluster
  serviceType: ClusterIP
  servicesLabel: HealthChecks
  stylesheetContent: "        :root {    \r\n        --primaryColor: #2a3950;\r\n        --secondaryColor: #f4f4f4;  \r\n        --bgMenuActive: #e1b015;\r\n        --bgButton: #e1b015;\r\n        --logoImageUrl: url('https://upload.wikimedia.org/wikipedia/commons/thumb/e/eb/WoW_icon.svg/1200px-WoW_icon.svg.png');\r\n        --bgAside: var(--primaryColor);   \r\n      }\r\n"

A noticeable difference is that I'm specifying ClusterIP rather than LoadBalancer, but I'd be surprised if this was the issue.

@SeanKilleen
Author

SeanKilleen commented Jan 17, 2024

My hunch at this point is that the issue is here: https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks/blob/master/src/HealthChecks.UI.K8s.Operator/Operator/KubernetesAddressFactory.cs#L11C1-L11C46

The CreateAddress() function uses service.Spec.ClusterIP for the address. I'm relatively new to this, but my understanding is that for cross-namespace interaction, you'd really want something DNS-compatible, e.g. $"{service.Metadata.Name}.{service.Metadata.Namespace()}.svc.cluster.local".
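
To make the idea concrete, here is a minimal sketch of the DNS-based address, written in Python purely for illustration (the operator itself is C#, and this is not its code). The helper names `service_dns_name` and `create_address` and the service name `my-app` are hypothetical, and the `cluster.local` suffix assumes the cluster uses the default DNS domain:

```python
def service_dns_name(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


def create_address(name: str, namespace: str, port: int, path: str, scheme: str = "http") -> str:
    """Compose a health check URL from the Service's DNS name instead of its ClusterIP."""
    host = service_dns_name(name, namespace)
    # Normalize the path so "healthz" and "/healthz" produce the same URL.
    return f"{scheme}://{host}:{port}/{path.lstrip('/')}"


print(create_address("my-app", "observability", 8080, "/healthz"))
# prints http://my-app.observability.svc.cluster.local:8080/healthz
```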

I'd be happy to submit a PR for this if you agree. I'm brand new to contributing but would be happy to work through the process of testing it etc.

@SeanKilleen SeanKilleen linked a pull request Jan 17, 2024 that will close this issue