ignition /config/master tls: failed to verify certificate x509 #8475

Open
UriZafrir opened this issue May 25, 2024 · 3 comments

Comments

@UriZafrir

UriZafrir commented May 25, 2024

Version

$ openshift-install version
./openshift-install 4.15.14
built from commit 147d2421af88084cbfbe287140e63949830e5593
release image registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
release architecture amd64

Platform:

Please specify the platform type: aws, libvirt, openstack or baremetal

baremetal

Please specify:

  • IPI (automated install with openshift-install. If you don't know, then it's IPI)
  • UPI (semi-manual installation on customised infrastructure)

IPI

What happened?

During the OpenShift install on vSphere, the master nodes all report "ignition /config/master tls: failed to verify certificate x509" and the install fails.

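For reference on where this check happens: in an IPI install the masters normally pull /config/master from the Machine Config Server at api-int.<cluster_name>.<base_domain>:22623, and the x509 error means that server's TLS certificate could not be verified. A rough way to look at the certificate being served (hostname pattern assumed; substitute the real cluster name and base domain) is:

echo | openssl s_client -connect api-int.<cluster_name>.<base_domain>:22623 -showcerts 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates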

oc --kubeconfig=auth/kubeconfig get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2024-05-25T16:07:28Z"
    generation: 1
    name: version
    resourceVersion: "6081"
    uid: 21a86f67-b3c0-4493-8e27-98748b363898
  spec:
    channel: stable-4.15
    clusterID: 3467d915-ff7b-47f2-bbe8-cb11f6f9e29f
    overrides:
    - group: ""
      kind: ConfigMap
      name: cloud-provider-config
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: cluster-config-v1
      namespace: kube-system
      unmanaged: true
    - group: config.openshift.io
      kind: DNS
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Infrastructure
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Ingress
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Network
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Proxy
      name: cluster
      namespace: ""
      unmanaged: true
    - group: config.openshift.io
      kind: Scheduler
      name: cluster
      namespace: ""
      unmanaged: true
    - group: operator.openshift.io
      kind: ImageContentSourcePolicy
      name: image-policy
      namespace: ""
      unmanaged: true
    - group: ""
      kind: Secret
      name: kube-cloud-cfg
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: root-ca
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: Secret
      name: machine-config-server-tls
      namespace: openshift-machine-config-operator
      unmanaged: true
    - group: ""
      kind: Secret
      name: pull-secret
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: user-ca-bundle
      namespace: openshift-config
      unmanaged: true
    - group: ""
      kind: Secret
      name: vsphere-creds
      namespace: kube-system
      unmanaged: true
    - group: config.openshift.io
      kind: FeatureGate
      name: cluster
      namespace: ""
      unmanaged: true
    - group: ""
      kind: Secret
      name: kubeadmin
      namespace: kube-system
      unmanaged: true
    - group: rbac.authorization.k8s.io
      kind: Role
      name: vsphere-creds-secret-reader
      namespace: kube-system
      unmanaged: true
    - group: ""
      kind: ConfigMap
      name: openshift-install-manifests
      namespace: openshift-config
      unmanaged: true
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - Build
      - CSISnapshot
      - CloudCredential
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - OperatorLifecycleManager
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
      knownCapabilities:
      - Build
      - CSISnapshot
      - CloudCredential
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - OperatorLifecycleManager
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
    conditions:
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster
        version 4.15.14 not found in the "stable-4.15" channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Disabling ownership via cluster version overrides prevents upgrades.
        Please remove overrides before continuing.
      reason: ClusterVersionOverridesSet
      status: "False"
      type: Upgradeable
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Capabilities match configured spec
      reason: AsExpected
      status: "False"
      type: ImplicitlyEnabledCapabilities
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: Payload loaded version="4.15.14" image="registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0"
        architecture="amd64"
      reason: PayloadLoaded
      status: "True"
      type: ReleaseAccepted
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2024-05-25T16:36:10Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operators authentication, baremetal, cloud-controller-manager, cluster-autoscaler, config-operator, control-plane-machine-set, csi-snapshot-controller, dns, etcd, image-registry, ingress, insights, kube-apiserver, kube-controller-manager, kube-scheduler, kube-storage-version-migrator, machine-api, machine-approver, machine-config, marketplace, monitoring, network, node-tuning, openshift-apiserver, openshift-controller-manager, service-ca, storage are not available
        * Could not update imagestream "openshift/driver-toolkit" (607 of 873): resource may have been deleted
        * Could not update oauthclient "console" (546 of 873): the server does not recognize this resource, check extension API servers
        * Could not update role "openshift-apiserver/prometheus-k8s" (857 of 873): resource may have been deleted
        * Could not update role "openshift-authentication/prometheus-k8s" (753 of 873): resource may have been deleted
        * Could not update role "openshift-console-operator/prometheus-k8s" (791 of 873): resource may have been deleted
        * Could not update role "openshift-console/prometheus-k8s" (795 of 873): resource may have been deleted
        * Could not update role "openshift-controller-manager/prometheus-k8s" (865 of 873): resource may have been deleted
        * Could not update role "openshift/copied-csv-viewer" (675 of 873): resource may have been deleted
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (484 of 873): resource may have been deleted
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2024-05-25T16:07:30Z"
      message: 'Unable to apply 4.15.14: an unknown error has occurred: MultipleErrors'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    desired:
      image: registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
      url: https://access.redhat.com/errata/RHSA-2024:2865
      version: 4.15.14
    history:
    - completionTime: null
      image: registry.local:5000/ocp4@sha256:234ccdfa4adabcfa7490785bad7108a3c7d622f19cd5b8f4b241dfba96c09be0
      startedTime: "2024-05-25T16:07:30Z"
      state: Partial
      verified: false
      version: 4.15.14
    observedGeneration: 1
    versionHash: PE2-EaXpK0k=
kind: List
metadata:
  resourceVersion: ""

 oc --kubeconfig=auth/kubeconfig get clusteroperator
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager
cloud-credential                                     True        False         False      35m
cluster-autoscaler
config-operator
console
control-plane-machine-set
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage


What did you expect to happen?

The installer to succeed.

How to reproduce it (as minimally and precisely as possible)?

$ ./openshift-install create cluster

Anything else we need to know?

References

This is the closest reference I have found:
https://access.redhat.com/solutions/4271572

@patrickdillon
Contributor

Ignition certificates are only valid for a short period (IIRC, 24 hours). A common cause of this error is ignition configs being generated well in advance of the install.

For further debugging we would need to inspect the certs. It would be good for us to update the troubleshooting docs on how to do this.
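A sketch of one way to inspect them (assuming the ignition spec v3 layout that 4.x pointer configs use, not official troubleshooting steps): the root CA that the nodes trust is embedded in master.ign as a base64 data URL, and its validity window can be printed with:

jq -r '.ignition.security.tls.certificateAuthorities[0].source' master.ign \
  | sed 's|^data:text/plain;charset=utf-8;base64,||' \
  | base64 -d \
  | openssl x509 -noout -subject -issuer -dates

If the notBefore/notAfter window does not bracket the time shown on the failing node's console, the x509 verification will fail regardless of how recently the configs were generated.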

@UriZafrir
Author

Hi
I didn't make the ignition config in advance.
How can I debug the certificates?

@dmc5179
Contributor

dmc5179 commented Aug 13, 2024

Another common issue is that the install directory is reused:

openshift-install create cluster --dir=/tmp/cluster
# cluster install fails
rm -f /tmp/cluster/*   # leaves hidden files such as .openshift_install_state.json behind
openshift-install create cluster --dir=/tmp/cluster

There are hidden files in the /tmp/cluster directory that impact subsequent install attempts. Make sure to do the following between install attempts:

rm -rf /tmp/cluster    # removes the directory, hidden files included
mkdir /tmp/cluster

A dirty install directory with leftover hidden files will cause issues with the cluster certificates, since previously generated (and possibly already expired) certs are reused.
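As a quick sanity check (using the example path above), the leftover hidden state can be listed before retrying:

ls -la /tmp/cluster
# files such as .openshift_install_state.json and .openshift_install.log survive a plain rm -f /tmp/cluster/*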
