PXC Operator logs started throwing TLS verification error consistently when a PXC based CR is deployed.
x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Root CA\")
This started happening after operator pod was restarted as part of cluster maintenance
{"level":"error","ts":1670654625.3294516,"caller":"pxc/controller.go:267","msg":"Update status","error":"write status: Internal error occurred: failed calling webhook \"validationwebhook.pxc.percona.com\": Post \"https://percona-xtradb-cluster-operator.pxc.svc:443/validate-percona-xtradbcluster?timeout=10s\": x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Root CA\")","errorVerbose":"Internal error occurred: failed calling webhook \"validationwebhook.pxc.percona.com\": Post \"https://percona-xtradb-cluster-operator.pxc.svc:443/validate-percona-xtradbcluster?timeout=10s\": x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Root CA\")
write status
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).writeStatus
/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/status.go:141
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).updateStatus
/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/status.go:124
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).Reconcile.func1
/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:265
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).Reconcile
/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:483
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99
We noticed that the ValidatingWebhookConfiguration percona-xtradbcluster-webhook was not updated after the operator pod restart. We were expecting it to be updated as the CA signing the TLS certs would have regenerated after operator restart and caBundle field in the webhook to be updated.
We deleted the operator deployment, percona-xtradbcluster-webhook
Redeployed operator deployment and it created percona-xtradbcluster-webhook with a new caBundle value
The issue is even after re-deploy the operator logs still show the tls: bad certificate errors.
OpenSSL verify still fails
openssl verify -verbose -CAfile webhook_ca_bundle.crt operator_pod_tls.crt
error operator_pod_tls.crt: verification failed
O = PXC
error 7 at 0 depth lookup: certificate signature failure
2508:error:0407008A:rsa routines:RSA_padding_check_PKCS1_type_1:invalid padding:../openssl-1.1.1q/crypto/rsa/rsa_pk1.c:66:
2508:error:04067072:rsa routines:rsa_ossl_public_decrypt:padding check failed:../openssl-1.1.1q/crypto/rsa/rsa_ossl.c:588:
2508:error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib:../openssl-1.1.1q/crypto/asn1/a_verify.c:170: