TLS certificate renewal (handled by certmanager) causes pxc-db restart

Alessandro_Rovetto · November 4, 2024, 8:09am

Description:

I have a PXC cluster with only one database instance, managed by the pxc-operator, with TLS handled by cert-manager. Every time the certificate is renewed, the PXC pod is deleted and recreated. Is this the expected behaviour? From the documentation, I expected no downtime when cert-manager takes care of the certificates. Is the downtime caused by the fact that I’m not running a proper cluster with 3 instances?

Steps to Reproduce:

To force regeneration of the certificate, I delete the pxc-db-ssl and pxc-db-ssl-internal secrets. cert-manager immediately creates the new certificate.
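A minimal sketch of this reproduction step, assuming `kubectl` is configured for the cluster and the secrets live in the `tenant1` namespace seen in the log below. It only wraps the standard `kubectl delete secret` command; the function name is hypothetical and it defaults to a dry run that just returns the command.

```python
import subprocess


def delete_tls_secrets(namespace, secrets=("pxc-db-ssl", "pxc-db-ssl-internal"), dry_run=True):
    """Build (and optionally run) the kubectl command that deletes the TLS secrets.

    Deleting the secrets prompts cert-manager to issue fresh certificates,
    which is how the renewal (and the pod restart) is forced here.
    With dry_run=True the command is only returned, not executed.
    """
    cmd = ["kubectl", "delete", "secret", *secrets, "--namespace", namespace]
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(delete_tls_secrets("tenant1")))
```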

Version:

pxc-operator: 1.12.0
pxc-db: 1.14.0

Logs:

pxc-operator:
{"level":"error","ts":1730707547.317082,"logger":"perconaxtradbcluster","caller":"pxc/controller.go:1220","msg":"sync users","cluster":"tenant1-pxc-db","namespace":"tenant1","error":"exec syncusers: command terminated with exit code 1 / \nSyncing user accounts from PXC(tenant1-pxc-db-pxc-0.tenant1-pxc-db-pxc.tenant1.svc.cluster.local:3306) to ProxySQL\nAdding user to ProxySQL: clustercheck\n / ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0\nERROR (line:718) : Failed to load user list from ProxySQL database. \n– Please check the ProxySQL connection parameters and status.\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR (line:1553) : Failed to add the user (clustercheck) from PXC to ProxySQL database. \n– Please check the ProxySQL connection parameters and status.\n","errorVerbose":"exec syncusers: command terminated with exit code 1 / \nSyncing user accounts from PXC(tenant1-pxc-db-pxc-0.tenant1-pxc-db-pxc.tenant1.svc.cluster.local:3306) to ProxySQL\nAdding user to ProxySQL: clustercheck\n / ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0\nERROR (line:718) : Failed to load user list from ProxySQL database. \n– Please check the ProxySQL connection parameters and status.\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR (line:1553) : Failed to add the user (clustercheck) from PXC to ProxySQL database. \n– Please check the ProxySQL connection parameters and status.\n\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).syncPXCUsersWithProxySQL\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:969\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1218\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594","stacktrace":"github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1220"}

Expected Result:

I’d like the certificate to be reloaded without downtime.

Actual Result:

The pxc-db pod is deleted and recreated.


Ege_Gunes · November 4, 2024, 8:50am

Hi @Alessandro_Rovetto. Yes, this is expected behavior. If Operator detects TLS secrets are changed, it restarts pods for PXC to use new certificates.
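The thread does not say how the Operator detects that the TLS secrets changed. A common Kubernetes pattern is to hash the secret contents, record the hash in the pod template, and restart pods when the hash no longer matches. The sketch below assumes that pattern purely for illustration; the function names are hypothetical and this is not the Operator’s actual code.

```python
import hashlib
import json


def secret_hash(data):
    """Stable SHA-256 digest of a Secret's data map (key -> base64 string)."""
    # sort_keys=True makes the digest independent of key order
    payload = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def needs_restart(current_data, stored_hash):
    """True when the Secret content differs from the hash recorded at the last rollout."""
    return secret_hash(current_data) != stored_hash


if __name__ == "__main__":
    old = {"tls.crt": "AAA", "tls.key": "BBB"}
    recorded = secret_hash(old)
    renewed = {"tls.crt": "CCC", "tls.key": "BBB"}  # cert-manager issued a new cert
    print(needs_restart(renewed, recorded))  # True
```

Under this pattern any renewal changes the secret data, so every renewal produces a restart, which matches the behaviour described in the question.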

Alessandro_Rovetto · November 4, 2024, 9:04am

Thanks @ege_gunes for your answer. I wonder whether MySQL has a reload command that would avoid the restart and minimize the downtime.
Also, is there a parameter in the pxc-db configuration to set the duration of the certificate managed by cert-manager? The default cert-manager duration of 90 days is too short for my needs if every renewal causes downtime.

Thanks
Alessandro
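How often the restart happens depends on when cert-manager renews, not only on the certificate duration. Per the cert-manager documentation, a Certificate defaults to a duration of 2160h (90 days), and when `renewBefore` is unset it is renewed once 2/3 of the lifetime has elapsed. This small calculation, a sketch with a hypothetical helper name, shows the resulting interval between renewals (and therefore restarts).

```python
from datetime import timedelta

# cert-manager defaults: duration 2160h (90 days); if renewBefore is unset,
# renewal happens 1/3 of the lifetime before expiry.
DEFAULT_DURATION = timedelta(hours=2160)


def renewal_interval(duration=DEFAULT_DURATION, renew_before=None):
    """Time between renewals, i.e. between TLS-triggered pod restarts."""
    if renew_before is None:
        renew_before = duration / 3
    if renew_before >= duration:
        raise ValueError("renew_before must be shorter than duration")
    return duration - renew_before


if __name__ == "__main__":
    print(renewal_interval().days)                      # 60
    print(renewal_interval(timedelta(days=365)).days)   # 243
```

So with the defaults, a single-instance cluster would see a TLS-triggered restart roughly every 60 days rather than every 90.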

matthewb · November 4, 2024, 7:12pm

@Alessandro_Rovetto,
Try using ALTER INSTANCE RELOAD TLS. I cannot tell from the documentation whether this re-reads the certs from disk.
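According to the MySQL 8.0 reference manual, `ALTER INSTANCE RELOAD TLS` rebuilds the TLS context from the current values of the `ssl_*` system variables and applies it to new connections; existing connections keep their old context. Whether that is enough for PXC under the Operator is exactly what is being discussed here, so treat this only as a sketch of issuing the statement. It assumes (hypothetically) that the mysqld container is named `pxc` and exposes the root password as `MYSQL_ROOT_PASSWORD`; pod and namespace names are taken from the log above.

```python
import subprocess

RELOAD_SQL = "ALTER INSTANCE RELOAD TLS"


def reload_tls_cmd(pod, namespace, container="pxc"):
    """Build a kubectl exec command that asks mysqld to rebuild its TLS context.

    Assumptions (hypothetical): the container is named "pxc" and has the
    root password in the MYSQL_ROOT_PASSWORD environment variable.
    """
    # Passed as one argv element, so $MYSQL_ROOT_PASSWORD is expanded by
    # sh inside the pod, not by a shell on the machine running kubectl.
    script = f'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "{RELOAD_SQL}"'
    return ["kubectl", "exec", "--namespace", namespace, pod,
            "-c", container, "--", "sh", "-c", script]


def reload_tls(pod, namespace, container="pxc", dry_run=True):
    """Return the command, and run it when dry_run is False."""
    cmd = reload_tls_cmd(pod, namespace, container)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(reload_tls("tenant1-pxc-db-pxc-0", "tenant1"))
```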

Ege_Gunes · November 7, 2024, 11:36am

@matthewb, after the TLS certs are updated, mysqld is restarted. Is it really necessary to run this command?

@Alessandro_Rovetto, unfortunately there’s currently no way to change the cert duration through Helm. I’ll create a Jira ticket to add it in a future release.

May I ask, why do you run a single PXC instance if you don’t want to have downtime? If you run at least 3 PXC pods, they’ll be restarted in a rolling fashion.

Ege_Gunes · November 7, 2024, 11:45am

You can track the status in Jira: K8SPXC-1494.

matthewb · November 7, 2024, 1:32pm

@Ege_Gunes, if you can run the ‘ALTER INSTANCE …’ command to reload the certs instead of restarting MySQL, wouldn’t that be a better alternative? I don’t know whether ALTER re-reads the certificate files or just refreshes some internal TLS state. I don’t have a system to test that on.

Ege_Gunes · November 8, 2024, 8:12am

@matthewb, you’re right, it’d be way better than restarting mysqld. I created K8SPXC-1497 to investigate and implement if possible.

Alessandro_Rovetto · November 8, 2024, 8:27am

Thanks all, the reload would be really appreciated!
