TLS certificate renewal (handled by cert-manager) causes pxc-db restart

Description:

I have a pxc-db cluster with only one DB instance, managed by pxc-operator, with TLS handled by cert-manager. Every time the certificate is renewed, the pxc pod is deleted and restarted. Is this the expected behaviour? From the documentation, I expected no downtime when cert-manager takes care of the certificates. Is the downtime expected because I’m not running a proper cluster with 3 instances?

Steps to Reproduce:

To force regeneration of the certificate, I delete the pxc-db-ssl and pxc-db-ssl-internal secrets. cert-manager immediately creates the new certificate.

Version:

pxc-operator 1.12.0
pxc-db: 1.14.0

Logs:

pxc-operator:
{"level":"error","ts":1730707547.317082,"logger":"perconaxtradbcluster","caller":"pxc/controller.go:1220","msg":"sync users","cluster":"tenant1-pxc-db","namespace":"tenant1","error":"exec syncusers: command terminated with exit code 1 / \nSyncing user accounts from PXC(tenant1-pxc-db-pxc-0.tenant1-pxc-db-pxc.tenant1.svc.cluster.local:3306) to ProxySQL\nAdding user to ProxySQL: clustercheck\n / ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0\nERROR (line:718) : Failed to load user list from ProxySQL database. \n-- Please check the ProxySQL connection parameters and status.\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR (line:1553) : Failed to add the user (clustercheck) from PXC to ProxySQL database. \n-- Please check the ProxySQL connection parameters and status.\n","errorVerbose":"exec syncusers: command terminated with exit code 1 / \nSyncing user accounts from PXC(tenant1-pxc-db-pxc-0.tenant1-pxc-db-pxc.tenant1.svc.cluster.local:3306) to ProxySQL\nAdding user to ProxySQL: clustercheck\n / ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0\nERROR (line:718) : Failed to load user list from ProxySQL database. \n-- Please check the ProxySQL connection parameters and status.\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR 2003 (HY000): Can't connect to MySQL server on 'localhost:6032' (111)\nERROR (line:1553) : Failed to add the user (clustercheck) from PXC to ProxySQL database. \n-- Please check the ProxySQL connection parameters and status.\n\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).syncPXCUsersWithProxySQL\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:969\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1218\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594","stacktrace":"github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1220"}

Expected Result:

I’d like to reload the certificate without downtime

Actual Result:

pxc-db pod is deleted and recreated

Hi @Alessandro_Rovetto. Yes, this is expected behavior. When the Operator detects that the TLS secrets have changed, it restarts the PXC pods so that they use the new certificates.

Thanks @ege_gunes for your answer. I wonder whether MySQL has a reload command that would avoid the restart and minimize the downtime.
Also, is there a parameter in the pxc-db configuration to set the duration of the certificates managed by cert-manager? The default cert-manager duration is 90d, which is too short for my needs if every renewal causes downtime.

Thanks
Alessandro
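
To put the 90d figure in perspective, here is a rough sketch of how often a renewal (and therefore a restart) would occur. It assumes the cert-manager default of renewing once two thirds of the certificate lifetime has passed when `renewBefore` is not set; the function names are made up for this illustration, and the Certificate objects created by the operator may use different values.

```python
from datetime import timedelta


def renewal_interval(duration, renew_before=None):
    """Time between issuance and the next renewal attempt.

    Assumption: when renew_before is not given, renewal happens
    one third of the lifetime before expiry (cert-manager default).
    """
    if renew_before is None:
        renew_before = duration / 3
    return duration - renew_before


def restarts_per_year(duration, renew_before=None):
    """Approximate number of renewals (and pod restarts) per year."""
    return timedelta(days=365) / renewal_interval(duration, renew_before)


default_interval = renewal_interval(timedelta(days=90))
print(f"renewal every {default_interval.days} days")
print(f"about {restarts_per_year(timedelta(days=90)):.1f} restarts per year")
```

With a 90-day duration this works out to a renewal roughly every 60 days, so a single-instance cluster would see about six short outages per year from certificate rotation alone.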

@Alessandro_Rovetto,
Try using ALTER INSTANCE RELOAD TLS. I cannot tell from the documentation if this re-reads the certs from disk or not.

@matthewb after TLS certs are updated, mysqld is restarted. Is it really required to run this command?

@Alessandro_Rovetto unfortunately there’s no way to change the cert duration through Helm right now. I’ll create a Jira ticket to add it in a future release.

May I ask, why do you run a single PXC instance if you don’t want to have downtime? If you run at least 3 PXC pods, they’ll be restarted in a rolling fashion.

You can track the status via this link: [K8SPXC-1494] - Jira

@Ege_Gunes, if you can run the ‘ALTER INSTANCE …’ command to reload the certs instead of restarting MySQL, would that then be a better alternative? I don’t know if the ALTER will re-read the entire certs, or just refresh some internal TLS mechanism. I don’t have a system to test that on.

@matthewb, you’re right, it’d be way better than restarting mysqld. I created K8SPXC-1497 to investigate and implement if possible.

Thanks all, the reload would be really appreciated!