PXC can't upgrade from 8.0.31-23.2 to 8.0.32-24.2

Helm Chart versions:
pxc-operator:1.13.0
pxc-db:1.13.0
Distro: RKE2, Ubuntu 22 LTS

Automatic upgrades for my PXC cluster aren't working.
I’ve also tested creating a new, separate PXC cluster install (via Helm) and it shows similar errors:

PXC pod logs (truncated because they're long):

 trap exit SIGTERM
+ '[' m = - ']'
+ CFG=/etc/mysql/node.cnf
+ wantHelp=
+ for arg in "$@"
+ case "$arg" in
++ mysqld -V
++ awk '{print $3}'
++ awk -F. '{print $1"."$2}'
+ MYSQL_VERSION=8.0
++ mysqld -V
++ awk '{print $3}'
++ awk -F. '{print $3}'
++ awk -F- '{print $1}'
+ MYSQL_PATCH_VERSION=32
+ vault_secret=/etc/mysql/vault-keyring-secret/keyring_vault.conf
+ '[' -f /etc/mysql/vault-keyring-secret/keyring_vault.conf ']'
+ '[' -f /usr/lib64/mysql/plugin/binlog_utils_udf.so ']'
+ sed -i '/\[mysqld\]/a plugin_load="binlog_utils_udf=binlog_utils_udf.so"' /etc/mysql/node.cnf
+ sed -i '/\[mysqld\]/a gtid-mode=ON' /etc/mysql/node.cnf
+ sed -i '/\[mysqld\]/a enforce-gtid-consistency' /etc/mysql/node.cnf
+ grep -q '^progress=' /etc/mysql/node.cnf
+ sed -i 's|^progress=.*|progress=1|' /etc/mysql/node.cnf
+ grep -q '^\[sst\]' /etc/mysql/node.cnf
+ grep -q '^cpat=' /etc/mysql/node.cnf
+ sed '/^\[sst\]/a cpat=.*\\.pem$\\|.*init\\.ok$\\|.*galera\\.cache$\\|.*wsrep_recovery_verbose\\.log$\\|.*readiness-check\\.sh$\\|.*liveness-check\\.sh$\\|.*get-pxc-state$\\|.*sst_in_progress$\\|.*pmm-prerun\\.sh$\\|.*sst-xb-tmpdir$\\|.*\\.sst$\\|.*gvwstate\\.dat$\\|.*grastate\\.dat$\\|.*\\.err$\\|.*\\.log$\\|.*RPM_UPGRADE_MARKER$\\|.*RPM_UPGRADE_HISTORY$\\|.*pxc-entrypoint\\.sh$\\|.*unsafe-bootstrap\\.sh$\\|.*pxc-configure-pxc\\.sh\\|.*peer-list$\\|.*auth_plugin$' /etc/mysql/node.cnf
+ [[ 8.0 == \8\.\0 ]]
+ [[ 32 -ge 26 ]]
+ grep -q '^skip_replica_start=ON' /etc/mysql/node.cnf
+ sed -i '/\[mysqld\]/a skip_replica_start=ON' /etc/mysql/node.cnf
+ auth_plugin=mysql_native_password
+ [[ -f /var/lib/mysql/auth_plugin ]]
++ cat /var/lib/mysql/auth_plugin
+ prev_auth_plugin=mysql_native_password
+ [[ mysql_native_password != \m\y\s\q\l\_\n\a\t\i\v\e\_\p\a\s\s\w\o\r\d ]]
+ [[ -z mysql_native_password ]]
+ [[ 8.0 == \5\.\7 ]]
+ echo mysql_native_password
+ sed -i /default_authentication_plugin/d /etc/mysql/node.cnf
+ [[ 8.0 == \8\.\0 ]]
+ [[ 32 -ge 27 ]]
+ sed -i '/\[mysqld\]/a authentication_policy=mysql_native_password,,' /etc/mysql/node.cnf
+ file_env XTRABACKUP_PASSWORD xtrabackup xtrabackup
Percona XtraDB Cluster: Finding peers
2023/08/04 04:22:56 Peer finder enter
2023/08/04 04:22:56 Determined Domain to be percona.svc.cluster.local
2023/08/04 04:22:56 Peer list updated
was []
now [10-42-189-67.pxc-db-pxc-unready.percona.svc.cluster.local 10-42-235-156.pxc-db-pxc-unready.percona.svc.cluster.local 10-42-42-179.pxc-db-pxc-unready.percona.svc.cluster.local]
2023/08/04 04:22:56 execing: /var/lib/mysql/pxc-configure-pxc.sh with stdin: 10-42-189-67.pxc-db-pxc-unready.percona.svc.cluster.local
10-42-235-156.pxc-db-pxc-unready.percona.svc.cluster.local
10-42-42-179.pxc-db-pxc-unready.percona.svc.cluster.local
2023/08/04 04:22:56 ++ hostname -I
++ awk ' { print $1 } '
+ NODE_IP=10.42.42.179
++ hostname -f
++ cut -d. -f2
+ CLUSTER_NAME=pxc-db-pxc
+ SERVER_NUM=2
+ SERVER_ID=18884702
++ hostname -f
+ NODE_NAME=pxc-db-pxc-2.pxc-db-pxc.percona.svc.cluster.local
+ NODE_PORT=3306
+ read -ra LINE
+ echo 'read line 10-42-189-67.pxc-db-pxc-unready.percona.svc.cluster.local'
read line 10-42-189-67.pxc-db-pxc-unready.percona.svc.cluster.local
++ getent hosts 10-42-189-67.pxc-db-pxc-unready.percona.svc.cluster.local
++ awk '{ print $1 }'
+ LINE_IP=10.42.189.67
+ '[' 10.42.189.67 '!=' 10.42.42.179 ']'
++ mysql_root_exec 10.42.189.67 'select @@hostname'
++ local server=10.42.189.67
++ local 'query=select @@hostname'
+ LINE_HOST=pxc-db-pxc-0
+ '[' -n pxc-db-pxc-0 ']'
+ PEERS=("${PEERS[@]}" $LINE_HOST)
+ PEERS_FULL=("${PEERS_FULL[@]}" "$LINE_HOST.$CLUSTER_NAME")
+ read -ra LINE
+ echo 'read line 10-42-235-156.pxc-db-pxc-unready.percona.svc.cluster.local'
read line 10-42-235-156.pxc-db-pxc-unready.percona.svc.cluster.local
++ getent hosts 10-42-235-156.pxc-db-pxc-unready.percona.svc.cluster.local
++ awk '{ print $1 }'
+ LINE_IP=10.42.235.156
+ '[' 10.42.235.156 '!=' 10.42.42.179 ']'
++ mysql_root_exec 10.42.235.156 'select @@hostname'
++ local server=10.42.235.156
++ local 'query=select @@hostname'
+ LINE_HOST=pxc-db-pxc-1
+ '[' -n pxc-db-pxc-1 ']'
+ PEERS=("${PEERS[@]}" $LINE_HOST)
+ PEERS_FULL=("${PEERS_FULL[@]}" "$LINE_HOST.$CLUSTER_NAME")
+ read -ra LINE
+ echo 'read line 10-42-42-179.pxc-db-pxc-unready.percona.svc.cluster.local'
read line 10-42-42-179.pxc-db-pxc-unready.percona.svc.cluster.local
++ getent hosts 10-42-42-179.pxc-db-pxc-unready.percona.svc.cluster.local
++ awk '{ print $1 }'
+ LINE_IP=10.42.42.179
+ '[' 10.42.42.179 '!=' 10.42.42.179 ']'
+ read -ra LINE
+ '[' 2 '!=' 0 ']'
++ printf '%s\n' pxc-db-pxc-0 pxc-db-pxc-1 pxc-db-pxc-2
++ sort --version-sort
++ uniq
++ grep -v -- '-0$'
++ sed '$d'
++ tr '\n' ,
++ sed 's/^,$//'
+ DONOR_ADDRESS=pxc-db-pxc-1,
+ '[' 2 '!=' 0 ']'
++ printf '%s\n' pxc-db-pxc-0.pxc-db-pxc pxc-db-pxc-1.pxc-db-pxc
++ sort --version-sort
++ tr '\n' ,
++ sed 's/,$//'
+ WSREP_CLUSTER_ADDRESS=pxc-db-pxc-0.pxc-db-pxc,pxc-db-pxc-1.pxc-db-pxc
+ CFG=/etc/mysql/node.cnf
++ mysqld -V
++ awk '{print $3}'
++ awk -F. '{print $1"."$2}'
+ MYSQL_VERSION=8.0
+ '[' 8.0 == 8.0 ']'
+ grep -E -q '^[#]?admin-address' /etc/mysql/node.cnf
+ sed '/^\[mysqld\]/a admin-address=\n' /etc/mysql/node.cnf
+ grep -E -q '^[#]?log_error_suppression_list' /etc/mysql/node.cnf
+ sed '/^\[mysqld\]/a log_error_suppression_list="MY-010055"\n' /etc/mysql/node.cnf
+ '[' yes == yes ']'
+ grep -E -q '^[#]?log-error' /etc/mysql/node.cnf
+ sed '/^\[mysqld\]/a log-error=/var/lib/mysql/mysqld-error.log\n' /etc/mysql/node.cnf
+ grep -E -q '^[#]?wsrep_sst_donor' /etc/mysql/node.cnf
+ sed '/^\[mysqld\]/a wsrep_sst_donor=\n' /etc/mysql/node.cnf
+ grep -E -q '^[#]?wsrep_node_incoming_address' /etc/mysql/node.cnf
+ grep -E -q '^[#]?wsrep_provider_options' /etc/mysql/node.cnf
+ sed '/^\[mysqld\]/a wsrep_provider_options="pc.weight=10"\n' /etc/mysql/node.cnf
+ sed -r 's|^[#]?server_id=.*$|server_id=18884702|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?coredumper$|coredumper|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?wsrep_node_address=.*$|wsrep_node_address=10.42.42.179|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?wsrep_cluster_name=.*$|wsrep_cluster_name=pxc-db-pxc|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?wsrep_sst_donor=.*$|wsrep_sst_donor=pxc-db-pxc-1,|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?wsrep_cluster_address=.*$|wsrep_cluster_address=gcomm://pxc-db-pxc-0.pxc-db-pxc,pxc-db-pxc-1.pxc-db-pxc|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?wsrep_node_incoming_address=.*$|wsrep_node_incoming_address=pxc-db-pxc-2.pxc-db-pxc.percona.svc.cluster.local:3306|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?admin-address=.*$|admin-address=10.42.42.179|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?extra_max_connections=.*$|extra_max_connections=100|' /etc/mysql/node.cnf
+ sed -r 's|^[#]?extra_port=.*$|extra_port=33062|' /etc/mysql/node.cnf
+ CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ '[' -f /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt ']'
+ SSL_DIR=/etc/mysql/ssl
+ '[' -f /etc/mysql/ssl/ca.crt ']'
+ SSL_INTERNAL_DIR=/etc/mysql/ssl-internal
+ '[' -f /etc/mysql/ssl-internal/ca.crt ']'
+ KEY=/etc/mysql/ssl/tls.key
+ CERT=/etc/mysql/ssl/tls.crt
+ '[' -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
+ '[' -f /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -a -f /etc/mysql/ssl/tls.key -a -f /etc/mysql/ssl/tls.crt ']'
+ sed '/^\[mysqld\]/a pxc-encrypt-cluster-traffic=OFF' /etc/mysql/node.cnf
2023/08/04 04:22:57 Peer finder exiting
Cluster address set to: pxc-db-pxc-0.pxc-db-pxc,pxc-db-pxc-1.pxc-db-pxc
/usr/sbin/mysqld Ver 8.0.32-24.2 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3)
1c1
< /usr/sbin/mysqld Ver 8.0.32-24.2 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3)
---
> /usr/sbin/mysqld Ver 8.0.31-23.2 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel23, Revision e6e483f, WSREP version 26.1.4.3)
+ for i in {120..0}
+ echo 'SELECT 1'
+ mysql --protocol=socket -uoperator -hlocalhost --socket=/tmp/mysql.sock --password= -px
+ echo 'MySQL init process in progress...'
+ sleep 1

.....
MySQL init process in progress...
+ for i in {120..0}
+ echo 'SELECT 1'
+ mysql --protocol=socket -uoperator -hlocalhost --socket=/tmp/mysql.sock --password= -px
+ echo 'MySQL init process in progress...'
MySQL init process in progress...
+ sleep 1
+ '[' 0 = 0 ']'
+ echo 'MySQL init process failed.'
MySQL init process failed.
+ exit 1

I wasn't expecting the init process to run (is it initialising because of the version upgrade?).

Log pod:
pxc-db-pxc-2_logs.log (3.0 MB)

PXC update config:

updateStrategy: SmartUpdate
upgradeOptions:
  apply: recommended
  schedule: 0 4 * * *
  versionServiceEndpoint: https://check.percona.com
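
For reference, my understanding is that with SmartUpdate and apply: recommended, the operator queries the version service on that cron schedule and rolls the StatefulSet to whatever image the service recommends. A quick way to check which image the operator has resolved (the CR short name pxc and cluster name pxc-db are assumptions based on the pod names above):

# print the PXC image currently set on the custom resource
kubectl -n percona get pxc pxc-db -o jsonpath='{.spec.pxc.image}{"\n"}'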

Previous image:
image:
  repository: percona/percona-xtradb-cluster
  tag: 8.0.31-23.2

Since it's a rolling update, I decided to delete the pxc-db-pxc-2 pod along with its PVC. That way I don't get the MySQL init step, but the node runs an SST instead, which fails with this error:

{"log":"2023-08-04T05:29:04.053400Z 0 [Warning] [MY-000000] [Galera] 0.0 (pxc-db-pxc-1): State transfer to 2.0 (pxc-db-pxc-2) failed: -22 (Invalid argument)\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-04T05:04:41.827580Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1216: Will never receive state. Need to abort.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.827597Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.827612Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.829428Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.832845Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview (view_id(NON_PRIM,5b4a3d97-b4e9,13)\nmemb {\n\t8d06c3af-89ab,0\n\t}\njoined {\n\t}\nleft {\n\t}\npartitioned {\n\t5b4a3d97-b4e9,0\n\t89ac38cd-a81a,0\n\t}\n)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.832904Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.832912Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview ((empty))\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833084Z 0 [Note] [MY-000000] [Galera] gcomm: closed\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833105Z 0 [Note] [MY-000000] [Galera] mysqld: Terminated.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833114Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833121Z 0 [Note] [MY-000000] [WSREP] Terminating SST process\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41Z UTC - mysqld got signal 11 ;\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.\nBuildID[sha1]=df9f6877fc91c9a71d439f27569eabdef408f622\nServer Version: 8.0.32-24.2 Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3, wsrep_26.1.4.3\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"Thread pointer: 0x0\nAttempting backtrace. You can use the following information to find out\nwhere mysqld died. If you see no messages after this, something went\nterribly wrong...\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833754Z 0 [Note] [MY-000000] [WSREP-SST] Avg:[ 358KiB/s] Elapsed:0:19:33 Bytes: 410MiB \r joiner: => Rate:[5.87KiB/s] Avg:[ 358KiB/s] Elapsed:0:19:33 Bytes: 410MiB \r joiner: => Rate:[42.1KiB/s] Avg:[ 355KiB/s] Elapsed:0:19:44 Bytes: 410MiB \r joiner: => Rate:[4.05KiB/s] Avg:[ 352KiB/s] Elapsed:0:19:53 Bytes: 410MiB \rERR:Removing /var/lib/mysql//sst-xb-tmpdir/xtrabackup_galera_info file due to signal\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.833800Z 0 [Note] [MY-000000] [WSREP-SST] \n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-04T05:04:41.835063Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:143\nstack_bottom = 0 thread_stack 0x100000\n/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2253a31]\n/usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x1262d0f]\n/usr/sbin/mysqld(handle_fatal_signal+0xd8) [0x1262df8]\n/lib64/libpthread.so.0(+0x12ce0) [0x7fdc6e066ce0]\n/lib64/libc.so.6(abort+0x203) [0x7fdc6c3e8ee1]\n/usr/lib64/galera4/libgalera_smm.so(+0x17935) [0x7fdc5f746935]\n/usr/lib64/galera4/libgalera_smm.so(+0x1ae20b) [0x7fdc5f8dd20b]\n/usr/lib64/galera4/libgalera_smm.so(+0x1b5f3d) [0x7fdc5f8e4f3d]\n/lib64/libpthread.so.0(+0x81cf) [0x7fdc6e05c1cf]\n/lib64/libc.so.6(clone+0x43) [0x7fdc6c400dd3]\nYou may download the Percona XtraDB Cluster operations manual by visiting\nhttp://www.percona.com/software/percona-xtradb-cluster/. You may find information\nin the manual which will help you identify the cause of the crash.\nWriting a core file using lib coredumper\nPATH: (null)\n","file":"/var/lib/mysql/mysqld-error.log"}

The only thing I can do right now is stop the operator from upgrading, change the version manually on the StatefulSet, and clean up the pxc-db-pxc-2 PVC again.
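
Roughly, that manual workaround looks like this (operator deployment, StatefulSet, and PVC names are assumptions based on my install - adjust as needed):

# stop the operator so it doesn't immediately revert the manual change
kubectl -n percona scale deployment pxc-operator --replicas=0

# pin the PXC image on the StatefulSet back to the previous tag
kubectl -n percona set image statefulset/pxc-db-pxc pxc=percona/percona-xtradb-cluster:8.0.31-23.2

# wipe the stuck node's data so it rejoins via SST
kubectl -n percona delete pvc datadir-pxc-db-pxc-2
kubectl -n percona delete pod pxc-db-pxc-2

# scale the operator back up once the pod is healthy again
kubectl -n percona scale deployment pxc-operator --replicas=1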

Hi @voarsh !
This seems strange. We don't support RKE2, but I can try to reproduce it in another environment (in case the issue isn't actually related to RKE2 - I really can't tell just from the output).

The upgrade from 8.0.31 to 8.0.32 seems to work in general, but could you please share the Helm command and values you use to start the cluster initially, and then what you do later to trigger the upgrade?

I believe you said you don't support RKE1 - which 100% didn't work. Then I moved to RKE2 (Kubernetes 1.23) and it has been smooth sailing - until this update. Not sure if it's a MySQL issue or a CR/operator issue.

Will update when I have it. Not keen on triggering this again.

I might've completed the upgrade by doing the following:
Rolling upgrade to the new version, then delete the PVC for the pod (so it's fresh).
It SSTs from the cluster, fails, then runs the server upgrade:

{"log":"2023-08-05T15:15:08.921024Z 4 [System] [MY-013381] [Server] Server upgrade from '80031' to '80032' started.\n","file":"/var/lib/mysql/wsrep_recovery_verbose.log"}
Sat, Aug 5 2023 4:15:10 pm {"log":"2023-08-05T15:15:08.921639Z 4 [Note] [MY-013386] [Server] Running queries to upgrade MySQL server.\n","file":"/var/lib/mysql/wsrep_recovery_verbose.log"}
{"log":"2023-08-05T15:52:56.440404Z 4 [System] [MY-013381] [Server] Server upgrade from '80031' to '80032' completed.\n","file":"/var/lib/mysql/wsrep_recovery_verbose.log"}

Further SSTs then fail because the node can't read the uuid:seqno from the SST. :frowning_face:

{"log":"2023-08-05T16:14:34.938230Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.32-24.2) starting as process 7982\n","file":"/var/lib/mysql/mysqld.post.processing.log"}

{"log":"2023-08-05T16:14:35.942544Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 2ba07711-33ab-11ee-86b8-5207060520cf.\n","file":"/var/lib/mysql/mysqld.post.processing.log"}

{"log":"2023-08-05T16:14:36.914670Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.\n","file":"/var/lib/mysql/mysqld.post.processing.log"}

{"log":"2023-08-05T16:15:23.784620Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.\n","file":"/var/lib/mysql/mysqld.post.processing.log"}

{"log":"2023-08-05T16:15:35.313217Z 1 [Note] [MY-000000] [WSREP] wsrep_init_schema_and_SR (nil)\n","file":"/var/lib/mysql/mysqld.post.processing.log"}

{"log":"2023-08-05T16:19:37.227408Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** \n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.227476Z 0 [ERROR] [MY-000000] [WSREP-SST] Failed to start the mysql server that checks for async replication. (timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.227482Z 0 [ERROR] [MY-000000] [WSREP-SST] Check the parameters and retry\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.227486Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 550 pid:7982\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.228766Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ mysql error log (START) ------------\n\t---- Starting the MySQL server used for post-processing ----\n\t2023-08-05T16:14:34.936722Z 0 [Warning] [MY-011068] [Server] The syntax '--skip-host-cache' is deprecated and will be removed in a future release. Please use SET GLOBAL host_cache_size=0 instead.\n\t2023-08-05T16:14:34.936740Z 0 [Warning] [MY-011068] [Server] The syntax 'wsrep_slave_threads' is deprecated and will be removed in a future release. Please use wsrep_applier_threads instead.\n\t2023-08-05T16:14:34.936763Z 0 [Warning] [MY-011068] [Server] The syntax 'skip_slave_start' is deprecated and will be removed in a future release. Please use skip_replica_start instead.\n\t2023-08-05T16:14:34.937700Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.\n\t2023-08-05T16:14:34.937727Z 0 [Warning] [MY-000000] [WSREP] Node is not a cluster node. Disabling pxc_strict_mode\n\t2023-08-05T16:14:34.938230Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.32-24.2) starting as process 7982\n\t2023-08-05T16:14:35.942544Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 2ba07711-33ab-11ee-86b8-5207060520cf.\n\t2023-08-05T16:14:36.914670Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.\n\t2023-08-05T16:15:23.784620Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.\n\t2023-08-05T16:15:35.313217Z 1 [Note] [MY-000000] [WSREP] wsrep_init_schema_and_SR (nil)\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.228792Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ mysql error log (END) ------------\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.228800Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** \n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.228806Z 0 [Note] [MY-000000] [WSREP-SST] ...........post-processing failed. Exiting\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.228939Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:3\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235431Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.42.42.187' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' --mysqld-version '8.0.32-24.2' --binlog 'binlog' : 3 (No such process)\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235490Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235501Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 3 (No such process)\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235696Z 3 [Note] [MY-000000] [Galera] Processing SST received\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235729Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235753Z 3 [System] [MY-000000] [WSREP] SST completed\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235854Z 1 [Note] [MY-000000] [Galera] str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 875804 req->ist_len(): 71\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235879Z 1 [ERROR] [MY-000000] [Galera] Application received wrong state: \n\tReceived: 00000000-0000-0000-0000-000000000000\n\tRequired: 23d4ff65-1f36-11ee-891a-4fb321a34c1e\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235885Z 1 [ERROR] [MY-000000] [Galera] Application state transfer failed. This is unrecoverable condition, restart required.\n","file":"/var/lib/mysql/mysqld-error.log"}

{"log":"2023-08-05T16:19:37.235905Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()\n","file":"/var/lib/mysql/mysqld-error.log"}

TL;DR:
A fresh PVC plus SST upgrades MySQL, but getting the node up to date fails with "Failed to read uuid:seqno from joiner script", and it's not a one-time error.

:person_shrugging:

Here are my Helm chart values in their entirety.
Changing the PXC version and applying these values starts the rolling upgrade (at pxc-2):

allowUnsafeConfigurations: false
backup:
  enabled: false
  image:
    repository: percona/percona-xtradb-cluster-operator
    tag: 1.13.0-pxc8.0-backup-pxb8.0.32
  imagePullSecrets: []
  pitr:
    enabled: false
    resources:
      limits: {}
      requests: {}
    storageName: s3-us-west
    timeBetweenUploads: 60
  schedule: []
  storages:
    fs-pvc:
      volume:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 100Gi
    s3-us-west:
      s3:
        bucket: percona
        credentialsSecret: my-cluster-name-backup-s3
        endpointUrl: http://192.168.100.103:30719
        region: us-west-2
      type: s3
      verifyTLS: false
crVersion: 1.13.0
enableCRValidationWebhook: false
finalizers:
  - delete-pxc-pods-in-order
fullnameOverride: ''
haproxy:
  affinity:
    antiAffinityTopologyKey: kubernetes.io/hostname
  annotations: {}
  enabled: false
  gracePeriod: 30
  image: ''
  imagePullSecrets: []
  labels: {}
  livenessDelaySec: 300
  livenessProbes:
    failureThreshold: 4
    initialDelaySeconds: 60
    periodSeconds: 30
    successThreshold: 1
    timeoutSeconds: 5
  nodeSelector: {}
  podDisruptionBudget:
    maxUnavailable: 1
  readinessDelaySec: 15
  readinessProbes:
    failureThreshold: 3
    initialDelaySeconds: 15
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 1
  replicasServiceEnabled: true
  resources:
    limits: {}
    requests:
      cpu: 600m
      memory: 1G
  sidecarPVCs: []
  sidecarResources:
    limits: {}
    requests: {}
  sidecarVolumes: []
  sidecars: []
  size: 3
  tolerations: []
ignoreAnnotations: []
ignoreLabels: []
initImage: ''
logcollector:
  enabled: true
  image: ''
  imagePullSecrets: []
  resources:
    limits: {}
    requests:
      cpu: 200m
      memory: 100M
nameOverride: ''
operatorImageRepository: percona/percona-xtradb-cluster-operator
pause: false
pmm:
  enabled: true
  image:
    repository: percona/pmm-client
    tag: 2.38.0
  imagePullSecrets: []
  resources:
    limits: {}
    requests:
      cpu: 300m
      memory: 150M
  serverHost: monitoring-service
  serverUser: admin
proxysql:
  affinity:
    antiAffinityTopologyKey: kubernetes.io/hostname
  annotations: {}
  enabled: true
  gracePeriod: 60
  image: ''
  imagePullSecrets: []
  labels: {}
  livenessDelaySec: 300
  nodeSelector: {}
  persistence:
    accessMode: ReadWriteOnce
    enabled: true
    size: 8Gi
  podDisruptionBudget:
    maxUnavailable: 1
  readinessDelaySec: 15
  resources:
    limits:
      cpu: 1000m
      memory: 1G
    requests:
      cpu: 300m
      memory: 600Mi
  sidecarPVCs: []
  sidecarResources:
    limits: {}
    requests: {}
  sidecarVolumes: []
  sidecars: []
  size: 3
  tolerations: []
pxc:
  affinity:
    antiAffinityTopologyKey: kubernetes.io/hostname
  annotations: {}
  autoRecovery: true
  certManager: false
  disableTLS: true
  gracePeriod: 600
  image:
    repository: percona/percona-xtradb-cluster
    tag: 8.0.32-24.2
  imagePullSecrets: []
  labels: {}
  livenessDelaySec: 1200
  livenessProbes:
    failureThreshold: 15
    initialDelaySeconds: 1200
    periodSeconds: 15
    successThreshold: 1
    timeoutSeconds: 5
  nodeSelector: {}
  persistence:
    accessMode: ReadWriteOnce
    enabled: true
    size: 50Gi
  podDisruptionBudget:
    maxUnavailable: 1
  readinessDelaySec: 15
  readinessProbes:
    failureThreshold: 60
    initialDelaySeconds: 1900
    periodSeconds: 60
    successThreshold: 1
    timeoutSeconds: 15
  resources:
    limits:
      memory: 10G
    requests:
      cpu: 100m
      memory: 1G
  sidecarPVCs: []
  sidecarResources:
    limits: {}
    requests: {}
  sidecarVolumes: []
  sidecars: []
  size: 3
  tolerations: []
  configuration: >
    [mysqld]

    max_user_connections=1500

    wsrep_provider_options=evs.suspect_timeout=PT80S; evs.inactive_timeout=PT4M;
    evs.install_timeout=PT4M; gmcast.peer_timeout=PT11S;

    pxc-encrypt-cluster-traffic=OFF

    pxc_strict_mode=PERMISSIVE
secrets:
  tls: {}
tls: {}
updateStrategy: SmartUpdate
upgradeOptions:
  apply: disabled
  schedule: 0 4 * * *
  versionServiceEndpoint: https://check.percona.com
global:
  cattle:
    systemProjectId: p-fd24j
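
For completeness, the way I apply these values is essentially this (chart repo URL and release name assumed; values.yaml is the file above):

# install (or update) the cluster with the values above
helm repo add percona https://percona.github.io/percona-helm-charts/
helm upgrade --install pxc-db percona/pxc-db --version 1.13.0 -n percona -f values.yaml

# bumping pxc.image.tag in values.yaml and re-running the same command is what triggers the rolling upgrade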

edit: (07/08/2023)
I did some more tests.
Strangely, I can install 8.0.32-24.2 AND 8.0.33-25.1 as a completely fresh install (no existing data).
But with any database present - or, even more strangely, installing 8.0.32-24.2 fresh and then upgrading to 8.0.33-25.1 - I get the same issue.

Upgrading an empty, clean install just logs:

{"log":"2023-08-07T14:03:36.927618Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-07T14:03:53.517806Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-07T14:03:58.845107Z 1 [Note] [MY-000000] [WSREP] wsrep_init_schema_and_SR (nil)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-08-07T14:03:59.111358Z 4 [System] [MY-013381] [Server] Server upgrade from '80032' to '80033' started.\n","file":"/var/lib/mysql/mysqld-error.log"}

The PXC pod just cycles through:

MySQL init process in progress...
+ echo 'MySQL init process in progress...'
+ sleep 1
+ '[' 0 = 0 ']'
+ echo 'MySQL init process failed.'
MySQL init process failed.
+ exit 1

I installed a fresh 8.0.31-23.1 using Helm chart pxc-db:1.12.0 (note the crVersion is 1.12.0), then changed the PXC image to 8.0.33-25.1 - that did not work either.

Then I spun up a test DigitalOcean Kubernetes deployment with Kubernetes 1.25, repeated the 8.0.31-23.1 to 8.0.32-24.2 and 8.0.33-25.1 upgrades, and they worked…

The RKE2 1.23 cluster never reaches this point (which the DigitalOcean test cluster does):

+ echo 'SELECT 1'
+ mysql --protocol=socket -uoperator -hlocalhost --socket=/tmp/mysql.sock --password= '-p~s4aZ#=r5Slk*#R$>)'
+ echo 'MySQL init process in progress...'
MySQL init process in progress...
+ sleep 1
+ for i in {120..0}
+ mysql --protocol=socket -uoperator -hlocalhost --socket=/tmp/mysql.sock --password= '-p~s4aZ#=r5Slk*#R$>)'
+ echo 'SELECT 1'
+ break
+ '[' 100 = 0 ']'
+ mysql_upgrade --force --protocol=socket -uoperator -hlocalhost --socket=/tmp/mysql.sock --password= '-p~s4aZ#=r5Slk*#R$>)'
mysql_upgrade: [Warning] Using a password on the command line interface can be insecure.
The mysql_upgrade client is now deprecated. The actions executed by the upgrade client are now done by the server.
To upgrade, please start the new MySQL binary with the older data directory. Repairing user tables is done automatically. Restart is not required after upgrade.
The upgrade process automatically starts on running a new MySQL binary with an older data directory. To avoid accidental upgrades, please use the --upgrade=NONE option with the MySQL binary. The option --upgrade=FORCE is also provided to run the server upgrade sequence on demand.
It may be possible that the server upgrade fails due to a number of reasons. In that case, the upgrade sequence will run again during the next MySQL server start. If the server upgrade fails repeatedly, the server can be started with the --upgrade=MINIMAL option to start the server without executing the upgrade sequence, thus allowing users to manually rectify the problem.
+ kill -s TERM 151
+ wait 151

I just tested installing a fresh RKE2 1.24 Kubernetes cluster, and upgrading 8.0.31-23.1 to 8.0.32-24.2 was fine. It's the exact same distribution (both are RKE2 1.24 clusters - I upgraded my main cluster today). I am copying the YAMLs exactly. It makes no sense, and I'm really lost.

TL;DR
I don't know why a fresh RKE2 1.24 Kubernetes cluster can upgrade while my main cluster (same version, same config, etc.) cannot. :confused:
The only way I can upgrade is to dump all my databases, delete my PVCs, deploy the newer version, and re-import the dump (I assume for every upgrade I want to do). Upgrading in place doesn't work and I've no clue why.
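
For anyone hitting the same wall, the dump-and-reimport workaround is roughly this (the secret name and the use of the first PXC pod are assumptions for my install):

# grab the root password from the cluster secret and dump everything from the old cluster
ROOT_PW=$(kubectl -n percona get secret pxc-db-secrets -o jsonpath='{.data.root}' | base64 -d)
kubectl -n percona exec pxc-db-pxc-0 -c pxc -- \
  mysqldump -uroot -p"$ROOT_PW" --all-databases --triggers --routines --events > all-databases.sql

# redeploy on the newer image with fresh PVCs, then load the dump back in
kubectl -n percona exec -i pxc-db-pxc-0 -c pxc -- \
  mysql -uroot -p"$ROOT_PW" < all-databases.sql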