Upgrading PXC 5.6 -> 5.7: SST transfer always resets performance_schema db

We are in the process of upgrading all our clusters from PXC 5.6 to 5.7 and it has gone well, except we’ve hit a case where one instance is very high volume and we can’t seem to get it to sync properly. Here’s the scenario:
Cluster with 3 PXC 5.6 nodes running.
We add a new db node which has had 5.7 installed.
When that new node tries to join, it initiates an SST sync which, since the data is coming from 5.6, appears to reset the performance_schema db back to the 5.6 version, and the rest of the sync is filled with errors.
We shut down the server, run it in standalone mode, and run mysql_upgrade, which succeeds; I’ve verified at that point that the performance_schema db is correct.
We shut down the standalone server and restart it normally, but it does another SST sync and wipes out the performance_schema db again.

According to the upgrade docs, if the node does an SST transfer, then we need to run mysql_upgrade again, but it just does the same thing again when we try to join it to the cluster. Is the problem that, because of the high data volume, we can’t get the node back up in time for an IST transfer to be possible?

In one case, we have 4 separate instances of PXC running in the cluster. Three of them upgraded with no issues, but this one troubled instance is very high load/traffic.

We do the installations via tarball and not RPM because we are isolating our PXC installations from the system-level.

Here’s a sample from the logfile of a new db node (the joiner) trying to initialize from the current 5.6 donor node.

2024-05-09T01:15:12.444016Z mysqld_safe Logging to '/apps/epic/var/log/epic-gnsnet/pxc-metadb.err'.
2024-05-09T01:15:12.473454Z mysqld_safe Starting mysqld daemon with databases from /apps/epic/var/cache/epic-gnsnet/pxc-metadb
2024-05-09T01:15:12.479220Z mysqld_safe Skipping wsrep-recover for empty datadir: /apps/epic/var/cache/epic-gnsnet/pxc-metadb
2024-05-09T01:15:12.480629Z mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
2024-05-09T01:15:12.647347Z 0 [Warning] The syntax '--log_warnings/-W' is deprecated and will be removed in a future release. Please use '--log_error_verbosity' instead.
2024-05-09T01:15:12.647469Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2024-05-09T01:15:12.652876Z 0 [Warning] WSREP: Could not open state file for reading: '/apps/epic/var/cache/epic-gnsnet/pxc-metadb//grastate.dat'
2024-05-09T01:15:12.652893Z 0 [Warning] WSREP: No persistent state found. Bootstraping with default state
2024-05-09T01:15:13.668302Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
        2024-05-09T01:15:13.930738Z WSREP_SST: [INFO] Streaming with xbstream
        2024-05-09T01:15:14.991593Z WSREP_SST: [WARNING] WARNING: PXC is receiving an SST from a node with a lower version.
        2024-05-09T01:15:14.992537Z WSREP_SST: [WARNING] This node's PXC version is 5.7. The donor's PXC version is 5.6.
        2024-05-09T01:15:14.993827Z WSREP_SST: [WARNING] Run mysql_upgrade in non-cluster (standalone mode) to upgrade.
        2024-05-09T01:15:14.994765Z WSREP_SST: [WARNING] Check the upgrade process here:
        2024-05-09T01:15:14.995718Z WSREP_SST: [WARNING]     https://www.percona.com/doc/percona-xtradb-cluster/5.7/howtos/upgrade_guide.html
        2024-05-09T01:15:15.013320Z WSREP_SST: [INFO] Streaming with xbstream
        2024-05-09T01:15:15.019742Z WSREP_SST: [INFO] Proceeding with SST.........
        2024-05-09T01:15:15.027929Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
        2024-05-09T01:15:41.892788Z WSREP_SST: [INFO] Preparing the backup at /apps/epic/var/cache/epic-gnsnet/pxc-metadb//.sst
        2024-05-09T01:16:09.000847Z WSREP_SST: [INFO] Moving the backup to /apps/epic/var/cache/epic-gnsnet/pxc-metadb/
        2024-05-09T01:16:09.150876Z WSREP_SST: [INFO] Galera co-ords from recovery: a7c17150-d8ff-11ed-b1a4-8fb18da769f9:1305833819
2024-05-09T01:16:09.534565Z 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2024-05-09T01:16:09.534596Z 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
2024-05-09T01:16:09.534600Z 0 [ERROR] InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
2024-05-09T01:16:09.534604Z 0 [ERROR] InnoDB: Cannot open datafile for read-only: './gnsnet_loadModule/aggregate_node.ibd' OS error: 71
2024-05-09T01:16:09.534751Z 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2024-05-09T01:16:09.534756Z 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
2024-05-09T01:16:09.534759Z 0 [ERROR] InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
2024-05-09T01:16:09.534763Z 0 [ERROR] InnoDB: Cannot open datafile for read-only: './gnsnet_loadModule/alert_node.ibd' OS error: 71
...
2024-05-09T01:16:09.536408Z 0 [Warning] InnoDB: Upgrading redo log: 2*262144 pages, LSN=6048566389917
2024-05-09T01:16:09.636971Z 0 [Warning] InnoDB: Starting to delete and rewrite log files.
 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000
 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000
2024-05-09T01:16:18.896083Z 0 [Warning] InnoDB: New log files created, LSN=6048566390293
2024-05-09T01:16:18.968994Z 0 [ERROR] InnoDB: Cannot open '/apps/epic/var/cache/epic-gnsnet/pxc-metadb/ib_buffer_pool' for reading: No such file or directory
2024-05-09T01:16:18.971916Z 0 [Warning] System table 'plugin' is expected to be transactional.
2024-05-09T01:16:18.972839Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: bd5e0e33-0da1-11ef-936c-e8ebd3835d30.
2024-05-09T01:16:18.973174Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2024-05-09T01:16:19.283429Z 0 [Warning] A deprecated TLS version TLSv1 is enabled. Please use TLSv1.2 or higher.
2024-05-09T01:16:19.283442Z 0 [Warning] A deprecated TLS version TLSv1.1 is enabled. Please use TLSv1.2 or higher.
2024-05-09T01:16:19.283796Z 0 [Warning] CA certificate ca.pem is self signed.
2024-05-09T01:16:19.311355Z 0 [Warning] Insecure configuration for --pid-file: Location '/apps/epic/var/cache/epic-gnsnet/pxc-metadb' in the path is accessible to all OS users. Consider choosing a different directory.
2024-05-09T01:16:19.311522Z 0 [Warning] Failed to open optimizer cost constant tables

2024-05-09T01:16:19.311957Z 0 [Warning] 'user' entry 'root@ci76a00is-qukt13100701.isg.apple.com' ignored in --skip-name-resolve mode.
2024-05-09T01:16:19.311978Z 0 [Warning] 'user' entry '@ci76a00is-qukt13100701.isg.apple.com' ignored in --skip-name-resolve mode.
2024-05-09T01:16:19.312008Z 0 [Warning] 'proxies_priv' entry '@ root@ci76a00is-qukt13100701.isg.apple.com' ignored in --skip-name-resolve mode.
2024-05-09T01:16:19.312385Z 0 [Warning] System table 'time_zone_leap_second' is expected to be transactional.
2024-05-09T01:16:19.312392Z 0 [Warning] System table 'time_zone_name' is expected to be transactional.
2024-05-09T01:16:19.312394Z 0 [Warning] System table 'time_zone' is expected to be transactional.
2024-05-09T01:16:19.312396Z 0 [Warning] System table 'time_zone_transition_type' is expected to be transactional.
2024-05-09T01:16:19.312398Z 0 [Warning] System table 'time_zone_transition' is expected to be transactional.
2024-05-09T01:16:19.312747Z 0 [Warning] System table 'servers' is expected to be transactional.
2024-05-09T01:16:19.313001Z 0 [ERROR] Incorrect definition of table performance_schema.events_waits_current: expected column 'NESTING_EVENT_TYPE' at position 15 to have type enum('TRANSACTION','STATEMENT','STAGE','WAIT', found type enum('STATEMENT','STAGE','WAIT').
2024-05-09T01:16:19.313055Z 0 [ERROR] Incorrect definition of table performance_schema.events_waits_history: expected column 'NESTING_EVENT_TYPE' at position 15 to have type enum('TRANSACTION','STATEMENT','STAGE','WAIT', found type enum('STATEMENT','STAGE','WAIT').
2024-05-09T01:16:19.313108Z 0 [ERROR] Incorrect definition of table performance_schema.events_waits_history_long: expected column 'NESTING_EVENT_TYPE' at position 15 to have type enum('TRANSACTION','STATEMENT','STAGE','WAIT', found type enum('STATEMENT','STAGE','WAIT').
2024-05-09T01:16:19.314109Z 0 [ERROR] Column count of performance_schema.threads is wrong. Expected 17, found 14. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314148Z 0 [ERROR] Column count of performance_schema.events_stages_current is wrong. Expected 12, found 10. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314187Z 0 [ERROR] Column count of performance_schema.events_stages_history is wrong. Expected 12, found 10. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314227Z 0 [ERROR] Column count of performance_schema.events_stages_history_long is wrong. Expected 12, found 10. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314315Z 0 [ERROR] Incorrect definition of table performance_schema.events_stages_summary_by_account_by_event_name: expected column 'USER' at position 0 to have type char(32), found type char(16).
2024-05-09T01:16:19.314356Z 0 [ERROR] Incorrect definition of table performance_schema.events_stages_summary_by_user_by_event_name: expected column 'USER' at position 0 to have type char(32), found type char(16).
2024-05-09T01:16:19.314482Z 0 [ERROR] Column count of performance_schema.events_statements_current is wrong. Expected 41, found 40. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314543Z 0 [ERROR] Column count of performance_schema.events_statements_history is wrong. Expected 41, found 40. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314602Z 0 [ERROR] Column count of performance_schema.events_statements_history_long is wrong. Expected 41, found 40. Created with MySQL 50645, now running 50743. Please use mysql_upgrade to fix this error.
2024-05-09T01:16:19.314687Z 0 [ERROR] Incorrect definition of table performance_schema.events_statements_summary_by_account_by_event_name: expected column 'USER' at position 0 to have type char(32), found type char(16).
2024-05-09T01:16:19.314735Z 0 [ERROR] Incorrect definition of table performance_schema.events_statements_summary_by_user_by_event_name: expected column 'USER' at position 0 to have type char(32), found type char(16).
2024-05-09T01:16:19.314898Z 0 [ERROR] Native table 'performance_schema'.'events_statements_summary_by_program' has the wrong structure
...
2024-05-09T01:16:19.315813Z 0 [Warning] Optional native table 'performance_schema'.'processlist' has the wrong structure or is missing.
2024-05-09T01:16:19.315885Z 0 [ERROR] Incorrect definition of table mysql.db: expected column 'User' at position 2 to have type char(32), found type char(16).
2024-05-09T01:16:19.315902Z 0 [ERROR] mysql.user has no `Event_priv` column at position 28
2024-05-09T01:16:19.316018Z 0 [ERROR] Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.
2024-05-09T01:16:19.321540Z 2 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:19.321572Z 2 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:19.326240Z 2 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:19.326258Z 2 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:38.989000Z 7 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:38.989027Z 7 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:39.248466Z 9 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:39.248513Z 9 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name.  Please run mysql_upgrade
2024-05-09T01:16:39.325119Z 10 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name.  Please run mysql_upgrade

As an additional note, the instances that did upgrade successfully showed similar output when the db was brought up for the first time, before mysql_upgrade. After running mysql_upgrade and bringing the server back up, we see this in the log instead:

	2024-05-09T05:44:42.623953Z WSREP_SST: [INFO] Streaming with xbstream
	2024-05-09T05:44:42.652609Z WSREP_SST: [INFO] Bypassing SST. Can work it through IST

But we don’t see that second line on the problem instance; it still does another SST streaming transfer, which messes it up again. Is there a way to force it to bypass SST and only use IST?
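
To make the comparison between the good and bad instances concrete, here is a minimal Python sketch of how I am telling the two cases apart from the joiner's log. The function name is my own, and the only things it keys on are the two WSREP_SST messages quoted in this post ("Bypassing SST" and "Proceeding with SST"); other PXC versions may word these differently.

```python
def transfer_mode(log_text: str) -> str:
    """Classify a joiner log excerpt as 'IST', 'SST', or 'unknown'.

    Hypothetical helper: it only looks for the two WSREP_SST messages
    quoted in this post, nothing else.
    """
    if "Bypassing SST" in log_text:
        # Seen on the instances that upgraded cleanly.
        return "IST"
    if "Proceeding with SST" in log_text:
        # Seen on the problem instance on every restart.
        return "SST"
    return "unknown"


good = "2024-05-09T05:44:42.652609Z WSREP_SST: [INFO] Bypassing SST. Can work it through IST"
bad = "2024-05-09T01:15:15.019742Z WSREP_SST: [INFO] Proceeding with SST........."
print(transfer_mode(good), transfer_mode(bad))
```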

I ran mysql_upgrade again and then looked at the log on the node that was assigned as the donor for the joiner. When it handles the sync request, I see this:

2024-05-09 17:07:15 37046 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1309316766)
2024-05-09 17:07:15 37046 [Note] WSREP: IST request: a7c17150-d8ff-11ed-b1a4-8fb18da769f9:1309300009-1309316697|tcp://10.35.214.40:7354
2024-05-09 17:07:15 37046 [Note] WSREP: IST first seqno 1309300010 not found from cache, falling back to SST
2024-05-09 17:07:15 37046 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-05-09 17:07:15 37046 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.35.214.40:7254/xtrabackup_sst//1' --socket '/apps/epic/var/cache/epic-gnsnet/pxc-metadb/xtradb.sock' --datadir '/apps/epic/var/cache/epic-gnsnet/pxc-metadb/' --defaults-file '/apps/epic/conf/gnsnet/pxc-metadb.conf' --defaults-group-suffix '' --mysqld-version '5.6.45-86.1-28.36'   '' --gtid 'a7c17150-d8ff-11ed-b1a4-8fb18da769f9:1309316766' '
2024-05-09 17:07:15 37046 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (2024-05-09 17:07:16)

So this line is what appears to make it always fall back to an SST transfer:

2024-05-09 17:07:15 37046 [Note] WSREP: IST first seqno 1309300010 not found from cache, falling back to SST

What does that mean exactly, and how did the cluster get into this state? Is there a way to correct it?
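
For reference, this is how I am reading the IST request line. It is a minimal Python sketch, not anything from PXC itself: the function and field names are my own, and the interpretation is an assumption on my part, namely that the joiner has applied everything up to the first number and is asking for the write-sets after it, up to the second number. That reading is at least consistent with the donor complaining about seqno 1309300010, which is the first number plus one.

```python
import re

# Matches e.g. "IST request: <uuid>:1309300009-1309316697|tcp://..."
IST_REQUEST = re.compile(
    r"IST request: (?P<uuid>[0-9a-f-]+):(?P<first>\d+)-(?P<last>\d+)\|"
)


def parse_ist_request(line: str):
    """Extract the seqno range from a donor-side 'IST request' log line.

    Returns None if the line does not contain an IST request.
    Field names are hypothetical and chosen for readability.
    """
    m = IST_REQUEST.search(line)
    if m is None:
        return None
    joiner_seqno = int(m.group("first"))
    last_needed = int(m.group("last"))
    return {
        "uuid": m.group("uuid"),
        "joiner_seqno": joiner_seqno,
        # Assumed: the first write-set the donor must still hold.
        "first_needed": joiner_seqno + 1,
        "last_needed": last_needed,
        "writesets_needed": last_needed - joiner_seqno,
    }


line = (
    "2024-05-09 17:07:15 37046 [Note] WSREP: IST request: "
    "a7c17150-d8ff-11ed-b1a4-8fb18da769f9:1309300009-1309316697"
    "|tcp://10.35.214.40:7354"
)
req = parse_ist_request(line)
print(req["first_needed"], req["writesets_needed"])
```

If that reading is right, the joiner was only about 16,688 write-sets behind when it asked for IST, yet the donor no longer had the first of them in its cache.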