Environment
- PXC Version: 8.0.42-33.1 Percona XtraDB Cluster (GPL), Release rel33, Revision 6673f8e
- WSREP Version: 26.1.4.3
- OS: Ubuntu 24.04
- Cluster: 3-node setup
- Data Size: ~161GB production data
Problem Summary
After a 3-node PXC cluster failure, we successfully bootstrapped pxc3 which is now serving production traffic normally. However, we discovered severe information_schema corruption that blocks all administrative operations and prevents cluster expansion via SST.
Production queries work perfectly - the issue is purely with metadata access.
Symptoms
mysql> SHOW DATABASES;
ERROR 1356 (HY000): View 'information_schema.SCHEMATA' references invalid table(s)
or column(s) or function(s) or definer/invoker of view lack rights to use them
mysql> SELECT COUNT(*) FROM information_schema.tables WHERE engine = 'InnoDB';
ERROR 1356 (HY000): View 'information_schema.TABLES' references invalid table(s)...
What’s Broken:
- All information_schema views (TABLES, SCHEMATA, COLUMNS, etc.)
- mysqldump (requires information_schema access)
- User management (SHOW GRANTS fails)
- SST donor operations (xtrabackup needs information_schema)
- Cluster expansion (cannot add pxc1/pxc2 back)
What Works:
- Direct table queries: SELECT * FROM database.table
- Production application queries
- SHOW STATUS commands
- Data integrity seems intact
What We’ve Tried
1. mysql_upgrade --force
# Modified systemd to force upgrade
ExecStart=/usr/sbin/mysqld --upgrade=FORCE $EXTRA_ARGS $_WSREP_START_POSITION
# Result in error log:
2026-01-30T00:30:25.554808Z 6 [System] [MY-013381] [Server]
Server upgrade from '80042' to '80042' started.
2026-01-30T00:30:26.664702Z 6 [System] [MY-013381] [Server]
Server upgrade from '80042' to '80042' completed.
# But corruption persists - upgrade ran but didn't fix it
2. mysqldump with Various Options
# All variations fail with same error
mysqldump --skip-opt --quick --single-transaction --databases [list]
# Error: Couldn't execute 'SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES...'
mysqldump --no-tablespaces --skip-lock-tables [dbname]
# Error: Same information_schema error
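Since direct table reads still work, per-table exports that bypass information_schema entirely may succeed where mysqldump fails. A sketch using SELECT ... INTO OUTFILE (database, table, and path names are examples; the target directory must be permitted by secure_file_priv):

```sql
-- Per-table export that never touches information_schema.
-- 'mydb.mytable' and the output path are placeholders.
SELECT * FROM mydb.mytable
INTO OUTFILE '/var/lib/mysql-files/mytable.tsv'
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
```

This only captures row data, not table definitions, so it is a stopgap for getting data off the node rather than a replacement for a full dump.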
3. mysqlcheck Repair
mysqlcheck -u root -p --auto-repair --all-databases
# Error: Couldn't get table list for database mysql:
# View 'information_schema.TABLES' references invalid table(s)
4. Direct System Table Repair
SET SESSION sql_log_bin=0;
REPAIR TABLE mysql.user;
REPAIR TABLE mysql.db;
FLUSH PRIVILEGES;
# All complete successfully but information_schema still broken
mysql> REPAIR TABLE mysql.user;
+------------+--------+----------+---------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+--------+----------+---------------------------------------------------------+
| mysql.user | repair | note | The storage engine for the table doesn't support repair |
+------------+--------+----------+---------------------------------------------------------+
1 row in set (0.00 sec)
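The "doesn't support repair" note above is expected: in MySQL 8.0 the mysql.* system tables are InnoDB, and REPAIR TABLE is a MyISAM-era operation that is a no-op for InnoDB. CHECK TABLE is the closest InnoDB-side diagnostic. A sketch (the user-table name is hypothetical):

```sql
SET SESSION sql_log_bin = 0;
CHECK TABLE mysql.user, mysql.db;      -- logical consistency check, works for InnoDB
CHECK TABLE mydb.mytable EXTENDED;     -- hypothetical user table; slower, deeper scan
```

A clean CHECK TABLE result would support the observation below that the tablespaces themselves are intact and only the dictionary/metadata layer is damaged.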
5. Physical File Copy for Slave Setup
- Copied 162GB data dir to new server
- Created fresh mysql system database
- User databases not discovered (data dictionary mismatch)
- Tried restoring old mysql directory - corruption copied over
Current Situation
pxc3 Status (Donor):
mysql> SHOW STATUS WHERE Variable_name IN
('wsrep_cluster_status', 'wsrep_cluster_size',
'wsrep_local_state_comment', 'wsrep_ready');
+---------------------------+---------+
| Variable_name | Value |
+---------------------------+---------+
| wsrep_cluster_size | 1 |
| wsrep_cluster_status | Primary |
| wsrep_local_state_comment | Synced |
| wsrep_ready | ON |
+---------------------------+---------+
SST Failure Pattern (when trying to add pxc1):
2026-01-29T01:25:25.766624Z 0 [Note] [WSREP-SST]
ERROR 1142 (42000) at line 1: SELECT command denied to user
'mysql.pxc.sst.user'@'localhost' for table 'keyring_component_status'
2026-01-29T01:25:25.791119Z 0 [Warning] [Galera] 1.0 (pxc3):
State transfer to 0.0 (pxc1) failed: Not in primary component
We’ve created the SST user with all required permissions, but role grants don’t activate properly due to the information_schema issues.
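One thing that may be worth trying before abandoning SST: granting the missing privilege directly to the SST account rather than through a role, since role activation is exactly what the broken metadata seems to prevent. A sketch, assuming the internal account name from the error above; note that PXC manages this account itself during SST, so a manual grant may not persist across transfers, and we have not verified this unblocks the donor:

```sql
SET SESSION sql_log_bin = 0;
-- keyring_component_status lives in performance_schema in 8.0
GRANT SELECT ON performance_schema.keyring_component_status
  TO 'mysql.pxc.sst.user'@'localhost';
FLUSH PRIVILEGES;
```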
Relevant Configuration
my.cnf (pxc3):
[mysqld]
server-id=158
wsrep_cluster_name=yellowdot_pxc_cluster
wsrep_cluster_address=gcomm://pxc1,pxc2,pxc3 # I am using actual IPs here
wsrep_node_address=pxc3 # I am using actual IP here
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth='mysql.pxc.sst.user:'
pxc-encrypt-cluster-traffic=OFF
wsrep_provider_options="pc.ignore_sb=true"
Questions
- Is there a way to repair the data dictionary (mysql.ibd) directly without relying on information_schema?
- Can we force xtrabackup SST to skip information_schema queries and just copy physical files? We have --no-version-check, but xtrabackup still checks information_schema internally.
- Can we manually export/import the data dictionary from the physical files to create a clean slave, then use that slave as the SST donor?
- Is there a recovery tool that can rebuild information_schema views from the underlying InnoDB data dictionary without needing information_schema access itself?
- Could we use rsync/physical file copy as an SST method, given that xtrabackup requires information_schema? Our files are crash-consistent since we’re running single-node.
Our Current Plan (if no solution)
- Keep pxc3 running as-is (production stable)
- Create a slave node and have it replicate from pxc3
- Export data via application-level tools
- Fail traffic over to the new slave once replication is in sync, allowing us to rebuild the 3-node prod cluster from scratch
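For step 2, if mysqldump stays blocked, one way to seed the replica without touching information_schema is MySQL's transportable tablespaces: recreate each table's definition on a clean server (from SDI or application schema files), then swap in the .ibd from pxc3. A sketch, per table (names hypothetical; without a matching .cfg file the import skips schema verification, so the recreated definition must match):

```sql
-- On the clean replacement server:
CREATE TABLE mydb.mytable (/* definition recovered elsewhere */ id INT PRIMARY KEY);
ALTER TABLE mydb.mytable DISCARD TABLESPACE;
-- Copy mydb/mytable.ibd from pxc3 into the new datadir, ideally with the
-- table quiesced on the donor via FLUSH TABLES mydb.mytable FOR EXPORT, then:
ALTER TABLE mydb.mytable IMPORT TABLESPACE;
```

Once the replica holds a consistent copy, standard async replication from pxc3 can keep it in sync until failover.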
However, we’d prefer to recover the existing cluster if there’s a way to work around this information_schema corruption.
Additional Context
- This happened after two nodes left the cluster and went down; the reasons are unknown at the moment.
- pxc2 was the last to go down and had the most current data.
- We brought pxc2 back up as a bootstrap, and pxc3 successfully joined the cluster.
- pxc1's attempt to join the cluster using pxc2 failed.
- When we forced pxc1 to join the cluster using pxc3, the SST process started but got stuck in a loop: data in /var/lib/mysql/ would grow to ~42GB, then get deleted again whenever a network error occurred on the donor during sync. We lost the data on pxc1 completely.
- We brought down the pxc2 bootstrap and started pxc2 normally; it failed to join the cluster and has been down since.
- We have physical file backups, but they contain the same corruption.
- InnoDB tablespaces (.ibd files) appear intact and readable.
- Only the metadata layer is corrupted.
Any guidance would be greatly appreciated! We’re particularly interested in:
- Alternative SST methods that bypass information_schema
- Data dictionary repair procedures
- Manual cluster recovery techniques
- Similar experiences with this type of corruption
Thank you!