Full tmp directory --> node gets out of sync

Hello,
We are using Percona XtraDB Cluster with 5 nodes.
Installed software packages:

  • Percona-XtraDB-Cluster-client-56-5.6.48-28.40.1.el7.x86_64
  • percona-toolkit-3.2.0-1.el7.x86_64
  • percona-xtrabackup-2.3.10-1.el7.x86_64
  • Percona-XtraDB-Cluster-server-56-5.6.48-28.40.1.el7.x86_64
  • Percona-XtraDB-Cluster-shared-56-5.6.48-28.40.1.el7.x86_64
  • Percona-XtraDB-Cluster-galera-3-3.40-1.el7.x86_64
Relevant configuration settings:
  • wsrep_provider_options = "gcs.fc_limit=600; gcs.fc_master_slave=YES; gcs.fc_factor=0.8; gcache.size=5G;"
  • wsrep_sst_method = xtrabackup-v2
  • wsrep_provider = /usr/lib64/libgalera_smm.so
  • wsrep_sync_wait = 1
  • tmpdir = /var/lib/mysql-tmp
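The wsrep_provider_options value is one string of semicolon-separated key=value pairs. As a quick way to double-check such a line for typos, here is a minimal Python sketch that splits it the same way. The function name parse_provider_options is hypothetical and this is only an illustration of the format, not how Galera itself parses the option.

```python
def parse_provider_options(options):
    """Split a wsrep_provider_options string into a {key: value} dict.

    Hypothetical helper: entries are separated by ';' and each entry is
    'key=value'. Empty entries (e.g. from a trailing ';') are skipped.
    """
    result = {}
    for entry in options.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # trailing or doubled separator
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        result[key.strip()] = value.strip()
    return result


# The value from the configuration above.
opts = parse_provider_options(
    "gcs.fc_limit=600; gcs.fc_master_slave=YES; gcs.fc_factor=0.8; gcache.size=5G;"
)
print(opts["gcache.size"])  # prints 5G
```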
We have already moved the tmp directory to a dedicated Percona directory, and no other process uses it. Still, from time to time one node gets out of sync because the tmp directory is full. The space is used up by a long-running query with full table joins, while at the same time a normal UPDATE/INSERT statement on another table takes place. That UPDATE/INSERT also needs some space in the tmp directory.
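Until the root cause is fixed, it can help to see the directory filling up before the applier fails. Below is a minimal Python sketch, assuming the tmpdir path from the configuration above; the function name tmpdir_report and the top_n parameter are hypothetical. It reports the free-space fraction of the filesystem holding the directory and its largest files. Temp files that MySQL has already unlinked but still holds open may not appear in the listing, so the free-space figure is the more reliable signal.

```python
import os
import shutil

# Assumption: the tmpdir value from the configuration above.
TMPDIR = "/var/lib/mysql-tmp"


def tmpdir_report(path, top_n=5):
    """Return (free_fraction, largest_files) for the given directory.

    free_fraction is free/total bytes of the filesystem containing path.
    largest_files is a list of (size_bytes, filename), biggest first.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    files = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    size = entry.stat(follow_symlinks=False).st_size
                    files.append((size, entry.name))
            except FileNotFoundError:
                # The server may delete a temp file between listing and stat().
                continue
    files.sort(reverse=True)
    return free_fraction, files[:top_n]


if __name__ == "__main__" and os.path.isdir(TMPDIR):
    free, largest = tmpdir_report(TMPDIR)
    print(f"free: {free:.1%}")
    for size, name in largest:
        print(f"{size:>15,d}  {name}")
```

Run from cron or a monitoring agent, this could raise an alert once the free fraction drops below some threshold you choose; the threshold is up to you and not something taken from this setup.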
--------------------------------------------------
2020-08-03 13:50:36 4397 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/var/lib/mysql-tmp/#sql_112d_1.MYI'; try to repair it
2020-08-03 13:50:37 4397 [ERROR] Slave SQL: Could not execute Update_rows event on table project_pps.emailmessage; Error writing file '/var/lib/mysql-tmp/MLGNEvmX' (Errcode: 28 - No space left on device), Error_code: 3; Error writing file '/var/lib/mysql-tmp/MLGNEvmX' (errno: 28 - No space left on device), Error_code: 1026; the event's master log FIRST, end_log_pos 95402, Error_code: 1026
Retrying 2th time
2020-08-03 13:50:37 4397 [ERROR] Slave SQL: Could not execute Update_rows event on table project_pps.emailmessage; Error writing file '/var/lib/mysql-tmp/MLGNEvmX' (Errcode: 28 - No space left on device), Error_code: 3; Error writing file '/var/lib/mysql-tmp/MLGNEvmX' (errno: 28 - No space left on device), Error_code: 1026; the event's master log FIRST, end_log_pos 95402, Error_code: 1026

2020-08-03 13:50:37 4397 [ERROR] WSREP: Failed to apply trx 5919417306 4 times
2020-08-03 13:50:37 4397 [ERROR] WSREP: Node consistency compromised, aborting…
2020-08-03 13:50:37 4397 [Note] WSREP: gcomm: closing backend
-----------------------------------------------------
I know that enlarging the tmp directory and increasing the retry count can help, but in my opinion these are only workarounds.
My question regarding this problem: is it possible to separate the tmp directory used internally by XtraDB Cluster from the one used by user queries?

Thanks for your help! 🙂

You have two options:
1) Fix the query. Add indexes so that the full table join does not need so much temporary disk space.
2) Upgrade to PXC 8.0, which has a more efficient temporary table engine (TempTable) and supports separate temp directory locations.