Hello,
I’m testing PXC for MySQL 8. I have been able to configure and initialize a cluster with 5 nodes, but during testing I’m running into some issues.
Testing procedure:
- Init the cluster with 5 nodes -> OK
- Restart the bootstrapped node (1st) in normal mode -> OK
- In the log I can see that the node is still in bootstrap mode, even though I started it normally (systemctl start mysql)
- Stop the first node and start it again -> FAIL (see the log below)
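For clarity, the restart sequence on the first node was roughly the following (a sketch; it assumes the stock Percona systemd units, where mysql@bootstrap.service is only used for the initial bootstrap):

# Leave bootstrap mode and start node 1 as a normal cluster member
systemctl stop mysql@bootstrap.service
systemctl start mysql

# Later: the plain stop/start that fails with the SST error below
systemctl stop mysql
systemctl start mysql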
2020-10-23T09:14:30.882507Z 0 [Warning] [MY-000000] [WSREP] Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
2020-10-24T17:05:15.524264Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2020-10-24T17:05:15.524288Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-10-24T17:05:15.524373Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.5.70' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '4556' --mysqld-version '8.0.20-11.3' '' )
2020-10-24T17:05:16.109763Z 2 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|192.168.5.70:4444/xtrabackup_sst//1
2020-10-24T17:05:16.109810Z 2 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2020-10-24T17:05:16.109898Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-10-24T17:05:16.109961Z 2 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
2020-10-24T17:05:16.109994Z 2 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2020-10-24T17:05:16.110012Z 2 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 450e8f4b-1510-11eb-91bd-a6ab83d20e9a
2020-10-24T17:05:16.110030Z 2 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 3627, STRv: 3
2020-10-24T17:05:16.110168Z 2 [Note] [MY-000000] [Galera] IST receiver addr using tcp://192.168.5.70:4568
2020-10-24T17:05:16.110266Z 2 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-3627, listening at: tcp://192.168.5.70:4568
2020-10-24T17:05:18.019992Z 0 [Note] [MY-000000] [Galera] (158e52b0, 'tcp://0.0.0.0:4567') turning message relay requesting off
2020-10-24T17:06:55.887925Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2020-10-24T17:06:55.888135Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage
2020-10-24T17:06:55.888237Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1108
2020-10-24T17:06:55.888320Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2020-10-24T17:06:55.888387Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2020-10-24T17:06:55.905996Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.5.70' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '4556' --mysqld-version '8.0.20-11.3' '' : 32 (Broken pipe)
2020-10-24T17:06:55.906143Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2020-10-24T17:06:55.90629 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2020-10-24T17:06:55.906571Z 3 [Note] [MY-000000] [Galera] Processing SST received
2020-10-24T17:06:55.906676Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2020-10-24T17:06:55.906820Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2020-10-24T17:06:55.906920Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2020-10-24T17:06:55.907042Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2020-10-24T17:06:55.907256Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2020-10-24T17:06:55.907353Z 2 [ERROR] [MY-000000] [Galera] Requesting state transfer failed: -77(File descriptor in bad state)
2020-10-24T17:06:55.907541Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2020-10-24T17:06:55.907648Z 2 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2020-10-24T17:06:55.907881Z 2 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2020-10-24T17:06:55.907978Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2020-10-24T17:06:55.908120Z 3 [Note] [MY-000000] [Galera] gcomm: closing backend
2020-10-24T17:06:55.908197Z 2 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
2020-10-24T17:06:55.908327Z 2 [Note] [MY-000000] [WSREP] Initiating SST cancellation
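One thing I have not ruled out yet is a firewall between the nodes: from what I can tell, one common cause of "Possible timeout in receiving first data from donor" is that the donor cannot reach the joiner on the SST port. A minimal reachability check would look something like this (a sketch, assuming netcat is installed; 4444 is the SST port and 4568 the IST port shown in the log):

# On the joiner (192.168.5.70), with mysqld stopped, open a temporary
# listener on the SST port (flag syntax differs between netcat variants):
nc -l -p 4444

# From one of the other (donor candidate) nodes, check that the port is
# actually reachable through any firewall in between; repeat for 4568:
nc -z -v -w 5 192.168.5.70 4444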
Please see the configuration file of the first node:
[mysqld]
server-id = 1
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
# LOGGING #
# skip-log-bin
binlog_expire_logs_seconds=604800
log_error = /var/log/mysql/error.log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
log_slow_rate_type = query
long_query_time = 3
log_slow_rate_limit = 250
log_slow_verbosity = full
slow_query_log_use_global_control = all
log-queries-not-using-indexes = 1
join-buffer-size = 1M
innodb_doublewrite=0
# Metrics
performance_schema=ON
# INNODB SETTINGS
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 8G
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 4G # max 80% of available memory
innodb-buffer-pool-instances = 4 # Should be 1 per 1G of memory
innodb-flush-log-at-trx-commit = 2
innodb-thread-concurrency = 64
# CACHES AND LIMITS
tmp-table-size = 32M
max-heap-table-size = 32M
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 2048
# TUNE UPS
# Skip reverse DNS lookup of clients
skip-name-resolve
userstat = 1
thread_pool_size = 16
thread_handling = 2 # Number of CPU cores
# SECURITY
# connection-control=FORCE_PLUS_PERMANENT
# connection-control-failed-login-attempts=FORCE_PLUS_PERMANENT
# connection_control_failed_connections_threshold=5 # Number of failed login attempts
# connection_control_min_connection_delay=5000 # Added delay after failed login attempts
# CLUSTER SETTINGS
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.x.xx,192.168.x.63,192.168.x.xx,192.168.x.xx,192.168.x.xx
binlog_format=ROW
wsrep_slave_threads=16
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_cluster_name=motv_test
wsrep_node_name=sql1
wsrep_node_address=192.168.x.xx
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
pxc-encrypt-cluster-traffic=OFF
wsrep_provider_options="gcache.size=3G;gcache.page_size=1G;gcache.recover=yes"
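For reference, cluster health on the remaining four nodes can be confirmed with the standard Galera status variables, e.g. (sketch):

mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment');"

With 4 of 5 nodes still up there is quorum, so the remaining nodes should keep reporting wsrep_cluster_status = Primary while node 1 is down.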
What I have tried so far:
- Changed the timeouts in the systemd service (a sketch of the change is below); it did not help.
- Edited the service_startup_timeout=900 variable in /usr/bin/mysql-systemd and changed it to 9000; it did not help either.
Both of these suggestions came from Google searches, so I have no real idea how to fix this issue.
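The systemd change was basically just a longer start timeout for the mysql unit, roughly like this (a sketch; the drop-in override mechanism is my assumption, exact unit/file names may differ):

# Create a drop-in override for the unit (opens an editor):
systemctl edit mysql
# and raise the start timeout in it:
#   [Service]
#   TimeoutStartSec=900
systemctl daemon-reload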
Has anyone run into the same or at least a similar issue, or is this a bug?
Thanks a lot for any answers and tips.
Martin