Third node does not start MySQL

Hello,

Three-node cluster ( Percona XtraDB Cluster (GPL), Release rel37.0, Revision 853, WSREP version 25.11, wsrep_25.11.r4027), running HPELinux (Debian), two nodes runs OK, but the third is unable to start database.

Happened after a file system full, apparently withiut any modification on the previous running configuration (previous system managerresigned and I’m trying to understand the issues).

Here below is the error log.

Any hint about restart succesfully the third node?

Many thanks in advance for any help!

Max

State from failing machine:

root@helion-cp1-c1-m1-mgmt:/etc/mysql# cat /var/lib/mysql/grastate.dat

GALERA saved state

version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:

  • Logfile:

170201 08:09:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170201 08:23:36 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170201 08:23:36 mysqld_safe WSREP: Running position recovery with --log_error=‘/var/lib/mysql/wsrep_recovery.Qbt51O’ --pid-file=‘/var/lib/mysql/helion-cp1-c1-m1-mgmt-recover.pid’
170201 8:23:36 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
170201 8:23:36 [Warning] The syntax ‘–log-slow-queries’ is deprecated and will be removed in a future release. Please use ‘–slow-query-log’/‘–slow-query-log-file’ instead.
170201 08:23:40 mysqld_safe WSREP: Recovered position 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:62955021
170201 8:23:40 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
170201 8:23:40 [Warning] The syntax ‘–log-slow-queries’ is deprecated and will be removed in a future release. Please use ‘–slow-query-log’/‘–slow-query-log-file’ instead.
170201 8:23:40 [Note] WSREP: wsrep_start_position var submitted: ‘9c3eced4-70eb-11e6-9bc5-6fd14ff56422:62955021’
170201 8:23:40 [Note] WSREP: Read nil XID from storage engines, skipping position init
170201 8:23:40 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib/libgalera_smm.so’
170201 8:23:40 [Note] WSREP: wsrep_load(): Galera 2.12(r318911d) by Codership Oy <info@codership.com> loaded successfully.
170201 8:23:40 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
170201 8:23:40 [Note] WSREP: Passing config to GCS: base_host = 10.3.1.174; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.3.1.174:4567; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S;
170201 8:23:40 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
170201 8:23:40 [Note] WSREP: wsrep_sst_grab()
170201 8:23:40 [Note] WSREP: Start replication
170201 8:23:40 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: protonet asio version 0
170201 8:23:40 [Note] WSREP: backend: asio
170201 8:23:40 [Note] WSREP: GMCast version 0
170201 8:23:40 [Note] WSREP: (bc5caf9a, ‘tcp://10.3.1.174:4567’) listening at tcp://10.3.1.174:4567
170201 8:23:40 [Note] WSREP: (bc5caf9a, ‘tcp://10.3.1.174:4567’) multicast: , ttl: 1
170201 8:23:40 [Note] WSREP: EVS version 0
170201 8:23:40 [Note] WSREP: PC version 0
170201 8:23:40 [Note] WSREP: gcomm: connecting to group ‘Vs01zdJ71’, peer ‘helion-cp1-c1-m1-mgmt:,helion-cp1-c1-m2-mgmt:,helion-cp1-c1-m3-mgmt:’
170201 8:23:40 [Note] WSREP: (bc5caf9a, ‘tcp://10.3.1.174:4567’) turning message relay requesting on, nonlive peers:
170201 8:23:40 [Note] WSREP: declaring ad7803c2 at tcp://10.3.1.178:4567 stable
170201 8:23:40 [Note] WSREP: declaring e2ca1084 at tcp://10.3.1.177:4567 stable
170201 8:23:40 [Note] WSREP: Node ad7803c2 state prim
170201 8:23:40 [Note] WSREP: gcomm: connected
170201 8:23:40 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170201 8:23:40 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)
170201 8:23:40 [Note] WSREP: Opened channel ‘Vs01zdJ71’
170201 8:23:40 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
170201 8:23:40 [Note] WSREP: Waiting for SST to complete.
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: sent state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 0 (helion-cp1-c1-m3-mgmt)
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 2 (helion-cp1-c1-m2-mgmt)
170201 8:23:40 [Note] WSREP: STATE EXCHANGE: got state msg: 34aed4c9-e868-11e6-8acf-1eb66790547e from 1 (helion-cp1-c1-m1-mgmt)
170201 8:23:40 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 16,
members = 2/3 (joined/total),
act_id = 66631754,
last_appl. = -1,
protocols = 0/4/3 (gcs/repl/appl),
group UUID = 9c3eced4-70eb-11e6-9bc5-6fd14ff56422
170201 8:23:40 [Note] WSREP: Flow-control interval: [28, 28]
170201 8:23:40 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 66631754)
170201 8:23:40 [Note] WSREP: State transfer required:
Group state: 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:66631754
Local state: 00000000-0000-0000-0000-000000000000:-1
170201 8:23:40 [Note] WSREP: New cluster view: global state: 9c3eced4-70eb-11e6-9bc5-6fd14ff56422:66631754, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3
170201 8:23:40 [Warning] WSREP: Gap in state sequence. Need state transfer.
170201 8:23:40 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.3.1.174’ --auth ‘sst_user:JgEfDXKG1F’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/my.cnf’ --parent ‘22943’’
WSREP_SST: [INFO] Streaming with tar (20170201 08:23:40.717)
WSREP_SST: [INFO] Using socat as streamer (20170201 08:23:40.719)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | tar xfi - --recursive-unlink -h; RC=( ${PIPESTATUS[@]} ) (20170201 08:23:40.730)
2017/02/01 08:23:40 socat[23111] E bind(12, {AF=2 0.0.0.0:4444}, 16): Address already in use
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
170201 8:23:40 [Note] WSREP: Prepared SST request: xtrabackup|10.3.1.174:4444/xtrabackup_sst
170201 8:23:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170201 8:23:40 [Note] WSREP: Assign initial position for certification: 66631754, protocol version: 2
170201 8:23:40 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9c3eced4-70eb-11e6-9bc5-6fd14ff56422): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
170201 8:23:40 [Note] WSREP: Node 1 (helion-cp1-c1-m1-mgmt) requested state transfer from ‘any’. Selected 0 (helion-cp1-c1-m3-mgmt)(SYNCED) as donor.
170201 8:23:40 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 66631759)
170201 8:23:40 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 1 2 (20170201 08:23:40.757)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170201 08:23:40.759)
WSREP_SST: [INFO] Removing the sst_in_progress file (20170201 08:23:40.761)
170201 8:23:40 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.3.1.174’ --auth ‘sst_user:JgEfDXKG1F’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/my.cnf’ --parent ‘22943’: 32 (Broken pipe)
170201 8:23:40 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
170201 8:23:40 [ERROR] WSREP: SST failed: 32 (Broken pipe)
170201 8:23:40 [ERROR] Aborting

170201 8:23:41 [Warning] WSREP: 0 (helion-cp1-c1-m3-mgmt): State transfer to 1 (helion-cp1-c1-m1-mgmt) failed: -1 (Operation not permitted)
170201 8:23:41 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
170201 8:23:41 [Note] WSREP: gcomm: terminating thread
170201 8:23:41 [Note] WSREP: gcomm: joining thread
170201 8:23:41 [Note] WSREP: gcomm: closing backend
170201 8:23:41 [Note] WSREP: gcomm: closed
170201 8:23:41 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
170201 08:23:41 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
root@helion-cp1-c1-m1-mgmt:/etc/mysql#

This is what I can see from the log (socat is reporting that the address used to receive the SST is in use):

WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | tar xfi - --recursive-unlink -h; RC=( ${PIPESTATUS[@]} ) (20170201 08:23:40.730)
2017/02/01 08:23:40 socat[23111] E bind(12, {AF=2 0.0.0.0:4444}, 16): Address already in use
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors