Percona XtraDB, Question to the experts!!! ;)

Hi everyone!!

I have a Percona XtraCluster with 2 nodes on Primary mode each one. (I know taht is preferable 3nodes, but the project needs only 2 nodes).
The rare situation that I have tested is this:

Step 1 - I start the node A with “/etc/init.d/mysql bootstrap-pxc”.
Step 2 - I start the node B with “/etc/init.d/mysql start” (Both nodes are synced).
Step 3 - I reboot the machine with node A.
Step 4 - I modify the DataBase Insert/updates… by node B (is the unique up).
Step 5 - I stopt the node B with “/etc/init.d/mysql stop”.
Step 6 - I start the node A with “/etc/init.d/mysql bootstrap-pxc”.
Step 7 - I start the node B with “/etc/init.d/mysql start”…and the log-error is:
MySQL (Percona XtraDB Cluster) is not running, but lock fil[FALLÓ]/lock/subsys/mysql) exists
Starting MySQL (Percona XtraDB Cluster)…[FALLÓ]…The server quit without updating PID file (/DATAMYSQL/data1/mysql.pid).
MySQL (Percona XtraDB Cluster) server startup failed! [FALLÓ]

I know…this situation is very very strange, but the ‘Percona Experts’ can say me if it possible conncet 2 nodes and would happend?? :stuck_out_tongue:

Best Regards.

This is inconclusive, you need to attach the error log from both servers to be able to analyze what is wrong.

Hi jrivera,

I attach you both server’s log.
LOG_SERVER_A
(Tnahks for all) :wink:

LOG server A

2015-08-10 13:57:36 5579 [Note] WSREP: STATE EXCHANGE: sent state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe
2015-08-10 13:57:36 5579 [Note] WSREP: STATE EXCHANGE: got state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe from 0 (myinfprese02)
2015-08-10 13:57:36 5579 [Note] WSREP: STATE EXCHANGE: got state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe from 1 (myinfprema02)
2015-08-10 13:57:36 5579 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 2731,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = d829d797-3b46-11e5-9872-8a946f5b1e06
2015-08-10 13:57:36 5579 [Note] WSREP: Flow-control interval: [23, 23]
2015-08-10 13:57:36 5579 [Note] WSREP: New cluster view: global state: d829d797-3b46-11e5-9872-8a946f5b1e06:2731, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2015-08-10 13:57:36 5579 [ERROR] WSREP: Process completed with error: /usr/local/mysql/bin/check_control.sh --status Synced --uuid d829d797-3b46-11e5-9872-8a946f5b1e06 --primary yes --index 1 --members fd6ee2ea-3f56-11e5-bcec-1fbbf46849cc/myinfprese02/10.252.141.83:3306,fd8eae56-3f55-11e5-bce2-8ea890a0b6d2/myinfprema02/10.252.141.82:3306: 2 (No such file or directory)
2015-08-10 13:57:36 5579 [ERROR] WSREP: Notification command failed: 2 (No such file or directory): “/usr/local/mysql/bin/check_control.sh --status Synced --uuid d829d797-3b46-11e5-9872-8a946f5b1e06 --primary yes --index 1 --members fd6ee2ea-3f56-11e5-bcec-1fbbf46849cc/myinfprese02/10.252.141.83:3306,fd8eae56-3f55-11e5-bce2-8ea890a0b6d2/myinfprema02/10.252.141.82:3306”
2015-08-10 13:57:36 5579 [Note] WSREP: REPL Protocols: 6 (3, 2)
2015-08-10 13:57:36 5579 [Note] WSREP: Service thread queue flushed.
2015-08-10 13:57:36 5579 [Note] WSREP: Assign initial position for certification: 2731, protocol version: 3
2015-08-10 13:57:36 5579 [Note] WSREP: Service thread queue flushed.
2015-08-10 13:57:37 5579 [Note] WSREP: forgetting fd6ee2ea (tcp://10.252.141.xx:4567)
2015-08-10 13:57:37 5579 [Note] WSREP: Node fd8eae56 state prim
2015-08-10 13:57:37 5579 [Note] WSREP: save pc into disk
2015-08-10 13:57:37 5579 [Note] WSREP: forgetting fd6ee2ea (tcp://10.252.141.xx:4567)
2015-08-10 13:57:37 5579 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2015-08-10 13:57:37 5579 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fea255e2-3f56-11e5-b5a3-cf948f807c20
2015-08-10 13:57:37 5579 [Note] WSREP: STATE EXCHANGE: sent state msg: fea255e2-3f56-11e5-b5a3-cf948f807c20
2015-08-10 13:57:37 5579 [Note] WSREP: STATE EXCHANGE: got state msg: fea255e2-3f56-11e5-b5a3-cf948f807c20 from 0 (myinfprema02)
2015-08-10 13:57:37 5579 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 2,
members = 1/1 (joined/total),
act_id = 2731,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = d829d797-3b46-11e5-9872-8a946f5b1e06
2015-08-10 13:57:37 5579 [Note] WSREP: Flow-control interval: [16, 16]
2015-08-10 13:57:37 5579 [Note] WSREP: New cluster view: global state: d829d797-3b46-11e5-9872-8a946f5b1e06:2731, view# 3: Primary, number of nodes: 1,my index: 0, protocol version 3
2015-08-10 13:57:37 5579 [ERROR] WSREP: Process completed with error: /usr/local/mysql/bin/check_control.sh --status Synced --uuid d829d797-3b46-11e5-9872-8a946f5b1e06 --primary yes --index 0 --members fd8eae56-3f55-11e5-bce2-8ea890a0b6d2/myinfprema02/10.252.141.82:3306: 1 (Operation not permitted)
2015-08-10 13:57:37 5579 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): “/usr/local/mysql/bin/check_control.sh --status Synced --uuid d829d797-3b46-11e5-9872-8a946f5b1e06 --primary yes --index 0 --members fd8eae56-3f55-11e5-bce2-8ea890a0b6d2/myinfprema02/10.252.141.82:3306”
2015-08-10 13:57:37 5579 [Note] WSREP: REPL Protocols: 6 (3, 2)
2015-08-10 13:57:37 5579 [Note] WSREP: Service thread queue flushed.
2015-08-10 13:57:37 5579 [Note] WSREP: Assign initial position for certification: 2731, protocol version: 3
2015-08-10 13:57:37 5579 [Note] WSREP: Service thread queue flushed.
2015-08-10 13:57:38 5579 [Note] WSREP: (fd8eae56, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
2015-08-10 13:57:42 5579 [Note] WSREP: cleaning up fd6ee2ea (tcp://10.252.141.xx:4567)

I Attach the LOG server B

150810 13:57:35 mysqld_safe Starting mysqld daemon with databases from /DATAMYSQL/data1
150810 13:57:35 mysqld_safe Skipping wsrep-recover for d829d797-3b46-11e5-9872-8a946f5b1e06:2732 pair
150810 13:57:35 mysqld_safe Assigning d829d797-3b46-11e5-9872-8a946f5b1e06:2732 to wsrep_start_position
2015-08-10 13:57:35 0 [Note] WSREP: wsrep_start_position var submitted: ‘d829d797-3b46-11e5-9872-8a946f5b1e06:2732’
2015-08-10 13:57:35 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see do cumentation for more details).
2015-08-10 13:57:35 30483 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-08-10 13:57:35 30483 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/galera3/libgalera_smm.so’
2015-08-10 13:57:35 30483 [Note] WSREP: wsrep_load(): Galera 3.6(r3a949e6) by Codership Oy <info@codership.com> loaded successfully.
2015-08-10 13:57:35 30483 [Note] WSREP: CRC-32C: using hardware acceleration.
2015-08-10 13:57:35 30483 [Note] WSREP: Found saved state: d829d797-3b46-11e5-9872-8a946f5b1e06:2732
2015-08-10 13:57:35 30483 [Note] WSREP: Passing config to GCS: base_host = 10.252.141.83; base_port = 4567; cert.log_conflicts = no; debug = no; evs.in active_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stat s_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /DATAMYSQL/data1/; gcache.k eep_pages_size = 0; gcache.mem_size = 0; gcache.name = /DATAMYSQL/data1//galera.cache; gcache.page_size = 1G; gcache.size = 4G; gcs.fc_debug = 0; gcs.f c_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 922337203685 4775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc .ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.w
2015-08-10 13:57:35 30483 [Note] WSREP: Service thread queue flushed.
2015-08-10 13:57:35 30483 [Note] WSREP: Assign initial position for certification: 2732, protocol version: -1
2015-08-10 13:57:35 30483 [Note] WSREP: wsrep_sst_grab()
2015-08-10 13:57:35 30483 [Note] WSREP: Start replication
2015-08-10 13:57:35 30483 [Note] WSREP: Setting initial position to d829d797-3b46-11e5-9872-8a946f5b1e06:2732
2015-08-10 13:57:35 30483 [Note] WSREP: protonet asio version 0
2015-08-10 13:57:35 30483 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
2015-08-10 13:57:35 30483 [Note] WSREP: backend: asio
2015-08-10 13:57:35 30483 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2015-08-10 13:57:35 30483 [Note] WSREP: restore pc from disk failed
2015-08-10 13:57:35 30483 [Note] WSREP: GMCast version 0
2015-08-10 13:57:35 30483 [Note] WSREP: (fd6ee2ea, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
2015-08-10 13:57:35 30483 [Note] WSREP: (fd6ee2ea, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
2015-08-10 13:57:35 30483 [Note] WSREP: EVS version 0
2015-08-10 13:57:35 30483 [Note] WSREP: PC version 0
2015-08-10 13:57:35 30483 [Note] WSREP: gcomm: connecting to group ‘mysqlclusterpre’, peer ‘10.252.141.82:,10.252.141.83:’
2015-08-10 13:57:35 30483 [Warning] WSREP: (fd6ee2ea, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.252.141.83:4567’ points to own listening address, blackli sting
2015-08-10 13:57:35 30483 [Note] WSREP: (fd6ee2ea, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.252.141.83:4567’ pointing to uuid fd6ee2ea is blacklisted, s kipping
2015-08-10 13:57:35 30483 [Note] WSREP: (fd6ee2ea, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers:
2015-08-10 13:57:36 30483 [Note] WSREP: declaring fd8eae56 at tcp://10.252.141.82:4567 stable
2015-08-10 13:57:36 30483 [Note] WSREP: Node fd8eae56 state prim
2015-08-10 13:57:36 30483 [Note] WSREP: save pc into disk
2015-08-10 13:57:36 30483 [Note] WSREP: gcomm: connected
2015-08-10 13:57:36 30483 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-08-10 13:57:36 30483 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)
2015-08-10 13:57:36 30483 [Note] WSREP: Opened channel ‘mysqlclusterpre’
2015-08-10 13:57:36 30483 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2015-08-10 13:57:36 30483 [Note] WSREP: Waiting for SST to complete.
2015-08-10 13:57:36 30483 [Note] WSREP: STATE_EXCHANGE: sent state UUID: fe083d53-3f56-11e5-ba02-0a7273d42abe
2015-08-10 13:57:36 30483 [Note] WSREP: STATE EXCHANGE: sent state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe
2015-08-10 13:57:36 30483 [Note] WSREP: STATE EXCHANGE: got state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe from 0 (myinfprese02)
2015-08-10 13:57:36 30483 [Note] WSREP: STATE EXCHANGE: got state msg: fe083d53-3f56-11e5-ba02-0a7273d42abe from 1 (myinfprema02)
2015-08-10 13:57:36 30483 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 2731,
last_appl. = -1,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = d829d797-3b46-11e5-9872-8a946f5b1e06
2015-08-10 13:57:36 30483 [Note] WSREP: Flow-control interval: [23, 23]
2015-08-10 13:57:36 30483 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 2731)
2015-08-10 13:57:36 30483 [Note] WSREP: Closing send monitor…
2015-08-10 13:57:36 30483 [Note] WSREP: Closed send monitor.
2015-08-10 13:57:36 30483 [Note] WSREP: gcomm: terminating thread
2015-08-10 13:57:36 30483 [Note] WSREP: gcomm: joining thread
2015-08-10 13:57:36 30483 [Note] WSREP: gcomm: closing backend
2015-08-10 13:57:37 30483 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-08-10 13:57:37 30483 [Note] WSREP: gcomm: closed
2015-08-10 13:57:37 30483 [Note] WSREP: Flow-control interval: [16, 16]
2015-08-10 13:57:37 30483 [Note] WSREP: Received NON-PRIMARY.
2015-08-10 13:57:37 30483 [Note] WSREP: Shifting PRIMARY → OPEN (TO: 2731)
2015-08-10 13:57:37 30483 [Note] WSREP: Received self-leave message.
2015-08-10 13:57:37 30483 [Note] WSREP: Flow-control interval: [0, 0]
2015-08-10 13:57:37 30483 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2015-08-10 13:57:37 30483 [Note] WSREP: Shifting OPEN → CLOSED (TO: 2731)
2015-08-10 13:57:37 30483 [Note] WSREP: RECV thread exiting 0: Success
2015-08-10 13:57:37 30483 [Note] WSREP: recv_thread() joined.
2015-08-10 13:57:37 30483 [Note] WSREP: Closing replication queue.
2015-08-10 13:57:37 30483 [Note] WSREP: Closing slave action queue.
2015-08-10 13:57:37 30483 [ERROR] WSREP: Local state seqno (2732) is greater than group seqno (2731): states diverged. Aborting to avoid potential data loss. Remove ‘/DATAMYSQL/data1//grastate.dat’ file and restart if you wish to continue. (FATAL)
at galera/src/replicator_str.cpp:state_transfer_required():33
2015-08-10 13:57:37 30483 [Note] WSREP: applier thread exiting (code:8)
2015-08-10 13:57:37 30483 [ERROR] Aborting

2015-08-10 13:57:39 30483 [Note] WSREP: Service disconnected.
2015-08-10 13:57:39 30483 [Note] WSREP: rollbacker thread exiting
2015-08-10 13:57:40 30483 [Note] WSREP: Some threads may fail to exit.
2015-08-10 13:57:40 30483 [Note] Binlog end
2015-08-10 13:57:40 30483 [Note] /usr/local/mysql/bin/mysqld: Shutdown complete

Error in my_thread_global_end(): 1 threads didn’t exit
150810 13:57:45 mysqld_safe mysqld from pid file /DATAMYSQL/data1/mysql.pid ended