Error joining a node to an XtraDB cluster

Hi, I'm testing an XtraDB Cluster 5.5 on 64-bit Debian, installed from the deb packages.

The master comes up fine and the first node joins fine, but the last node aborts because it can't complete the state transfer:

Logs below:

120531 19:08:15 [Note] WSREP: Read nil XID from storage engines, skipping position init
120531 19:08:15 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/'
120531 19:08:15 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by Codership Oy <> loaded succesfully.
120531 19:08:15 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
120531 19:08:15 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
120531 19:08:15 [Note] WSREP: Passing config to GCS: base_host =; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
120531 19:08:15 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
120531 19:08:15 [Note] WSREP: wsrep_sst_grab()
120531 19:08:15 [Note] WSREP: Start replication
120531 19:08:15 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
120531 19:08:15 [Note] WSREP: protonet asio version 0
120531 19:08:15 [Note] WSREP: backend: asio
120531 19:08:15 [Note] WSREP: GMCast version 0
120531 19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') listening at tcp://
120531 19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') multicast: , ttl: 1
120531 19:08:15 [Note] WSREP: EVS version 0
120531 19:08:15 [Note] WSREP: PC version 0
120531 19:08:15 [Note] WSREP: gcomm: connecting to group 'perconatest', peer ''
120531 19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') turning message relay requesting on, nonlive peers: tcp://
120531 19:08:16 [Note] WSREP: declaring 89f786ac-ab4d-11e1-0800-83b84a64d80b stable
120531 19:08:16 [Note] WSREP: declaring 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2 stable
120531 19:08:16 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') turning message relay requesting off
120531 19:08:16 [Note] WSREP: view(view_id(PRIM,36412058-ab43-11e1-0800-85a41e0edd94,13) memb { 36412058-ab43-11e1-0800-85a41e0edd94, 89f786ac-ab4d-11e1-0800-83b84a64d80b, 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2,} joined {} left {} partitioned {})
120531 19:08:16 [Note] WSREP: gcomm: connected
120531 19:08:16 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
120531 19:08:16 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
120531 19:08:16 [Note] WSREP: Opened channel 'perconatest'
120531 19:08:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
120531 19:08:16 [Note] WSREP: Waiting for SST to complete.
120531 19:08:16 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 368de0a6-ab43-11e1-0800-d3c3b66b74d2
120531 19:08:16 [Note] WSREP: STATE EXCHANGE: sent state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2
120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 0 (percona3)
120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 1 (percona2)
120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 2 (percona1)
120531 19:08:16 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 12, members = 2/3 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/4/1 (gcs/repl/appl), group UUID = 4c3288d4-aa70-11e1-0800-de2b28cfee00
120531 19:08:16 [Note] WSREP: Flow-control interval: [14, 28]
120531 19:08:16 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
120531 19:08:16 [Note] WSREP: State transfer required: Group state: 4c3288d4-aa70-11e1-0800-de2b28cfee00:0 Local state: 00000000-0000-0000-0000-000000000000:-1
120531 19:08:16 [Note] WSREP: New cluster view: global state: 4c3288d4-aa70-11e1-0800-de2b28cfee00:0, view# 13: Primary, number of nodes: 3, my index: 0, protocol version 1
120531 19:08:16 [Warning] WSREP: Gap in state sequence. Need state transfer.
120531 19:08:18 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '' '' '/var/lib/mysql/' '/etc/mysql/my.cnf' '9005' 2>sst.err'
120531 19:08:18 [Note] WSREP: Prepared SST request: xtrabackup|
120531 19:08:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120531 19:08:18 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2
120531 19:08:18 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (4c3288d4-aa70-11e1-0800-de2b28cfee00): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
120531 19:08:18 [Note] WSREP: Node 0 (percona3) requested state transfer from 'any'. Selected 1 (percona2)(SYNCED) as donor.
120531 19:08:18 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
120531 19:08:18 [Note] WSREP: Requesting state transfer: success, donor: 1
120531 19:08:23 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup 'joiner' '' '' '/var/lib/mysql/' '/etc/mysql/my.cnf' '9005' 2>sst.err: 32 (Broken pipe)
120531 19:08:23 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
120531 19:08:23 [ERROR] WSREP: SST failed: 32 (Broken pipe)
120531 19:08:23 [ERROR] Aborting
120531 19:08:23 [Warning] WSREP: 1 (percona2): State transfer to 0 (percona3) failed: -1 (Operation not permitted)
120531 19:08:23 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
120531 19:08:23 [Note] WSREP: gcomm: terminating thread
120531 19:08:23 [Note] WSREP: gcomm: joining thread
120531 19:08:23 [Note] WSREP: gcomm: closing backend
120531 19:08:23 [Note] WSREP: view(view_id(NON_PRIM,36412058-ab43-11e1-0800-85a41e0edd94,13) memb { 36412058-ab43-11e1-0800-85a41e0edd94,} joined {} left {} partitioned { 89f786ac-ab4d-11e1-0800-83b84a64d80b, 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2,})
120531 19:08:23 [Note] WSREP: view((empty))
120531 19:08:23 [Note] WSREP: gcomm: closed
120531 19:08:23 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted

There's also an sst.err file, which contains:

tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
Error while getting st data from donor node
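For context, that tar error comes out of the SST stream itself: wsrep_sst_xtrabackup pipes an xtrabackup tarball from the donor to the joiner, and if the joiner's tar receives garbage (for example, an error message instead of the backup stream), GNU tar fails with exactly this message. Here's a minimal local stand-in for that donor-to-joiner pipe (temp dirs only, no real SST or networking involved):

```shell
#!/bin/sh
# Local sketch of the tar streaming that the SST script relies on:
# pack a file on the "donor" side, pipe it through, unpack on the "joiner" side.
set -e
src=$(mktemp -d)   # stands in for the donor's datadir
dst=$(mktemp -d)   # stands in for the joiner's datadir
echo "hello" > "$src/probe.txt"

# donor side writes tar to stdout | joiner side reads tar from stdin
( cd "$src" && tar cf - probe.txt ) | ( cd "$dst" && tar xf - )

cat "$dst/probe.txt"
```

If the writing side dies mid-stream (as the donor did here), the reader gets a truncated or empty stream and reports the "does not look like a tar archive" error, while the writer sees the matching "Broken pipe" from the joiner's log.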

Any clues? Seems like a permission or file path configuration problem to me.

Please post the corresponding error-log interval and the sst.err from the donor node (percona2).

Hmm, the logs were lost in the void of stderr, but you pointed me in the right direction :slight_smile:

On the donor, sst.err said it couldn't find a process to kill. On another join attempt I was able to see:

120601 11:43:24 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'donor' '******:4444/xtrabackup_sst' '(null)' '/var/lib/mysql/' '/etc/mysql/my.cnf' '4c3288d4-aa70-11e1-0800-de2b28cfee00' '0' '0''
120601 11:43:24 [Note] WSREP: sst_donor_thread signaled with 0
innobackupex finished with error: 9. Check /var/lib/mysql//innobackup.backup.log

And that file showed the actual problem:

innobackupex: Error: mysql child process has died: ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO)

Creating this user with all privileges solved the issue.

But why? I couldn't find any doc explaining why this user is necessary, and the other node in the cluster started and replicated without it.
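For what it's worth, my understanding of the "why" (hedged, pieced together from the script's behavior rather than from any official doc): wsrep_sst_xtrabackup runs innobackupex on the donor, and innobackupex has to log in to the donor's local mysqld to take the backup. The credentials are supposed to come from the wsrep_sst_auth variable in my.cnf; with it unset, the login is presumably attempted as the OS user running the daemon (mysql) with no password, which matches the ERROR 1045 above. A sketch of the usual setup (the user name and password here are made up):

```
# /etc/mysql/my.cnf on every node (any node may be selected as donor)
[mysqld]
wsrep_sst_method = xtrabackup
wsrep_sst_auth   = sstuser:s3cret    # hypothetical credentials

# and on each node, create the matching account, e.g.:
#   CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cret';
#   GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
```

RELOAD, LOCK TABLES and REPLICATION CLIENT are the privileges xtrabackup documents as sufficient, so a dedicated SST account is tighter than granting the mysql user all privileges. As for the other node: it may simply never have acted as an xtrabackup donor before, but that part is a guess.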