Not the answer you need?
Register and ask your own question!

Joining new node: State transfer failed: -32 (Broken pipe)

sourjyasourjya EntrantCurrent User Role Beginner
I have a cluster with 2 nodes operating out of one site. I'm attempting to add a third node in a different site. All nodes are reachable from each other and manual entries with node IPs have been added to each node's /etc/hosts file for good measure.

When I attempt start xtradb on the new node, I get the following error message: State transfer to 2.0 (site-b-mysql-01) failed: -32 (Broken pipe) - after which the mysql launch aborts and node fails to join cluster. Partial log below (due to post character limit). Full log is attached.
2017-08-08T03:57:35.246411Z 0 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(00000000-0000-0000-0000-000000000000:-1)
2017-08-08T03:57:35.247116Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-08-08T03:57:35.247193Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-08-08T03:57:35.247208Z 0 [Note] WSREP: Starting replication
2017-08-08T03:57:35.247259Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-08-08T03:57:35.247703Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-08-08T03:57:35.247931Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2017-08-08T03:57:35.248181Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-08-08T03:57:35.248209Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-08-08T03:57:35.249548Z 0 [Note] WSREP: GMCast version 0
2017-08-08T03:57:35.250285Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-08-08T03:57:35.250320Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-08-08T03:57:35.251218Z 0 [Note] WSREP: EVS version 0
2017-08-08T03:57:35.251519Z 0 [Note] WSREP: gcomm: connecting to group 'stiu_mysql_cluster_a', peer '10.1.2.48:,10.1.2.49:,10.10.201.3:'
2017-08-08T03:57:35.254610Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') connection established to b63503ea tcp://10.10.201.3:4567
2017-08-08T03:57:35.254662Z 0 [Warning] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') address 'tcp://10.10.201.3:4567' points to own listening address, blacklisting
2017-08-08T03:57:35.291763Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') connection established to 5dbd24b5 tcp://10.1.2.48:4567
2017-08-08T03:57:35.292311Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-08T03:57:35.293081Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') connection established to 70236211 tcp://10.1.2.49:4567
2017-08-08T03:57:36.780672Z 0 [Note] WSREP: declaring 5dbd24b5 at tcp://10.1.2.48:4567 stable
2017-08-08T03:57:36.780768Z 0 [Note] WSREP: declaring 70236211 at tcp://10.1.2.49:4567 stable
2017-08-08T03:57:36.790072Z 0 [Note] WSREP: Node 5dbd24b5 state primary
2017-08-08T03:57:36.800489Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,5dbd24b5,5)
memb {
        5dbd24b5,0
        70236211,0
        b63503ea,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2017-08-08T03:57:36.800545Z 0 [Note] WSREP: Save the discovered primary-component to disk
2017-08-08T03:57:37.253915Z 0 [Note] WSREP: gcomm: connected
2017-08-08T03:57:37.254197Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-08-08T03:57:37.254450Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2017-08-08T03:57:37.254568Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-08-08T03:57:37.254707Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2017-08-08T03:57:37.254863Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: ba338369-7bed-11e7-b3a4-c3ff8c880ac1
2017-08-08T03:57:37.254913Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: ba338369-7bed-11e7-b3a4-c3ff8c880ac1 from 0 (site-a-mysql-01)
2017-08-08T03:57:37.254939Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: ba338369-7bed-11e7-b3a4-c3ff8c880ac1 from 1 (site-a-mysql-02)
2017-08-08T03:57:37.264381Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: ba338369-7bed-11e7-b3a4-c3ff8c880ac1 from 2 (site-b-mysql-01)
2017-08-08T03:57:37.264430Z 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 2/3 (primary/total),
        act_id     = 374,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 3bcfe97a-7071-11e7-8acc-4f4c64a0dc77
2017-08-08T03:57:37.264456Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2017-08-08T03:57:37.264481Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 374)
2017-08-08T03:57:37.264646Z 2 [Note] WSREP: State transfer required: 
        Group state: 3bcfe97a-7071-11e7-8acc-4f4c64a0dc77:374
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-08-08T03:57:37.264718Z 2 [Note] WSREP: New cluster view: global state: 3bcfe97a-7071-11e7-8acc-4f4c64a0dc77:374, view# 5: Primary, number of nodes: 3, my index: 2, protocol version 3
2017-08-08T03:57:37.264758Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-08-08T03:57:37.264934Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.201.3' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '4078'  '' )
2017-08-08T03:57:38.006127Z 2 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|10.10.201.3:4444/xtrabackup_sst//1
2017-08-08T03:57:38.006176Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-08T03:57:38.006224Z 2 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-08T03:57:38.006248Z 2 [Note] WSREP: Assign initial position for certification: 374, protocol version: 3
2017-08-08T03:57:38.006422Z 0 [Note] WSREP: Service thread queue flushed.
2017-08-08T03:57:38.006531Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2017-08-08T03:57:38.006578Z 2 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 3bcfe97a-7071-11e7-8acc-4f4c64a0dc77
2017-08-08T03:57:38.006716Z 2 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2017-08-08T03:57:38.016285Z 0 [Note] WSREP: Member 2.0 (site-b-mysql-01) requested state transfer from '*any*'. Selected 0.0 (site-a-mysql-01)(SYNCED) as donor.
2017-08-08T03:57:38.016314Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 374)
2017-08-08T03:57:38.016406Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2017-08-08T03:57:38.016461Z 2 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(3bcfe97a-7071-11e7-8acc-4f4c64a0dc77:374)
2017-08-08T03:57:38.547212Z 0 [Warning] WSREP: 0.0 (site-a-mysql-01): State transfer to 2.0 (site-b-mysql-01) failed: -32 (Broken pipe)
2017-08-08T03:57:38.547255Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.
2017-08-08T03:57:38.547295Z 0 [Note] WSREP: gcomm: terminating thread
2017-08-08T03:57:38.547363Z 0 [Note] WSREP: gcomm: joining thread
2017-08-08T03:57:38.547568Z 0 [Note] WSREP: gcomm: closing backend
2017-08-08T03:57:38.567738Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,5dbd24b5,5)
memb {
        b63503ea,0
        }
joined {
        }
left {
        }
partitioned {
        5dbd24b5,0
        70236211,0
        }
)
2017-08-08T03:57:38.567819Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') connection to peer b63503ea with addr tcp://10.10.201.3:4567 timed out, no messages seen in PT3S
2017-08-08T03:57:38.567966Z 0 [Note] WSREP: (b63503ea, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-08T03:57:38.568022Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2017-08-08T03:57:38.568266Z 0 [Note] WSREP: gcomm: closed
2017-08-08T03:57:38.568297Z 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Terminated
        2017-08-08T03:57:48.538521Z WSREP_SST: [ERROR] Removing /tmp/pxc_sst_vnXlFmmy/joiner_yaHbN7W4/xtrabackup_galera_info file due to signal
        2017-08-08T03:57:48.542962Z WSREP_SST: [ERROR] Removing  file due to signal
        2017-08-08T03:57:48.547503Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
        2017-08-08T03:57:48.549724Z WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 143 143
        2017-08-08T03:57:48.551860Z WSREP_SST: [ERROR] ****************************************************** 
        2017-08-08T03:57:48.554339Z WSREP_SST: [ERROR] Cleanup after exit with status:32

I'm beat as to what is causing the "broken pipe". Any help will be greatly appreciated.

Thank you,
Sourjya

Comments

  • utdreamutdream Entrant Inactive User Role Beginner
    I encountered this issue as well, and eventually figured out (by looking at the mysql logs of the primary) that I had forgotten to open the necessary ports on the joiner. The primary needs those ports open to it so that it can send over the DB backup - namely to port 4444.

    Here are the commands for a centos 7 machine:
    firewall-cmd --permanent --zone=public --add-port=3306/tcp
    firewall-cmd --permanent --zone=public --add-port=4444/tcp 
    firewall-cmd --permanent --zone=public --add-port=4567/tcp 
    firewall-cmd --permanent --zone=public --add-port=4568/tcp 
    firewall-cmd --reload
    

    obviously, those won't work if you're using customized firewall zones (like for an internal network) so extrapolate however you need.
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.