New Node4 Joining Errors and 32 Broken pipes

First of all, let me thank you for the great product you’re creating each day for people like me and for businesses.

I’ve got a Percona XtraDB Cluster with the following scheme ( https://i.imgur.com/P1YQbDc.png ):

We lost PXC3 and are now trying to rebuild the cluster with a new node, PXC4, without success. PXC2 (aka PXC7) was also restored some time ago, but with no problems at all.

For now we have got some failures we don’t know how to fix:


[B]-wsrep-sst-joiner: ******************* FATAL ERROR **********************
-wsrep-sst-joiner: Error while getting data from donor node: exit codes: 143 143
-wsrep-sst-joiner: ******************************************************
-wsrep-sst-joiner: Cleanup after exit with status:32[/B]

and

[B]WSREP: 0.0 (PXC2): State transfer to 2.0 (PXC4) failed: -32 (Broken pipe)
WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.[/B]

and

[B]polkitd[656]: Unregistered Authentication Agent for unix-process:13754:5709648[/B]

and

[B]WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'px1.px1.px1.px1:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '3aaca81d-8d1e-11e5-bf0e-aaeab5da33b6:401105267': 32 (Broken pipe)
WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address 'px1.px1.px1.px1:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '3aaca81d-8d1e-11e5-bf0e-aaeab5da33b6:401105267'
WSREP: 0.0 (PXC2): State transfer to 2.0 (PXC4) failed: -32 (Broken pipe)[/B]

The logs for all active nodes in the cluster are linked below.

Please help us bring the PXC4 node up and resolve this problem.

The text limit (10k chars vs. the 40k chars we have) does not allow me to share the logs here.
https://justpaste.it/1d46l

From the text above, it sounds like the joiner exited before the donor could donate. To find the exact cause, we would need the complete log files from the donor and the joiner. You can add them as attachments.
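
For anyone decoding the numbers in the excerpts above: by shell convention, an exit status above 128 means the process was terminated by a signal (status minus 128), so 143 corresponds to signal 15 (SIGTERM), and 32 is the errno value for EPIPE ("Broken pipe"). This tells you what happened to the processes, not why. A minimal Python sketch of that decoding follows; the helper names are illustrative and not part of PXC.

```python
import errno
import os
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status.

    Values above 128 conventionally mean "killed by signal (status - 128)".
    """
    if status > 128:
        try:
            name = signal.Signals(status - 128).name
        except ValueError:
            return f"exit status {status} (no matching signal)"
        return f"terminated by {name}"
    return f"exited with status {status}"


def describe_errno(code: int) -> str:
    """Map an errno value (sign ignored, as in '-32') to its name and message."""
    code = abs(code)
    return f"{errno.errorcode.get(code, 'UNKNOWN')}: {os.strerror(code)}"


# The values reported in this thread
print(describe_exit_status(143))
print(describe_errno(-32))
```

Read this way, "exit codes: 143 143" suggests the joiner-side streaming processes were terminated rather than failing on their own, which is consistent with the joiner giving up on the transfer.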

Logs attachment

Pxc Nodes Logs.docx (11.2 KB)

Scheme attachment

The problem can be resolved by downgrading PXC4 (the 4th node) to the same XtraDB version as the donor.

Is there any other way to make different XtraDB versions work together, or are there further compatibility requirements?

A cluster can have PXC nodes of different versions (a mix of 5.6 and 5.7 nodes).

However, this is not recommended and not the best way to operate, given the following incompatibilities:

  • If a 5.6 node acts as donor to a 5.7 node, MySQL recommends running mysql_upgrade (the SST script doesn’t do it automatically).
  • If a 5.7 node acts as donor to a 5.6 node, the 5.7 data directory may not be compatible with the 5.6 binary.

Also, the slowest node defines the performance. 5.6 PXC nodes don’t have the performance optimizations, so overall cluster performance may look bad.

I would strongly recommend moving all your nodes to 5.7.19.
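
The donor/joiner notes above can be sketched as a small pre-SST check. This is only an illustration: the function names and messages are hypothetical, the 5.6/5.7 rules come from this reply, and generalizing them to "older series" and "newer series" is an assumption. The "same series, different release" case reflects what was reported in this thread, not a documented rule.

```python
from typing import Tuple


def series(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '5.7.19'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def sst_note(donor: str, joiner: str) -> str:
    """Summarize what to expect when `donor` sends an SST to `joiner`."""
    if donor == joiner:
        return "same version"
    d, j = series(donor), series(joiner)
    if d == j:
        # Assumption based on this thread: mismatched releases can still fail
        return "same series, different release: align versions if SST fails"
    if d < j:
        # e.g. 5.6 donor -> 5.7 joiner
        return "older donor: run mysql_upgrade on the joiner after SST"
    # e.g. 5.7 donor -> 5.6 joiner
    return "newer donor: data directory may not be compatible with the joiner binary"


print(sst_note("5.6.30", "5.7.19"))
print(sst_note("5.7.19", "5.6.30"))
```

The version strings would come from each node, for example via `SELECT VERSION();`.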

So frustrating. Found SST failing between 5.7.22 (donor) and 5.7.21 (joiner). My sysadmin and network guys were pulling their hair out, as we all thought it was blocked ports. Maybe this post will help someone. There was no clue in the log, just
Error while getting data from donor node: exit codes: 143 143 and even less on the donor. wsrep_debug gave no further info either.
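
To rule out blocked ports quickly before digging into versions, a rough TCP probe like the one below may help. It is a sketch under assumptions: the function name is made up, the port list uses the documented PXC defaults (4444 is the SST port that also appears in the donor command earlier in this thread), and your configuration may differ. Note that the SST listener on the joiner only exists while a transfer is in progress, so "refused" on 4444 outside an SST is expected and still shows the host is reachable.

```python
import socket

# Documented PXC default ports; adjust if your configuration differs
DEFAULT_PORTS = {
    3306: "MySQL client",
    4444: "SST",
    4567: "group communication",
    4568: "IST",
}


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening (or a firewall rejected it)
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a sign of packets being dropped
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example (hypothetical host name):
#   for port, role in DEFAULT_PORTS.items():
#       print(port, role, probe_tcp("pxc4.example.internal", port))
```

A "timeout" result is the one that points toward filtering; "open" or "refused" on every port suggests the network path is fine and the cause lies elsewhere, such as the version mismatch described above.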

Hi there, do you need ongoing help, or are you confirming that your issue is resolved? If there’s still an ongoing problem, I’ll need to ask you to open a new post, since these older ones don’t tend to be revisited. Let me know?