handshake with remote endpoint failed, but endpoint is not part of config (any more)

I set up a cluster of 3 database nodes using Percona XtraDB Cluster 5.6. Everything works and all nodes replicate. I configured SSL for state transfer. The cluster recently changed its topology because we had to migrate 2 nodes to new machines. It was then completely restarted with configuration files that contain only the 3 current cluster nodes: node3, node8 and nodeA (never mind the naming). Earlier we also had node1, but that machine left the cluster and no longer runs mysql.

Now the problem: in the log files of node3 and node8, an error like this appears every couple of seconds:


2015-04-20 11:06:24 22546 [ERROR] WSREP: handshake with remote endpoint ssl://<PUBLIC.node1.IP>:57953 failed: 1: 'End of file.' ( )

where <PUBLIC.node1.IP> is the address of the old node1, which is no longer part of the cluster in any way. I looked through the my.cnf config files of all cluster nodes, and all traces of node1 are gone from them. How come node3 and node8 try to contact the outdated node? Note that this is after a complete bootstrap of the cluster starting from nodeA, which does not show the strange error messages in its log.
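To see which remote hosts are behind these messages, and how often each one shows up, the log lines can be tallied. This is only a minimal sketch that assumes the log format shown above and IPv4 addresses; the function name `count_failed_handshakes` and the sample address 192.0.2.10 (a documentation address standing in for <PUBLIC.node1.IP>) are made up for illustration.

```python
import re
from collections import Counter

# Matches the WSREP handshake error shown above and captures host and port.
# Assumes IPv4 addresses or hostnames (no colons in the host part).
HANDSHAKE_RE = re.compile(
    r"\[ERROR\] WSREP: handshake with remote endpoint "
    r"ssl://(?P<host>[^:\s]+):(?P<port>\d+) failed"
)


def count_failed_handshakes(lines):
    """Return a Counter mapping remote host -> number of failed handshakes."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in the format of the error above.
    sample = [
        "2015-04-20 11:06:24 22546 [ERROR] WSREP: handshake with remote "
        "endpoint ssl://192.0.2.10:57953 failed: 1: 'End of file.' ( )",
        "2015-04-20 11:06:27 22546 [ERROR] WSREP: handshake with remote "
        "endpoint ssl://192.0.2.10:57960 failed: 1: 'End of file.' ( )",
    ]
    for host, failures in count_failed_handshakes(sample).most_common():
        print(host, failures)
```

In practice the lines would come from the mysql error log of node3 or node8 instead of the hard-coded sample.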

I found an error description of a very similar error here: https://mariadb.com/kb/en/mariadb/st…-in-error-log/

There it was suggested that this may be due to a broken network configuration where nodes can ping in one direction but not in the reverse direction. But that reason/solution does not fit my case, since all nodes can ping each other fine. It also does not explain why my cluster nodes try to contact a node that is no longer part of the cluster.

Here is the relevant part of the configuration file (from node8):

### PERCONA CLUSTER STUFF
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://<PUBLIC.nodeA.IP>,<PUBLIC.node3.IP>,<PUBLIC.node8.IP>
#wsrep_cluster_address=gcomm://
wsrep_node_address=<PUBLIC.node8.IP>
wsrep_slave_threads=8
wsrep_sst_method=xtrabackup-v2
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_cluster_name=betdata_cluster
wsrep_provider_options=gcache.size=3G;socket.ssl_cert=/etc/mysql/cert.pem;socket.ssl_key=/etc/mysql/key.pem
wsrep_sst_donor=node3,nodeA
wsrep_node_name=node8
wsrep_sst_auth=sstuser:blahblubb
wsrep_sst_receive_address=<PUBLIC.node8.IP>:14444
query_cache_size=0
query_cache_type=0

Any ideas?

Note: this is a cross-post from Database Administrators Stack Exchange ("replication - percona cluster 5.6 - handshake with remote endpoint failed, but endpoint is not part of config"): http://dba.stackexchange.com/questio...ndpoint-is-not

I feel a bit stupid, but the answer to that strange behavior was simple:

The old node1 was trying to connect to the cluster again after a reboot. I had forgotten to uninstall mysql on that node and had only stopped the mysql server. That was fine as long as the system did not restart mysql.

To fix the situation I removed the old config files on node1 and also uninstalled the percona cluster packages.

Everything works now as expected.
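In hindsight, the high port number in the error message (57953) looks like an ephemeral source port, which suggests the remote side opened the connection rather than node3 or node8 dialing out. A quick way to confirm that such a peer is a stray is to compare its address against the configured `wsrep_cluster_address`. The following is just a sketch under a few assumptions: IPv4 addresses or hostnames only, and the helper names `cluster_members` and `is_stray_peer` as well as the 192.0.2.x addresses are invented for the example.

```python
def cluster_members(cluster_address):
    """Split a gcomm:// address into a set of member hosts (ports dropped).

    Assumes IPv4 addresses or hostnames; IPv6 literals are not handled.
    """
    prefix = "gcomm://"
    if not cluster_address.startswith(prefix):
        raise ValueError("expected a gcomm:// address")
    # Drop any provider options that follow a '?'.
    host_list = cluster_address[len(prefix):].split("?", 1)[0]
    hosts = set()
    for entry in host_list.split(","):
        entry = entry.strip()
        if entry:
            # Strip an optional ":port" suffix.
            hosts.add(entry.rsplit(":", 1)[0] if ":" in entry else entry)
    return hosts


def is_stray_peer(host, cluster_address):
    """Return True if host is not listed in the configured cluster address."""
    return host not in cluster_members(cluster_address)


if __name__ == "__main__":
    # Hypothetical addresses standing in for nodeA, node3, node8 and node1.
    address = "gcomm://192.0.2.1,192.0.2.3,192.0.2.8"
    print(is_stray_peer("192.0.2.10", address))  # old node1: not a member
    print(is_stray_peer("192.0.2.3", address))   # node3: a member
```

If a peer turns out to be a stray, the place to look is that machine itself, as it was here with the mysql installation left behind on node1.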