Percona XtraDB Cluster node crash - “Transport endpoint is not connected” and Galera handshake failures

Description:
One of our Percona XtraDB Cluster (PXC) nodes is repeatedly logging Galera-related warnings followed by a crash. The issue starts with multiple “Failed to accept: remote_endpoint: Transport endpoint is not connected” and “Handshake failed” messages (unsupported protocol, version too low, invalid encoding, internal error), and then mysqld terminates with signal 6.

Error snippet:

[Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
[Galera] Handshake failed: unsupported protocol
terminate called after throwing an instance of 'boost::exception_detail::clone_impl...'
what(): remote_endpoint: Transport endpoint is not connected
mysqld got signal 6;

Environment:

  • Percona XtraDB Cluster 8.0.28-19.1

  • WSREP version: 26.4.3

  • OS: (insert your OS here, e.g., RHEL 8.8)

  • Crash occurs during Galera handshake / SST phase.

  • Recovery attempt fails with:

    WSREP: Failed to recover position
    
    

What I’ve tried so far:

  • Verified network connectivity between nodes

  • Checked Galera ports and SST donor configuration

  • No clear hardware issues observed

Request:
Looking for guidance on possible causes (e.g., protocol mismatch, Galera version incompatibility, or network/socket issues) and recommended diagnostic or recovery steps. Is there a patch that can be applied to overcome this issue?

Can you share the my.cnf from the nodes that have the issue you reported?

Template my.cnf for PXC

Edit to your requirements.

[client]
socket=/u02/cdp_data/mysql.sock
#ssl-ca=/u01/certs/ca.pem
#ssl-cert=/u01/certs/client-cert.pem
#ssl-key=/u01/certs/client-key.pem

[mysqld]
server-id=3
port=3406
#mysqlx_port=34060
mysqlx=0
#datadir=/var/lib/mysql
datadir=/u02/cdp_data
socket=/u02/cdp_data/mysql.sock
log-error=/u01/logs/mysqld.log
##pid-file=/var/run/mysqld/mysqld.pid
pid-file=/u02/mysqld/mysqld.pid

log_timestamps=SYSTEM

default_storage_engine=InnoDB

# Binary log expiration period is 604800 seconds, which equals 7 days

binlog_expire_logs_seconds=604800
innodb_buffer_pool_size=4294967296
table_open_cache=10000

key_buffer_size=67108864
max_connect_errors=100000
######encryption######
early-plugin-load=keyring_file.so
keyring_file_data=/u01/certs/keyring

####api GW upgrade##########
default-authentication-plugin=mysql_native_password
log-bin-trust-function-creators=1
innodb_log_buffer_size=32M
innodb_log_file_size=80M
group_concat_max_len=512000

######async replication#####
log-bin=mysql-bin
gtid-mode=ON
enforce-gtid-consistency
log-slave-updates

##When Slave on-entry##
relay-log=relay-log-server
read-only=ON
slave_parallel_workers=0

############################security####################
log-raw=OFF
skip-grant-tables = FALSE
password_history=3
validate_password.length=14
#password_reuse_interval=120

################################################################

##########################TLS and certificates#########################
tls_version=TLSv1.2

require_secure_transport=ON

ssl-ca=/u01/certs/ca.pem
ssl-cert=/u01/certs/server-cert.pem
ssl-key=/u01/certs/server-key.pem

######## wsrep ###############

# Path to Galera library

wsrep_provider=/usr/lib64/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes

#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://x.x.x..83,x.x.x..84,x.x.x..85

#wsrep_provider_options="gmcast.listen_addr=tcp://0.0.0.0:4585;socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"
wsrep_provider_options="gmcast.listen_addr=tcp://0.0.0.0:4585;base_port=4585;socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem;socket.ssl=ON"
#wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"

# In order for Galera to work correctly binlog format should be ROW

binlog_format=ROW

# Slave thread to use

wsrep_slave_threads=8

wsrep_log_conflicts

max_connections=10000

#pxc-encrypt-cluster-traffic=OFF

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera

innodb_autoinc_lock_mode=2

# Node IP address

wsrep_node_address=x.x.x..85

# Cluster name

wsrep_cluster_name=bbyn-pxc-cluster

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=-vp03.corp.boubyan.com

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING

# SST method

wsrep_sst_method=xtrabackup-v2

[sst]
encrypt=4
ssl-key=/u01/certs/server-key.pem
ssl-ca=/u01/certs/ca.pem
ssl-cert=/u01/certs/server-cert.pem

What is in the MySQL error logs? Can you post them?

2025-10-29T07:32:17.434529+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:32:17.481454+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2025-10-29T07:32:17.528788+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unexpected message
2025-10-29T07:32:17.559890+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:32:17.622843+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:32:17.686404+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:33:05.506397+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:34:09.543408+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: wrong version number
2025-10-29T07:36:14.642314+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:36:26.963099+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:36:32.103324+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: http request
2025-10-29T07:36:32.121144+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:36:51.247769+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:36:51.251772+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:37:55.568321+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: wrong version number
2025-10-29T07:38:10.729443+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:38:15.825461+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:39:37.585629+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:42:16.105458+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:42:16.132568+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unknown protocol
2025-10-29T07:42:16.148251+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unknown protocol
2025-10-29T07:42:16.154977+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2025-10-29T07:42:16.163751+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2025-10-29T07:42:16.179430+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:42:16.187997+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:42:16.194966+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:42:16.198705+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:42:17.258759+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:42:17.279426+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: bad extension
2025-10-29T07:42:17.286058+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: bad extension
2025-10-29T07:42:29.396093+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:42:34.517786+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:42:35.571738+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: peer did not return a certificate
2025-10-29T07:42:35.588439+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:42:35.597127+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:42:35.613518+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher
2025-10-29T07:47:55.642634+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643563+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643607+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643642+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643676+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643709+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643744+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643774+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643676+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643709+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643744+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643774+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643796+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643816+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.643836+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T07:47:55.663756+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T07:47:55.665344+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: version too low
2025-10-29T07:48:35.964532+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T08:06:39.705415+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: invalid encoding
2025-10-29T08:09:14.443065+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: unsupported protocol
2025-10-29T08:09:14.471302+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: internal error
2025-10-29T08:09:14.487326+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: internal error
2025-10-29T08:10:47.390420+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T08:10:47.390489+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
2025-10-29T08:10:47.390525+03:00 0 [Warning] [MY-000000] [Galera] Failed to accept: remote_endpoint: Transport endpoint is not connected
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
what(): remote_endpoint: Transport endpoint is not connected
2025-10-29T08:10:47.390620+03:00 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
05:10:47 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 5aaeb8aff2f9757ae471361dbf4fa4ba945f6104
Server Version: 8.0.28-19.1 Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x217bee1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x323) [0x11a0993]
/usr/sbin/mysqld(handle_fatal_signal+0xc8) [0x11a0a68]
/lib64/libpthread.so.0(+0x12990) [0x7f60cca6a990]
/lib64/libc.so.6(gsignal+0x10f) [0x7f60cae0552f]
/lib64/libc.so.6(abort+0x127) [0x7f60cadd8e65]
/lib64/libstdc++.so.6(+0x9009b) [0x7f60cb7b809b]
/lib64/libstdc++.so.6(+0x9654c) [0x7f60cb7be54c]
/lib64/libstdc++.so.6(+0x965a7) [0x7f60cb7be5a7]
/lib64/libstdc++.so.6(+0x96808) [0x7f60cb7be808]
/usr/lib64/galera4/libgalera_smm.so(+0x906c9) [0x7f60bed686c9]
/usr/lib64/galera4/libgalera_smm.so(+0x90e1c) [0x7f60bed68e1c]
/usr/lib64/galera4/libgalera_smm.so(+0xabbd9) [0x7f60bed83bd9]
/usr/lib64/galera4/libgalera_smm.so(+0xa5362) [0x7f60bed7d362]
/usr/lib64/galera4/libgalera_smm.so(+0xacdbf) [0x7f60bed84dbf]
/usr/lib64/galera4/libgalera_smm.so(+0x8916d) [0x7f60bed6116d]
/usr/lib64/galera4/libgalera_smm.so(+0x1b251e) [0x7f60bee8a51e]
/usr/lib64/galera4/libgalera_smm.so(+0x1b2646) [0x7f60bee8a646]
/lib64/libpthread.so.0(+0x81ca) [0x7f60cca601ca]
/lib64/libc.so.6(clone+0x43) [0x7f60cadf08d3]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/u02/cdp_data/wsrep_recovery_verbose.QakIlg' --pid-file='/u02/cdp_data/x.x.x.corp.boubyan.com-recover.pid'
ERROR! WSREP: Failed to recover position:

“2025-10-29T07:42:35.613518+03:00 0 [Warning] [MY-000000] [Galera] Handshake failed: no shared cipher”

This happens when:

  1. Different OpenSSL or Galera versions exist between nodes,
    e.g. one node uses OpenSSL 3.x and another still uses OpenSSL 1.1.x, whose default protocol and cipher settings can differ,
  2. Custom wsrep_provider_options explicitly restrict allowed ciphers, or
  3. OS-level crypto policies differ.

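When that happens, Galera logs “no shared cipher” and the TLS handshake fails. In the log above, mysqld then aborts with signal 6 right after the uncaught “remote_endpoint: Transport endpoint is not connected” exception.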

Start by checking that the OpenSSL version is the same on all your nodes.

openssl version
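To compare the results across nodes without eyeballing them, something like the sketch below could help. It only compares the strings you feed it. The function name and the host names node1..node3 are placeholders, and SSH access to each node is assumed. The `openssl` command-line tool is not necessarily the library Galera is linked against, so the `ldd` line in the comments may be worth running as well.

```shell
# Sketch: report whether every node prints the same `openssl version` line.
# Input on stdin: one line per node, "<node> <version string>".
# Prints MATCH, MISMATCH, or NO INPUT.
same_openssl_version() {
  awk '{ $1 = ""; seen[$0] = 1 }   # drop the node name, remember each distinct version
       END { n = 0; for (v in seen) n++
             if (n == 0) print "NO INPUT"
             else if (n == 1) print "MATCH"
             else print "MISMATCH" }'
}

# Hypothetical usage (placeholder host names, SSH access assumed):
# for h in node1 node2 node3; do
#   printf '%s %s\n' "$h" "$(ssh "$h" openssl version)"
# done | same_openssl_version
#
# Which SSL library the Galera provider itself loads (path taken from the my.cnf above):
# ldd /usr/lib64/galera4/libgalera_smm.so | grep -E 'libssl|libcrypto'
```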

One thing you can try is to disable SSL for Galera temporarily as a quick test. If your nodes start, you know there is an issue with the SSL setup on your nodes.

If your cluster doesn’t need encrypted replication inside a trusted network, you can disable Galera SSL to confirm that’s the root cause.
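Before and after such a test, it may be worth confirming what each node actually has configured, since as far as I know the encryption setting has to be the same on every node in the cluster. Here is a minimal sketch that prints only the active (uncommented) lines controlling Galera traffic encryption. The function name is made up, and the config path in the usage line is an assumption; use whatever file your nodes really read.

```shell
# Sketch: show active my.cnf lines that affect Galera traffic encryption.
# Matches pxc-encrypt-cluster-traffic (dash or underscore spelling) and any
# wsrep_provider_options line that mentions socket.ssl; commented lines are skipped.
galera_ssl_settings() {
  grep -nE '^[[:space:]]*(pxc[-_]encrypt[-_]cluster[-_]traffic|wsrep_provider_options)' "$1" \
    | grep -E 'pxc[-_]encrypt|socket\.ssl'
}

# Hypothetical usage (path is an assumption):
# galera_ssl_settings /etc/my.cnf
```

With the my.cnf posted above, this should list only the one uncommented wsrep_provider_options line, because pxc-encrypt-cluster-traffic is commented out there.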

The SSL versions are identical on all nodes. This error appeared while the node was going through security scanning.
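Given that, a quick tally of the handshake failure reasons can help show whether the warnings line up with the scan window. A mix such as “http request”, “wrong version number” and “unsupported protocol” would be consistent with a scanner probing the Galera port rather than with cluster peers, though that is an interpretation and not something the log proves. The function name below is made up; the log path in the usage line is the log-error value from the my.cnf above.

```shell
# Sketch: count "[Galera] Handshake failed: <reason>" lines per reason,
# most frequent first.
handshake_failure_summary() {
  awk -F'Handshake failed: ' '
    /\[Galera\] Handshake failed: / { count[$2]++ }
    END { for (r in count) printf "%d %s\n", count[r], r }' "$1" \
    | sort -k1,1nr -k2
}

# Usage:
# handshake_failure_summary /u01/logs/mysqld.log
```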