After systemctl restart mysql, the node does not connect to the cluster

We have a three-node cluster. Everything worked until I restarted one node via systemctl restart mysql. Now this node [server1] cannot rejoin the cluster.

 INFO: Skipping wsrep-recover for 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805 pair
 INFO: Assigning 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805 to wsrep_start_position
2023-09-26T11:22:52.139059Z 0 [Warning] [MY-011068] [Server] The syntax 'wsrep_slave_threads' is deprecated and will be removed in a future release. Please use wsrep_applier_threads instead.
2023-09-26T11:22:52.139583Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.
2023-09-26T11:22:52.139794Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.33-25.1) starting as process 5400
2023-09-26T11:22:52.142821Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2023-09-26T11:22:52.142832Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2023-09-26T11:22:52.142836Z 0 [Note] [MY-000000] [WSREP] New joining cluster node configured to use specified SSL artifacts
2023-09-26T11:22:52.142848Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib/galera4/libgalera_smm.so initial position: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805
2023-09-26T11:22:52.142854Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib/galera4/libgalera_smm.so'
2023-09-26T11:22:52.143134Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.15(6983fb2) by Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>) loaded successfully.
2023-09-26T11:22:52.143145Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.
2023-09-26T11:22:52.143236Z 0 [ERROR] [MY-000000] [Galera] Unrecognized parameter '”socket.ssl_key'
2023-09-26T11:22:52.143251Z 0 [Warning] [MY-000000] [Galera] SSL compression is not effective. The option socket.ssl_compression is deprecated and will be removed in future releases.
2023-09-26T11:22:52.143254Z 0 [Warning] [MY-000000] [Galera] Parameter 'socket.ssl_compression' is deprecated and will be removed in future versions
2023-09-26T11:22:52.144297Z 0 [Note] [MY-000000] [Galera] Found saved state: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805, safe_to_bootstrap: 0
2023-09-26T11:22:52.144331Z 0 [Note] [MY-000000] [Galera] GCache DEBUG: opened preamble:
Version: 2
UUID: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18
Seqno: 80542 - 80805
Offset: 123892656
Synced: 1
EncVersion: 1
Encrypted: 0
MasterKeyConst UUID: 087dcd27-56ee-11ee-9ea1-76ffae3e39c7
MasterKey UUID: 00000000-0000-0000-0000-000000000000
MasterKey ID: 0
2023-09-26T11:22:52.144335Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: version: 2, UUID: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18, offset: 123892656
2023-09-26T11:22:52.144370Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2023-09-26T11:22:52.144697Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2023-09-26T11:22:52.144701Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found gapless sequence 80542-80805
2023-09-26T11:22:52.144712Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan...  0.0% (        0/133884640 bytes) complete.
2023-09-26T11:22:52.144721Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan...100.0% (133884640/133884640 bytes) complete.
2023-09-26T11:22:52.144723Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found 0/264 locked buffers
2023-09-26T11:22:52.144726Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: free space: 333088/134217728
2023-09-26T11:22:52.149668Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: allocator.disk_pages_encryption = no; allocator.encryption_cache_page_size = 32K; allocator.encryption_cache_size = 16777216; base_dir = /var/lib/mysql/; base_host = server1; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.encryption = no; gcache.encryption_cache_page_size = 32K; gcache.encryption_cache_size = 16777216; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_auto_evict_threshold = 0.75; gcs.fc_auto_evict_window = 0; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.wait_restored_prim_timeout = PT0S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; socket.ssl = YES; socket.ssl_ca = ca.pem; socket.ssl_cert = 
server-cert.pem; socket.ssl_cipher = ; socket.ssl_compression = YES; socket.ssl_key = server-key.pem; socket.ssl_reload = 1;
2023-09-26T11:22:52.154425Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-09-26T11:22:52.154452Z 0 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805, protocol version: -1
2023-09-26T11:22:52.154477Z 0 [Note] [MY-000000] [WSREP] Starting replication
2023-09-26T11:22:52.154481Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
2023-09-26T11:22:52.154485Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:80805
2023-09-26T11:22:52.154499Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
2023-09-26T11:22:52.155525Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
2023-09-26T11:22:52.155533Z 0 [Note] [MY-000000] [Galera] backend: asio
2023-09-26T11:22:52.155560Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0
2023-09-26T11:22:52.155593Z 0 [Note] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2023-09-26T11:22:52.155596Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2023-09-26T11:22:52.155645Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2023-09-26T11:22:52.155697Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2023-09-26T11:22:52.155699Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2023-09-26T11:22:52.155794Z 0 [Note] [MY-000000] [Galera] EVS version 1
2023-09-26T11:22:52.155818Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'pxc-cluster', peer 'server1:,server2:,server3:'
2023-09-26T11:22:52.158273Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://server1:4567
2023-09-26T11:22:55.156099Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 107 rttvar: 41 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 7921334597403 last_delivered_since: 7921334597403 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:22:55.156158Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server3:4567 timed out, no messages seen in PT3S, socket stats: rtt: 213 rttvar: 82 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 7921334666462 last_delivered_since: 7921334666462 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:22:55.156183Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)
2023-09-26T11:22:55.156213Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2023-09-26T11:22:55.156220Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2023-09-26T11:22:55.156233Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible
2023-09-26T11:22:55.156241Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,08098c34-b34e,1)
memb {
        08098c34-b34e,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2023-09-26T11:22:55.656281Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50049S), skipping check
2023-09-26T11:22:59.656239Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 150 rttvar: 70 rto: 204000 lost: 0 last_data_recv: 3000 cwnd: 10 last_queued_since: 7925834740375 last_delivered_since: 7925834740375 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:03.156339Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server3:4567 timed out, no messages seen in PT3S, socket stats: rtt: 238 rttvar: 91 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7929334840200 last_delivered_since: 7929334840200 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:06.656453Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 103 rttvar: 40 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7932834954296 last_delivered_since: 7932834954296 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:10.156560Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server3:4567 timed out, no messages seen in PT3S, socket stats: rtt: 219 rttvar: 82 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7936335061350 last_delivered_since: 7936335061350 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:13.656669Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 100 rttvar: 39 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7939835170468 last_delivered_since: 7939835170468 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:17.156777Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server3:4567 timed out, no messages seen in PT3S, socket stats: rtt: 230 rttvar: 86 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7943335278156 last_delivered_since: 7943335278156 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:20.656891Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 98 rttvar: 39 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7946835392072 last_delivered_since: 7946835392072 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:24.156995Z 0 [Note] [MY-000000] [Galera] (08098c34-b34e, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server3:4567 timed out, no messages seen in PT3S, socket stats: rtt: 231 rttvar: 87 rto: 204000 lost: 0 last_data_recv: 3500 cwnd: 10 last_queued_since: 7950335496593 last_delivered_since: 7950335496593 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-09-26T11:23:25.158754Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2023-09-26T11:23:25.158771Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2023-09-26T11:23:25.158829Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
         at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcomm/src/pc.cpp:connect():176
2023-09-26T11:23:25.158836Z 0 [ERROR] [MY-000000] [Galera] /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2023-09-26T11:23:26.158923Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-09-26T11:23:26.158940Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-09-26T11:23:26.159015Z 0 [ERROR] [MY-000000] [Galera] /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcs/src/gcs.cpp:gcs_open():1876: Failed to open channel 'pxc-cluster' at 'gcomm://server1,server2,server3': -110 (Connection timed out)
2023-09-26T11:23:26.159021Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out
2023-09-26T11:23:26.159027Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://server1,server2,server3) failed to establish connection with cluster (reason: 7)
2023-09-26T11:23:26.159033Z 0 [ERROR] [MY-010119] [Server] Aborting
2023-09-26T11:23:26.159181Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.33-25.1)  Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3.
2023-09-26T11:23:26.159567Z 0 [ERROR] [MY-010065] [Server] Failed to shutdown components infrastructure.
2023-09-26T11:23:26.159670Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED
2023-09-26T11:23:26.159679Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2023-09-26T11:23:26.160576Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-09-26T11:23:26.161446Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-09-26T11:23:26.162413Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-09-26T11:23:26.162420Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0
2023-09-26T11:23:26.162424Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0
2023-09-26T11:23:26.162428Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0
2023-09-26T11:23:26.162433Z 0 [Note] [MY-000000] [Galera] avg deps dist 0
2023-09-26T11:23:26.162437Z 0 [Note] [MY-000000] [Galera] avg cert interval 0
2023-09-26T11:23:26.162441Z 0 [Note] [MY-000000] [Galera] cert index size 0
2023-09-26T11:23:26.162471Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-09-26T11:23:26.162495Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0
2023-09-26T11:23:26.162500Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2023-09-26T11:23:26.162560Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> DESTROYED (TO: 0)
2023-09-26T11:23:26.168617Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk...

Config file:

# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid

max_allowed_packet = 128M

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://server1,server2,server3
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=server1
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name=pxc-cluster-node-1

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING

# SST method
wsrep_sst_method=xtrabackup-v2


[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"

[sst]
encrypt=4
ssl-key=server-key.pem
ssl-ca=ca.pem
ssl-cert=server-cert.pem

Also, each server has the same certificates and permissions.

Telnet confirms that the nodes are reachable on the ports shown in the logs.

Please help me solve this.

Addition:
The node stopped connecting after I added the following to the configuration:

slow_query_log=ON
log_output=FILE
long_query_time=0
log_slow_admin_statements=ON
log_slow_slave_statements=ON
log_slow_rate_limit=100
log_slow_rate_type='query'
slow_query_log_always_write_time=1
log_slow_verbosity='full'
slow_query_log_use_global_control='all'

Reverting these changes didn't help.

The error logs above show network connection issues. On the failed node, can you ping server2, and can you also run telnet server2 4567? Do both work? If not, you have network/DNS issues.
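
The reachability check can also be scripted so it is easy to repeat for every peer. This is a minimal sketch, assuming bash and the coreutils timeout command are available on the node; check_port and check_peers are hypothetical helper names, and the peer names and port 4567 are the ones that appear in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they are not part of PXC or Galera.

# Return 0 if a TCP connection to host ($1) on port ($2) opens within 3 seconds.
check_port() {
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout bounds the wait when packets are silently dropped.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print the reachability of one port ($1) on each of the remaining arguments.
check_peers() {
    peer_port="$1"
    shift
    for peer in "$@"; do
        if check_port "$peer" "$peer_port"; then
            echo "$peer:$peer_port reachable"
        else
            echo "$peer:$peer_port NOT reachable"
        fi
    done
}
```

For example, check_peers 4567 server2 server3 run on the failed node. A successful TCP connect only shows that the port is open; it does not exercise the SSL handshake that the ssl:// endpoints in the log perform.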

I had the same issue with systemctl restart, so I changed the systemd unit for MySQL, and that fixed it.
My changes look like this:

mkdir -p /etc/systemd/system/mysql.service.d && \
cat <<EOF > /etc/systemd/system/mysql.service.d/overrides.conf
[Service]
Restart=on-failure
RestartSec=30s
EOF
systemctl daemon-reload
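
To confirm that systemd actually picked up the drop-in, the effective restart policy can be queried afterwards. This is a minimal sketch, assuming a reasonably recent systemd that supports systemctl show --value; restart_policy and restarts_on_failure are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a unit's restart policy.

# Print the effective Restart= setting of the unit named in $1.
restart_policy() {
    systemctl show "$1" --property=Restart --value
}

# Succeed only if the unit is configured to restart on failure.
restarts_on_failure() {
    [ "$(restart_policy "$1")" = "on-failure" ]
}
```

For example, restarts_on_failure mysql && echo "override active". Running systemctl cat mysql additionally prints the unit file together with any drop-in files that apply to it.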

Hello. Thanks for your reply. Everything was fine with the ports. A hard shutdown and reboot helped, and now everything is working fine, but we still don't understand what the cause was.

I still have a question: when the server is rebooted, mysql does not start on its own. I have to wait a while and then start the service manually. Is it supposed to work like that? What causes it?

You need to enable the unit file with systemctl enable mysql; then mysql will start on reboot.
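
A short sketch of checking the boot-time state before changing it, assuming the unit is named mysql as in the commands in this thread; ensure_enabled is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable a unit at boot only if it is not enabled yet.
ensure_enabled() {
    unit="$1"
    # is-enabled exits 0 when systemd reports the unit as enabled.
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit is already enabled"
    else
        systemctl enable "$unit"
    fi
}
```

For example, ensure_enabled mysql run as root. Enabling only affects whether the unit is started at boot; it does not start the service immediately.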