I have a cluster of 3 nodes, I started receiving error messages WSREP has not yet prepared node for application use
.
After analyzing the problem, I noticed that yesterday server3 disconnected and works separately, I restarted mysql on server3, but server3 does not connect, instead I get an error
Job for mysql.service failed because the control process exited with error code.
See "systemctl status mysql.service" and "journalctl -xeu mysql.service" for details.
Logs:
2023-10-27T09:11:50.698873Z 0 [Warning] [MY-011068] [Server] The syntax 'log_slow_slave_statements' is deprecated and will be removed in a future release. Please use log_slow_replica_statements instead.2023-10-27T09:11:50.698890Z 0 [Warning] [MY-011068] [Server] The syntax 'wsrep_slave_threads' is deprecated and will be removed in a future release. Please use wsrep_applier_threads instead.
2023-10-27T09:11:50.699557Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.
2023-10-27T09:11:50.699771Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.33-25.1) starting as process 2039913
2023-10-27T09:11:50.706421Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2023-10-27T09:11:50.706434Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2023-10-27T09:11:50.706439Z 0 [Note] [MY-000000] [WSREP] New joining cluster node configured to use specified SSL artifacts
2023-10-27T09:11:50.706455Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib/galera4/libgalera_smm.so initial position: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:16050461
2023-10-27T09:11:50.706468Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib/galera4/libgalera_smm.so'
2023-10-27T09:11:50.706763Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.15(6983fb2) by Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>) loaded successfully.
2023-10-27T09:11:50.706774Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.
2023-10-27T09:11:50.706876Z 0 [Warning] [MY-000000] [Galera] SSL compression is not effective. The option socket.ssl_compression is deprecated and will be removed in future releases.
2023-10-27T09:11:50.706880Z 0 [Warning] [MY-000000] [Galera] Parameter 'socket.ssl_compression' is deprecated and will be removed in future versions
2023-10-27T09:11:50.707926Z 0 [Note] [MY-000000] [Galera] Found saved state: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:16050461, safe_to_bootstrap: 0
2023-10-27T09:11:50.707961Z 0 [Note] [MY-000000] [Galera] GCache DEBUG: opened preamble:
Version: 2
UUID: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18
Seqno: 15857919 - 16050461
Offset: 127383736
Synced: 1
EncVersion: 1
Encrypted: 0
MasterKeyConst UUID: 4e02820b-56ee-11ee-b93e-e627301e8e8e
MasterKey UUID: 00000000-0000-0000-0000-000000000000
MasterKey ID: 0
2023-10-27T09:11:50.707965Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: version: 2, UUID: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18, offset: 127383736
2023-10-27T09:11:50.708005Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2023-10-27T09:11:50.735591Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2023-10-27T09:11:50.736317Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found gapless sequence 15857919-16050461
2023-10-27T09:11:50.736331Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan... 0.0% ( 0/134215904 bytes) complete.
2023-10-27T09:11:50.739229Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan...100.0% (134215904/134215904 bytes) complete.
2023-10-27T09:11:50.739233Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found 0/192543 locked buffers
2023-10-27T09:11:50.739236Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: free space: 1824/134217728
2023-10-27T09:11:50.745133Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: allocator.disk_pages_encryption = no; allocator.encryption_cache_page_size = 32K; allocator.encryption_cache_size = 16777216; base_dir = /var/lib/mysql/; base_host = server3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.encryption = no; gcache.encryption_cache_page_size = 32K; gcache.encryption_cache_size = 16777216; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_auto_evict_threshold = 0.75; gcs.fc_auto_evict_window = 0; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.wait_restored_prim_timeout = PT0S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; socket.ssl = YES; socket.ssl_ca = ca.pem; socket.ssl_cert = server-cert.pem; socket.ssl_cipher = ; socket.ssl_compression = YES; socket.ssl_key = server-key.pem; socket.ssl_reload = 1;
2023-10-27T09:11:50.751150Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-10-27T09:11:50.751200Z 0 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:16050461, protocol version: -1
2023-10-27T09:11:50.751238Z 0 [Note] [MY-000000] [WSREP] Starting replication
2023-10-27T09:11:50.751243Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
2023-10-27T09:11:50.751249Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:16050461
2023-10-27T09:11:50.751278Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
2023-10-27T09:11:50.752428Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
2023-10-27T09:11:50.752437Z 0 [Note] [MY-000000] [Galera] backend: asio
2023-10-27T09:11:50.752470Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0
2023-10-27T09:11:50.752508Z 0 [Note] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2023-10-27T09:11:50.752511Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2023-10-27T09:11:50.752563Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2023-10-27T09:11:50.752617Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2023-10-27T09:11:50.752620Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2023-10-27T09:11:50.752720Z 0 [Note] [MY-000000] [Galera] EVS version 1
2023-10-27T09:11:50.752742Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'pxc-cluster', peer 'server1:,server2:,server3:'
2023-10-27T09:11:50.755112Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://server3:4567
2023-10-27T09:11:53.753024Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2488949800 cwnd: 1 last_queued_since: 2489249931426227 last_delivered_since: 2489249931426227 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:11:53.753072Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2488949800 cwnd: 1 last_queued_since: 2489249931486279 last_delivered_since: 2489249931486279 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:11:53.753091Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)
2023-10-27T09:11:53.753119Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2023-10-27T09:11:53.753126Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2023-10-27T09:11:53.753139Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible
2023-10-27T09:11:53.753150Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,dd14e23e-9dee,1)
memb {
dd14e23e-9dee,0
}
joined {
}
left {
}
partitioned {
}
)
2023-10-27T09:11:54.253193Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50047S), skipping check
2023-10-27T09:11:58.253216Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488954300 cwnd: 1 last_queued_since: 2489254431608847 last_delivered_since: 2489254431608847 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:01.753319Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488957800 cwnd: 1 last_queued_since: 2489257931716138 last_delivered_since: 2489257931716138 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:05.253430Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488961300 cwnd: 1 last_queued_since: 2489261431836525 last_delivered_since: 2489261431836525 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:08.253556Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2488964300 cwnd: 1 last_queued_since: 2489264431950578 last_delivered_since: 2489264431950578 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:11.753677Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488967804 cwnd: 1 last_queued_since: 2489267932071021 last_delivered_since: 2489267932071021 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:15.253771Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488971304 cwnd: 1 last_queued_since: 2489271432178402 last_delivered_since: 2489271432178402 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:18.253891Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server2:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488974304 cwnd: 1 last_queued_since: 2489274432295063 last_delivered_since: 2489274432295063 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:21.754030Z 0 [Note] [MY-000000] [Galera] (dd14e23e-9dee, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://server1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2488977804 cwnd: 1 last_queued_since: 2489277932421631 last_delivered_since: 2489277932421631 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)
2023-10-27T09:12:23.756130Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2023-10-27T09:12:23.756148Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2023-10-27T09:12:23.756199Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcomm/src/pc.cpp:connect():176
2023-10-27T09:12:23.756208Z 0 [ERROR] [MY-000000] [Galera] /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2023-10-27T09:12:24.756298Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-10-27T09:12:24.756315Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-10-27T09:12:24.756412Z 0 [ERROR] [MY-000000] [Galera] /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.33-25/percona-xtradb-cluster-galera/gcs/src/gcs.cpp:gcs_open():1876: Failed to open channel 'pxc-cluster' at 'gcomm://server1,server2,server3': -110 (Connection timed out)
2023-10-27T09:12:24.756420Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out
2023-10-27T09:12:24.756426Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://server1,server2,server3) failed to establish connection with cluster (reason: 7)
2023-10-27T09:12:24.756432Z 0 [ERROR] [MY-010119] [Server] Aborting
2023-10-27T09:12:24.756604Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.33-25.1) Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3.
2023-10-27T09:12:24.757019Z 0 [ERROR] [MY-010065] [Server] Failed to shutdown components infrastructure.
2023-10-27T09:12:24.757122Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED
2023-10-27T09:12:24.757132Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2023-10-27T09:12:24.758384Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-10-27T09:12:24.759521Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-10-27T09:12:24.760696Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2023-10-27T09:12:24.760702Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0
2023-10-27T09:12:24.760706Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0
2023-10-27T09:12:24.760710Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0
2023-10-27T09:12:24.760715Z 0 [Note] [MY-000000] [Galera] avg deps dist 0
2023-10-27T09:12:24.760720Z 0 [Note] [MY-000000] [Galera] avg cert interval 0
2023-10-27T09:12:24.760724Z 0 [Note] [MY-000000] [Galera] cert index size 0
2023-10-27T09:12:24.760756Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-10-27T09:12:24.760789Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0
2023-10-27T09:12:24.760795Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2023-10-27T09:12:24.760857Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> DESTROYED (TO: 0)
2023-10-27T09:12:24.766762Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk...
I know it can help to stop all nodes and reinitialize again, but it is preferable not to stop the database.
Please help me understand how to attach the node back without reloading cluster
You already answered me before, but a month later we ran into it again
The server responds to ping, but not to telnet
Also, how can I analyze the problem in detail, why server3 disconnected from the cluster?