Changing gcache.size from default size crashes PXC 8.4

I can set up and run a 3-node PXC 8.4.3-3.1 cluster fine, but when I change gcache.size from 128M to any larger size on the first node and the second node tries to join, both the first node and the joining node crash. I’m also using PXC 8.0, where this hasn’t been a problem.

Any hint appreciated :slight_smile:

2025-01-31T22:41:45.789846Z 0 [Note] [MY-000000] [Galera] Member 1.0 (service-perconax2) requested state transfer from '*any*'. Selected 0.0 (service-perconax1)(SYNCED) as donor.
2025-01-31T22:41:45.789890Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 26)
2025-01-31T22:41:45.789947Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0
2025-01-31T22:41:45.789986Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2025-01-31T22:41:45.790010Z 1 [Note] [MY-000000] [Galera] GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 048f8bef-e023-11ef-bd97-a2dd72c932da:26
2025-01-31T22:41:47.552822Z 0 [Note] [MY-000000] [Galera] (8bca584f-a66c, 'ssl://0.0.0.0:4567') turning message relay requesting off
2025-01-31T22:41:47.682839Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2025-01-31T22:41:47.701524Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2025-01-31T22:41:59.034073Z 0 [Note] [MY-000000] [WSREP-SST] 2025/01/31 22:41:59 socat[609] E SSL_read(): Connection reset by peer
2025-01-31T22:41:59.037413Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2025-01-31T22:41:59.037448Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 1 0 0
2025-01-31T22:41:59.037518Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1420
2025-01-31T22:41:59.037627Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2025-01-31T22:41:59.038458Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2025-01-31T22:41:59.055053Z 0 [Note] [MY-000000] [Galera] (8bca584f-a66c, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.0.0.4:4567
2025-01-31T22:41:59.780870Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.0.0.5' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' --mysqld-version '8.4.3-3.1'   '' : 32 (Broken pipe)
2025-01-31T22:41:59.780945Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2025-01-31T22:41:59.780966Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2025-01-31T22:41:59.781107Z 3 [Note] [MY-000000] [Galera] Processing SST received
2025-01-31T22:41:59.781285Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2025-01-31T22:41:59.781376Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2025-01-31T22:41:59.781400Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2025-01-31T22:41:59.781417Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2025-01-31T22:41:59.781443Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2025-01-31T22:41:59.781459Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2025-01-31T22:41:59.781486Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2025-01-31T22:41:59.781767Z 3 [Note] [MY-000000] [Galera] gcomm: closing backend
2025-01-31T22:42:00.055227Z 3 [Note] [MY-000000] [Galera] (8bca584f-a66c, 'ssl://0.0.0.0:4567') reconnecting to 6dc66708-a2a2 (ssl://10.0.0.4:4567), attempt 0
2025-01-31T22:42:00.056087Z 3 [Note] [MY-000000] [Galera] Failed to establish connection: Connection refused
2025-01-31T22:42:01.555996Z 3 [Note] [MY-000000] [Galera] Failed to establish connection: Connection refused
2025-01-31T22:42:03.055981Z 3 [Note] [MY-000000] [Galera] Failed to establish connection: Connection refused
2025-01-31T22:42:03.781956Z 3 [Note] [MY-000000] [Galera] declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2025-01-31T22:42:03.781997Z 3 [Note] [MY-000000] [Galera] evs::proto(8bca584f-a66c, LEAVING, view_id(REG,6dc66708-a2a2,2)) suspecting node: 6dc66708-a2a2
2025-01-31T22:42:03.782006Z 3 [Note] [MY-000000] [Galera] evs::proto(8bca584f-a66c, LEAVING, view_id(REG,6dc66708-a2a2,2)) suspected node without join message, declaring inactive
2025-01-31T22:42:03.782039Z 3 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,6dc66708-a2a2,2)
memb {
	8bca584f-a66c,0
	}
joined {
	}
left {
	}
partitioned {
	6dc66708-a2a2,0
	}
)
2025-01-31T22:42:03.782083Z 3 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2025-01-31T22:42:03.782109Z 3 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2025-01-31T22:42:03.782273Z 3 [Note] [MY-000000] [Galera] gcomm: closed
2025-01-31T22:42:03.782349Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2025-01-31T22:42:03.782433Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2025-01-31T22:42:03.782442Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2025-01-31T22:42:03.782447Z 0 [Note] [MY-000000] [Galera] Shifting JOINER -> OPEN (TO: 26)
2025-01-31T22:42:03.782454Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
2025-01-31T22:42:03.782465Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
2025-01-31T22:42:03.782470Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
2025-01-31T22:42:03.782474Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 26)
2025-01-31T22:42:03.782500Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
2025-01-31T22:42:03.782597Z 3 [Note] [MY-000000] [Galera] recv_thread() joined.
2025-01-31T22:42:03.782617Z 3 [Note] [MY-000000] [Galera] Closing send queue.
2025-01-31T22:42:03.782633Z 3 [Note] [MY-000000] [Galera] Closing receive queue.
2025-01-31T22:42:03.782673Z 3 [Note] [MY-000000] [Galera] mysqld: Terminated.
2025-01-31T22:42:03.782689Z 3 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2025-01-31T22:42:03.782704Z 3 [Note] [MY-000000] [WSREP] Terminating SST process
2025-01-31T22:42:03Z UTC - mysqld got signal 11 ;
Signal SIGSEGV (unknown siginfo_t::si_code) at address 0x0
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=39a81c9e8b64d9236c046ceaea2584a4975dba03
Server Version: 8.4.3-3.1 Percona XtraDB Cluster (GPL), Release rel3, Revision cf742b4, WSREP version 26.1.4.3, wsrep_26.1.4.3

Thread pointer: 0x7ff9a8001720
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff9c3dfa928 thread_stack 0x100000
 #0 0xe8bdc3 <unknown>
 #1 0x7ffa57ebe72f <unknown>
 #2 0x7ffa57ea88d8 <unknown>
 #3 0x7ffa4b2d1c33 <unknown>
 #4 0x7ffa4b3032bc <unknown>
 #5 0x7ffa4b30b472 <unknown>
 #6 0x1f2c968 <unknown>
 #7 0xecc43c <unknown>
 #8 0x7ffa57f09d21 <unknown>
 #9 0x7ffa57f8e033 <unknown>
 #10 0xffffffffffffffff <unknown>

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 3
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
root@matomo-db-2:/data/service.perconax# ./clean.sh
[+] Running 1/1
 ✔ Container service-perconax2  Removed                                                                            10.2s
root@matomo-db-2:/data/service.perconax# nano config/custom.cnf
root@matomo-db-2:/data/service.perconax# nano docker-compose.yml
root@matomo-db-2:/data/service.perconax# ./start.sh

Solved. When things don’t make sense, try adding more memory :joy: In my case PXC 8.4 briefly peaks at about 2G of extra memory during the SST (while the second node is joining) before returning to normal. This crashes PXC because the container has a memory limit. Raising the container’s memory limit or lowering innodb_buffer_pool_size (which allocates memory) let the SST finish without problems.

Setting gcache.size to as little as 256M also crashes PXC when a memory limit is present, so it makes no difference whether gcache.size is 256M or 2G.
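
For anyone sizing a container the same way, here is a rough sketch of the arithmetic. It assumes the temporary SST peak is about 2G, which is only what I saw on my setup and may differ on yours. The function names and the 8G/6G/1G figures are made up for illustration and are not taken from my actual config.

```python
# Rough memory-budget check for a PXC node in a memory-limited container.
# All numbers are illustrative. The ~2 GiB SST peak is the value observed
# in one particular setup, not a general constant.

GIB = 1024 ** 3


def sst_headroom(container_limit, buffer_pool, other_usage, sst_peak=2 * GIB):
    """Return the bytes left under the container limit during the SST peak.

    A negative result means the node would exceed its limit while the
    state transfer is running.
    """
    return container_limit - (buffer_pool + other_usage + sst_peak)


def max_buffer_pool(container_limit, other_usage, sst_peak=2 * GIB):
    """Largest innodb_buffer_pool_size that still leaves room for the SST peak."""
    return max(0, container_limit - other_usage - sst_peak)


if __name__ == "__main__":
    limit = 8 * GIB   # hypothetical container memory limit
    pool = 6 * GIB    # hypothetical innodb_buffer_pool_size
    other = 1 * GIB   # hypothetical steady-state usage outside the buffer pool

    # Negative headroom: this combination would not survive the SST peak.
    print(sst_headroom(limit, pool, other) / GIB)
    # Buffer pool size (in GiB) that would leave enough room.
    print(max_buffer_pool(limit, other) / GIB)
```

If the headroom comes out negative, either the container limit has to go up or innodb_buffer_pool_size has to come down, which matches what fixed it for me.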

I don’t think PXC 8.0 does this, or I’ve just been lucky with my other PXC setups.