Hi,
We're working with a client who recently upgraded their environment (on our instruction) from PXC 8.0.35 to PXC 8.4.3. The servers are dedicated machines with 8 cores and 128GB of RAM running RHEL 8.10, and the buffer pool on each node is set to 89G. Since the upgrade, the 3 nodes have been prone to crashing (in some cases only minutes after startup) with the following error:
2025-04-22T04:39:28.933845-04:00 3527 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2025-04-22T08:39:28Z UTC - mysqld got signal 11 ;
Signal SIGSEGV (Address not mapped to object) at address 0x53400000010
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=a0116595f0b4e51107ec68982d9a53281129781e
Server Version: 8.4.3-3.1 Percona XtraDB Cluster (GPL), Release rel3, Revision cf742b4, WSREP version 26.1.4.3, wsrep_26.1.4.3
Thread pointer: 0x7ef63802a980
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 7f0acc111a90 thread_stack 0x80000
#0 0x231e0cc <unknown>
#1 0x1307096 <unknown>
#2 0x1307492 <unknown>
#3 0x7f111aa2898f <unknown>
#4 0x1f76da6 <unknown>
#5 0x1f76e4d <unknown>
#6 0x1f840b9 <unknown>
#7 0x1f85b9a <unknown>
#8 0x1f13f75 <unknown>
#9 0x1f27bcd <unknown>
#10 0x1f31ea0 <unknown>
#11 0x1f345bb <unknown>
#12 0xe07170 <unknown>
#13 0x12a91d2 <unknown>
#14 0x1168ba6 <unknown>
#15 0x116bd2d <unknown>
#16 0x116c364 <unknown>
#17 0x1170a6e <unknown>
#18 0x117123e <unknown>
#19 0x12f59b7 <unknown>
#20 0x28013c8 <unknown>
#21 0x7f111aa1e1c9 <unknown>
#22 0x7f1118a9b8d2 <unknown>
#23 0xffffffffffffffff <unknown>
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ef6383dcda0): COMMIT
Connection ID (thread ID): 3527
Status: NOT_KILLED
You may download the Percona XtraDB Cluster operations manual by visiting
Percona XtraDB Cluster: Top MySQL Clustering Alternative. You may find information
in the manual which will help you identify the cause of the crash.
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery_verbose.P8CEIC' --pid-file='/var/lib/mysql//REDACTED-recover.pid'
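The backtrace frames above are all unresolved (<unknown>), so on their own they don't say much. As a rough first step, here is a small Python sketch that pulls the frame addresses out of the log and builds an addr2line command for the frames that look like they sit inside mysqld itself. It is only a sketch under assumptions: that the binary lives at /usr/sbin/mysqld, that it is not position-independent (the low 0x1xxxxxx addresses suggest this but I haven't confirmed it), and that debug symbols matching the BuildID above are installed. The 4 GiB cut-off between "inside mysqld" and "shared library" is just a heuristic.

```python
import re
import shlex

# A subset of the frame lines from the error log above; paste the full list in practice.
BACKTRACE = """\
#0 0x231e0cc <unknown>
#1 0x1307096 <unknown>
#2 0x1307492 <unknown>
#3 0x7f111aa2898f <unknown>
#4 0x1f76da6 <unknown>
#20 0x28013c8 <unknown>
#21 0x7f111aa1e1c9 <unknown>
#22 0x7f1118a9b8d2 <unknown>
#23 0xffffffffffffffff <unknown>
"""

# Matches "#<frame number> 0x<hex address>" at the start of each line.
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-fA-F]+)\b", re.MULTILINE)

# Heuristic (assumption): addresses below 4 GiB belong to a non-PIE mysqld,
# 0x7f... addresses belong to shared libraries, and all-ones marks the stack end.
BINARY_LIMIT = 0x1_0000_0000
STACK_END = 0xFFFFFFFFFFFFFFFF


def split_frames(text):
    """Return (frames inside mysqld, frames elsewhere) as (number, address) pairs."""
    in_binary, elsewhere = [], []
    for num, hexaddr in FRAME_RE.findall(text):
        addr = int(hexaddr, 16)
        if addr == STACK_END:
            continue  # end-of-stack sentinel, nothing to resolve
        target = in_binary if addr < BINARY_LIMIT else elsewhere
        target.append((int(num), addr))
    return in_binary, elsewhere


def addr2line_command(frames, binary="/usr/sbin/mysqld"):
    """Build a shell command: -C demangles C++ names, -f prints function names."""
    args = ["addr2line", "-C", "-f", "-e", binary]
    args += [hex(addr) for _, addr in frames]
    return shlex.join(args)


if __name__ == "__main__":
    in_binary, elsewhere = split_frames(BACKTRACE)
    print(addr2line_command(in_binary))
    print("library frames:", [n for n, _ in elsewhere])
```

With the subset above it prints an `addr2line -C -f -e /usr/sbin/mysqld 0x231e0cc ...` command covering frames #0, #1, #2, #4 and #20; frames #3, #21 and #22 would need resolving against the shared libraries they came from instead.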
Here are a few settings from the my.cnf that I can share at this stage:
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 2058M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 89G
innodb_buffer_pool_instances = 9
innodb_autoinc_lock_mode = 2
innodb_numa_interleave = OFF
wsrep_provider = /usr/lib64/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://REDACTED
wsrep_log_conflicts
wsrep_cluster_name = pxc-cluster
wsrep_node_name = pxc-node-REDACTED
pxc_strict_mode = ENFORCING
wsrep_sst_method = xtrabackup-v2
wsrep_applier_threads = 8
wsrep_sync_wait = 0
wsrep_provider_options = "gcache.size=4G"
wsrep_retry_autocommit = 10
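For anyone checking the sizing, here is a quick Python sketch (my own arithmetic, not anything from Percona) of what InnoDB should end up with from the settings above. It assumes innodb_buffer_pool_chunk_size is left at its 128M default, that "128GB" of RAM means 128 GiB, and that 8.4 still derives the redo log capacity from the deprecated innodb_log_file_size and innodb_log_files_in_group when innodb_redo_log_capacity is not set, as 8.0.30+ does.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Values copied from the my.cnf excerpt above.
buffer_pool_size = 89 * GIB
buffer_pool_instances = 9
log_file_size = 2058 * MIB
log_files_in_group = 2

# Assumptions, not taken from the config above.
chunk_size = 128 * MIB  # assumed default innodb_buffer_pool_chunk_size
ram = 128 * GIB         # assumed: "128GB" read as GiB


def effective_buffer_pool(size, instances, chunk):
    """Round size up to a multiple of chunk * instances, as InnoDB adjusts it."""
    unit = chunk * instances
    return -(-size // unit) * unit  # ceiling division


def legacy_redo_capacity(file_size, files):
    """Redo capacity derived from the old variables when innodb_redo_log_capacity is unset."""
    return file_size * files


if __name__ == "__main__":
    pool = effective_buffer_pool(buffer_pool_size, buffer_pool_instances, chunk_size)
    print(f"effective buffer pool: {pool / GIB:.0f} GiB "
          f"({pool / buffer_pool_instances / GIB:.0f} GiB per instance)")
    redo = legacy_redo_capacity(log_file_size, log_files_in_group)
    print(f"redo log capacity: {redo // MIB} MiB")
    print(f"RAM left outside the buffer pool: {(ram - pool) / GIB:.0f} GiB")
```

Under those assumptions, 89G is not a multiple of 9 x 128M (1152M), so the buffer pool gets rounded up to 90 GiB (10 GiB per instance), the redo capacity works out to 4116 MiB, and about 38 GiB of RAM is left for everything outside the buffer pool.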
I also believe one of the 3 nodes is acting as the master for an async replica at a DR site elsewhere, which may well be a factor.
For now, I just wanted to see whether anyone has encountered anything similar with PXC 8.4.3?
thanks,
Neil