Cluster stall during flush logs or auto binlog rotation

Hi.

I’ve been using Percona XtraDB Cluster (currently 5.6.24-72.2 on Debian 7.0), and for a while now I have been investigating long (60-second), regular stalls on the cluster. These stalls happen multiple times a day. I assumed one or more rogue scripts were doing huge single-statement updates that caused them, so that was all I investigated. But yesterday I noticed a trend: all the query times I was investigating in the binlog files were at the end of a file. And lo and behold, seemingly all my stalls happen exactly when a binlog file reaches the 100 MB size it is configured for, or when logrotate runs flush-logs at 6:25 in the morning.
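For anyone who wants to check the same correlation on their own cluster, here is a minimal Python sketch. It assumes you already have two lists of timestamps: the stall times (for example from the slow log) and the binlog rotation times (for example approximated by the last-modified time of each closed binlog file). The function name, the 60-second window and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def stalls_near_rotation(stall_times, rotation_times, window_seconds=60):
    """Return (stall, nearest_rotation, offset_seconds) for every stall
    that lies within window_seconds of some binlog rotation."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for stall in stall_times:
        # Pick the rotation closest in time to this stall (None if no rotations).
        nearest = min(rotation_times, key=lambda r: abs(stall - r), default=None)
        if nearest is not None and abs(stall - nearest) <= window:
            # Positive offset: the stall was logged after the rotation.
            matches.append((stall, nearest, (stall - nearest).total_seconds()))
    return matches


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real server.
    rotations = [datetime(2015, 7, 1, 6, 25, 0), datetime(2015, 7, 1, 14, 2, 10)]
    stalls = [datetime(2015, 7, 1, 6, 25, 40), datetime(2015, 7, 1, 11, 0, 0)]
    for stall, rotation, offset in stalls_near_rotation(stalls, rotations):
        print(f"stall at {stall} is {offset:.0f}s from rotation at {rotation}")
```

If nearly every stall shows up in the output and stalls at other times are rare, that supports the idea that rotation is the trigger, although it does not say why.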

I’ve changed the binlog file size to 1 GB as a test, and indeed I now have 10 times fewer stalls than before. I have also tried setting sync_binlog to 1, thinking it might be a flush issue. That seemed to help a lot yesterday evening when the binlog file reached 1 GB (either no stall happened, or it was short enough not to show up in the slow log, which has a 5s threshold), but I still got my usual 60s stall this morning at 6:25.

I’m now a bit at a loss for ideas, so I’m asking you guys what you think could be happening. By moving to a big 1 GB file I have at least made the stalls happen far less often than before, but it’s still not good when one does happen. Since I’m not sure whether it’s something I did wrong or a bug (and, if it is a bug, whether it has already been fixed; it’s difficult for me to upgrade without a good reason), I’d rather not open a pointless bug report :)

Here is some (hopefully) relevant stuff from the my.cnf file:
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 2
max_binlog_size = 1G
sync_binlog = 1
binlog_row_image = minimal
log-slave-updates
gtid_mode = ON
gtid_deployment_step = ON
enforce_gtid_consistency = ON

wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://(removed)
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_address=(removed)
wsrep_sst_method=xtrabackup-v2
wsrep_cluster_name=nexway_xtra_cluster
wsrep_sst_auth=(removed)
wsrep_auto_increment_control = OFF
wsrep_notify_cmd = /usr/local/bin/galeranotify.py
wsrep_slave_threads = 16
wsrep_provider_options="gcache.size = 5G; gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"

Thanks in advance for your help.

You never received a response to this, but I am seeing the exact same issue on Percona 5.7.10-3. Did you ever find anything that fixed it?