PXC freeze on large DB restore

Hello,

I have 3 nodes in a PXC cluster (5.7.26-29-57-log) + 2 ProxySQL + Keepalived. It works great, but I'm experiencing an issue when I restore databases (>200 MB) through a dump or through my application (which has the ability to restore a database).
I'm not using XtraBackup for now because my application has priority for managing backup and restore.

When restoring, everything works well at the beginning, but after a while the replication latency grows and then all nodes go OFFLINE_SOFT, then OFFLINE_HARD. In this state I can't do any operation; all servers are down. I have to shut everything down and then bootstrap the right node (by checking grastate.dat).
But sometimes no node is "safe_to_bootstrap", so I have to check gvwstate.dat or the GTIDs.
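For reference, this is roughly how I recover (just a sketch of my procedure; the datadir is /data as in my config, and I'm assuming the standard PXC 5.7 systemd service names):

# On each node, look for safe_to_bootstrap: 1 and the highest seqno
cat /data/grastate.dat

# Stop MySQL on all nodes, then bootstrap the most advanced one
systemctl stop mysql                        # on every node
systemctl start mysql@bootstrap.service     # on the chosen node only
systemctl start mysql                       # on the remaining nodes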

The only workaround I found for now is to shutdown 2 nodes when I want to restore a database then restart the 2 nodes and it’s OK.

So I'd say it's a configuration problem for sure, but I don't know what to adjust.

Thanks for your help.

David

Hello DavidD

Thanks for your question.

Could you upload the MySQL config for each of the nodes, please, and also any error log files that are being written?
Also, details of the server environment.

Please obfuscate any information that would identify your servers etc though.
Thanks!

Lorraine

Hello Lorraine,

Thank you.

These are my config files:

node 1:
my.cnf

[mysqld]
# general
server-id=1

datadir=/data
socket=/var/run/mysqld/mysqld.sock
pid-file=/var/run/mysqld/mysqld.pid

back_log=2000
connect_timeout=15
skip-name-resolve=1
ssl=0
table_definition_cache=2000
table_open_cache=10000
metadata_locks_hash_instances=256
max_connections=900
max_allowed_packet=1G
net_buffer_length=1048576

transaction-isolation=READ-COMMITTED

core-file
symbolic-links=0
bind-address = 0.0.0.0


character-set-server=utf8
collation-server=utf8_general_ci
# log
log-error=/var/log/mysqld.log
general-log-file= /datalog/mysql.log
pid-file=/var/run/mysqld/mysqld.pid
slow_query_log=ON
long_query_time=1
#slow_query_log_file=/datalog/mysql-slow.log
log_slow_rate_limit=100
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1

sql_mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

net_read_timeout=1440
net_write_timeout=2880

#pooling
#thread_handling=pool-of-threads
#thread_pool_size=36

# innodb
innodb_strict_mode=0
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=8
innodb_log_file_size=1G 
innodb_write_io_threads=8
innodb_read_io_threads=8

innodb_thread_concurrency=2
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
innodb-page-cleaners=8
innodb_purge_threads=4
innodb_lru_scan_depth=2048
innodb_io_capacity=8000
innodb_io_capacity_max=16000
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_adaptive_hash_index=OFF
innodb-change-buffering=none
innodb_flush_neighbors=0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10

# binlog
log-bin=/datalog/mysql-bin
log-bin-index=/datalog/mysql-bin.index
binlog-checksum=NONE
log_slave_updates
expire_logs_days=14
max_binlog_files=20
max_binlog_size=500M

log_output=file
binlog-format=ROW
relay-log=relay-1
enforce-gtid-consistency
gtid-mode=on
master-info-repository=TABLE
relay-log-info-repository=TABLE

# monitoring
performance_schema=ON
innodb_monitor_enable='%'
performance_schema_instrument='%synch%=on'
userstat=1


wsrep.cnf

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib/galera3/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://192.168.100.10,192.168.100.11,192.168.100.12

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# Slave thread to use
wsrep_slave_threads= 8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=192.168.100.10
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc1

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED

# SST method
wsrep_sst_method=xtrabackup-v2

#Authentication for SST method
wsrep_sst_auth="sstuser:*********"


node 2:
my.cnf

[mysqld]
# general
server-id=1

datadir=/data
socket=/var/run/mysqld/mysqld.sock
pid-file=/var/run/mysqld/mysqld.pid

back_log=2000
connect_timeout=15
skip-name-resolve=1
ssl=0
table_definition_cache=2000
table_open_cache=10000
metadata_locks_hash_instances=256
max_connections=900
max_allowed_packet=1G
net_buffer_length=1048576

transaction-isolation=READ-COMMITTED

core-file
symbolic-links=0
bind-address = 0.0.0.0


character-set-server=utf8
collation-server=utf8_general_ci
# log
log-error=/var/log/mysqld.log
general-log-file= /datalog/mysql.log
pid-file=/var/run/mysqld/mysqld.pid
slow_query_log=ON
long_query_time=1
#slow_query_log_file=/datalog/mysql-slow.log
log_slow_rate_limit=100
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1

sql_mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

net_read_timeout=1440
net_write_timeout=2880

#pooling
#thread_handling=pool-of-threads
#thread_pool_size=36

# innodb
innodb_strict_mode=0
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=8
innodb_log_file_size=1G 
innodb_write_io_threads=8
innodb_read_io_threads=8

innodb_thread_concurrency=2
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
innodb-page-cleaners=8
innodb_purge_threads=4
innodb_lru_scan_depth=2048
innodb_io_capacity=8000
innodb_io_capacity_max=16000
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_adaptive_hash_index=OFF
innodb-change-buffering=none
innodb_flush_neighbors=0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10

# binlog
log-bin=/datalog/mysql-bin
log-bin-index=/datalog/mysql-bin.index
binlog-checksum=NONE
log_slave_updates
expire_logs_days=14
max_binlog_files=20
max_binlog_size=500M

log_output=file
binlog-format=ROW
relay-log=relay-1
enforce-gtid-consistency
gtid-mode=on
master-info-repository=TABLE
relay-log-info-repository=TABLE

# monitoring
performance_schema=ON
innodb_monitor_enable='%'
performance_schema_instrument='%synch%=on'
userstat=1


wsrep.cnf

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib/galera3/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://192.168.100.10,192.168.100.11,192.168.100.12

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# Slave thread to use
wsrep_slave_threads= 8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=192.168.100.11
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc2

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED

# SST method
wsrep_sst_method=xtrabackup-v2

#Authentication for SST method
wsrep_sst_auth="sstuser:*********"


node 3:
my.cnf

[mysqld]
# general
server-id=1

datadir=/data
socket=/var/run/mysqld/mysqld.sock
pid-file=/var/run/mysqld/mysqld.pid

back_log=2000
connect_timeout=15
skip-name-resolve=1
ssl=0
table_definition_cache=2000
table_open_cache=10000
metadata_locks_hash_instances=256
max_connections=900
max_allowed_packet=1G
net_buffer_length=1048576

transaction-isolation=READ-COMMITTED

core-file
symbolic-links=0
bind-address = 0.0.0.0


character-set-server=utf8
collation-server=utf8_general_ci
# log
log-error=/var/log/mysqld.log
general-log-file= /datalog/mysql.log
pid-file=/var/run/mysqld/mysqld.pid
slow_query_log=ON
long_query_time=1
#slow_query_log_file=/datalog/mysql-slow.log
log_slow_rate_limit=100
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1

sql_mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

net_read_timeout=1440
net_write_timeout=2880

#pooling
#thread_handling=pool-of-threads
#thread_pool_size=36

# innodb
innodb_strict_mode=0
innodb_buffer_pool_size=8G
innodb_buffer_pool_instances=8
innodb_log_file_size=1G 
innodb_write_io_threads=8
innodb_read_io_threads=8

innodb_thread_concurrency=2
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
innodb-page-cleaners=8
innodb_purge_threads=4
innodb_lru_scan_depth=2048
innodb_io_capacity=8000
innodb_io_capacity_max=16000
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_adaptive_hash_index=OFF
innodb-change-buffering=none
innodb_flush_neighbors=0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10

# binlog
log-bin=/datalog/mysql-bin
log-bin-index=/datalog/mysql-bin.index
binlog-checksum=NONE
log_slave_updates
expire_logs_days=14
max_binlog_files=20
max_binlog_size=500M

log_output=file
binlog-format=ROW
relay-log=relay-1
enforce-gtid-consistency
gtid-mode=on
master-info-repository=TABLE
relay-log-info-repository=TABLE

# monitoring
performance_schema=ON
innodb_monitor_enable='%'
performance_schema_instrument='%synch%=on'
userstat=1


wsrep.cnf

[mysqld]
# Path to Galera library
wsrep_provider=/usr/lib/galera3/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm://192.168.100.10,192.168.100.11,192.168.100.12

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# Slave thread to use
wsrep_slave_threads= 8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=192.168.100.12
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc3

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=DISABLED

# SST method
wsrep_sst_method=xtrabackup-v2

#Authentication for SST method
wsrep_sst_auth="sstuser:*********"


Thanks

Are you not seeing any log files being written when the db freezes?

Hello David,

pxc_strict_mode=DISABLED - any particular reason for disabling it? You may be simply asking for trouble by allowing potentially dangerous commands.
innodb_io_capacity=8000 - how many random write IOPS can your disk do? Have you tested it? In general you should not spend all of the disk's capacity on that setting.
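If you haven't measured it, something like fio gives a rough idea of the random write IOPS the volume can sustain (just a sketch; /data/fio-test is a placeholder test file on the same volume as the datadir, and the bs=16k roughly matches the InnoDB page size):

# Random write test, 60 seconds, ~16k block size
fio --name=randwrite --filename=/data/fio-test --size=2G \
    --rw=randwrite --bs=16k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting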

I wonder what that means exactly - can you at least log in to those nodes and capture the basic status outputs, like:

SHOW PROCESSLIST;
SHOW ENGINE INNODB STATUS\G
SHOW STATUS LIKE 'ws%';
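For example, captured to files you can attach here (assuming you can connect locally with your usual credentials), something like:

mysql -e "SHOW PROCESSLIST;"           > processlist.txt
mysql -e "SHOW ENGINE INNODB STATUS\G" > innodb_status.txt
mysql -e "SHOW STATUS LIKE 'ws%';"     > wsrep_status.txt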

Hello,

Sorry, I wasn't free these past few days.
lorraine.pocklington → I have customers on my databases, so I have to plan when to do a large restore. The last time the crash happened was 2 months ago, and I have log rotation set to 15 days.
I will perform a crash test soon.
przemek → I host an application whose vendor requires 'pxc_strict_mode=DISABLED', so it is not a personal choice.

When the cluster freezes I can't perform any MySQL operation (not even a SELECT).
IO capacity has been benchmarked, and I set the value about 30% below the maximum measured in random read/write mode with large samples, according to my engineer. I have SSD + NVMe.

The basic outputs for each node are in the attached file.

Regards

David

basics.txt (91.5 KB)

Hi,

I just tried restoring a full 45 MB database and I hit the same issue.

No error in the MySQL error log.

I have 1 active master (from ProxySQL), and when I run SHOW ENGINE INNODB STATUS\G I get the output in the attached file.

David

engine status.txt (30 KB)

And then I get "WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction" in my application.

David

You can increase wsrep_slave_threads to 32 and try restoring your database directly to one node (not through ProxySQL) again!
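For example (a sketch; wsrep_slave_threads is dynamic on PXC 5.7, so you can try it on a running node first and persist it afterwards):

# Apply on the running node
mysql -e "SET GLOBAL wsrep_slave_threads = 32;"

# Then persist it in wsrep.cnf so it survives a restart:
# wsrep_slave_threads = 32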

Add the following to make your IO faster:

innodb_flush_log_at_trx_commit=2
sync_binlog=0

Thanks Steven Lin, but I can't change innodb_flush_log_at_trx_commit to 2 because of the related risks (1 is ACID-guaranteed while 2 can lose up to 1 second's worth of transactions, according to the MySQL documentation), and I have ~4k questions/s, with 15k q/s peaks (~10% writes, so 400 to 1,500 writes/s).
Write performance is not the problem, I think.
This is my datacenter configuration:
SAN Dell Unity 450F + 24 SSDs + NVMe cache managed by ESX + 6 x 25 Gb/s links between the ESXi hosts + Xeon 6154
I checked with my engineer, and he benchmarked the SAN at 200k IOPS on a database.

I'll try increasing wsrep_slave_threads.

innodb_flush_log_at_trx_commit=0
can lose up to 1 second's worth of transactions when your node crashes, since the log is written and flushed only once per second.

innodb_flush_log_at_trx_commit=2
the log is written to the operating system at every commit and flushed once per second, so a mysqld crash loses nothing; only a full OS/host crash can lose up to 1 second of transactions, so in most situations your data will be safe.

Hi, increasing wsrep_slave_threads to 32 did the job. No stall during restore today. Great!
I can now go ahead and sysbench the PXC cluster with a big workload.
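Probably something along these lines (just a sketch with placeholder credentials; it assumes an sbtest database and user exist, and the parameters still need tuning for my workload):

# Prepare the test tables, then run a mixed read/write workload against node 1
sysbench oltp_read_write --mysql-host=192.168.100.10 --mysql-user=sbtest \
    --mysql-password=... --mysql-db=sbtest --tables=10 --table-size=1000000 prepare
sysbench oltp_read_write --mysql-host=192.168.100.10 --mysql-user=sbtest \
    --mysql-password=... --mysql-db=sbtest --tables=10 --table-size=1000000 \
    --threads=64 --time=600 --report-interval=10 run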
Thx
David

Hi, I'm updating this post. We investigated further with our engineers and found that pmm-client, and especially mysqld_exporter (Prometheus), causes the problem: sometimes the process takes a lot of memory, triggering the OOM killer:


Nov. 28 09:46:38 pxc2 kernel: mysqld_exporter invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
nov. 28 09:46:38 pxc2 kernel: mysqld_exporter cpuset=/ mems_allowed=0
...
nov. 28 09:46:38 pxc2 kernel: Out of memory: Kill process 2718 (mysqld) score 946 or sacrifice child
nov. 28 09:46:38 pxc2 kernel: Killed process 2718 (mysqld) total-vm:19560856kB, anon-rss:9663284kB, file-rss:0kB, shmem-rss:0kB
nov. 28 09:46:38 pxc2 kernel: oom_reaper: reaped process 2718 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

At that time I had 10 GB of RAM on the server and 8 GB allocated to the InnoDB buffer pool.
After changing to 12 GB of RAM on the server and 9 GB for the buffer pool, leaving 3 GB free, I have had no problem.

Is there a way to control the amount of memory PMM uses?

David

OK, thank you for this update. Let me get someone in the PMM team to check in on this, as it’s crossing into their patch and they won’t by default be reviewing PXC threads.

Hello David,

Could you please tell us what PMM version you are using? There were some issues with older PMM versions where memory consumption was high.
ref: [URL]Percona Monitoring and Management
If you are using an old version of PMM, I would suggest upgrading to the latest version.

Also, as you mentioned, you had 10 GB on the server and 8 GB allocated to the InnoDB buffer pool. Here we should also consider the additional MySQL buffers that need memory beyond the buffer pool, so the real memory usage for MySQL will be the buffer pool plus that additional buffer memory.
Check this blog post: [URL]https://www.percona.com/blog/2014/01/24/mysql-server-memory-usage-2/[/URL]

So overall you were left with a very low amount of memory for the other OS processes and the PMM exporter processes, which could lead to an out-of-memory issue.
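As a very rough, worst-case illustration (a sketch only; the per-connection buffers are almost never all allocated at the same time, so the real footprint is lower), you can estimate the theoretical maximum from the server variables:

mysql -e "
SELECT ROUND((
    @@innodb_buffer_pool_size
  + @@innodb_log_buffer_size
  + @@key_buffer_size
  + @@max_connections * ( @@read_buffer_size
                        + @@read_rnd_buffer_size
                        + @@sort_buffer_size
                        + @@join_buffer_size
                        + @@thread_stack )
) / 1024 / 1024 / 1024, 1) AS worst_case_gb;"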

Hello,
I’m using pmm-admin 1.17.2

I'll check the other buffers to see how much memory MySQL uses and then adjust my parameters.

thanks