Replication Problem where Cluster locks up completely

Hello,
we have kind of a weird problem, but first, let me describe our setup. We have a cluster of three Percona MySQL (PXC) nodes which sit behind two ProxySQL servers. The MySQL cluster is used by many web servers. This normally causes no problems and works fine.

But lately we have a problem: when a lot of data is being written, the replication somehow locks up and the web servers don't get answers to their queries. We can then stop the two nodes which are not the ProxySQL writer, and the database immediately frees up and the problem is resolved. We get no errors in our log when this happens, by the way. As additional information, SELECT queries can still be resolved while the problem exists.

This is our config:

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld]
sql-mode="NO_ENGINE_SUBSTITUTION"
pxc_strict_mode="PERMISSIVE"
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# Option for better logging
performance_schema_session_connect_attrs_size = 1024

wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_name=pxc-nonprod-cluster
wsrep_cluster_address=gcomm://[CENSORED IP ADDRESSES]

wsrep_node_name=vsdb3-non-prod-pxc
wsrep_node_address=[CENSORED IP ADDRESS OF OWN HOST]

wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=[CENSORED SST SECRET]
wsrep_sst_donor="vsdb1-non-prod-pxc,vsdb2-non-prod-pxc"
wsrep_slave_threads = 128
wsrep_provider_options = "gcache.size=1G"
wsrep_retry_autocommit = 2

binlog_format=ROW
binlog_row_image = minimal
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

optimizer_search_depth=0

user=mysql
port                    = 3306
bind-address            = 0.0.0.0
socket                  = /var/run/mysqld/mysqld.sock
key_buffer_size         = 256M
max_allowed_packet      = 256M
thread_stack            = 256k
table_open_cache_instances = 16
sort_buffer_size        = 1M
read_buffer_size        = 128k
read_rnd_buffer_size    = 512k
myisam_sort_buffer_size = 16M
thread_cache_size       = 20
query_cache_size        = 0
query_cache_type        = 0

max_heap_table_size     = 128M
tmp_table_size          = 128M
skip-name-resolve

character-set-server    = utf8
collation-server        = utf8_unicode_ci

#Slow logging disabled
#slow_query_log = 1
#long_query_time = 0.1
#log_queries_not_using_indexes = 1

max_connections         = 50000
max_user_connections    = 5000

tmpdir          = /tmp/


innodb_write_io_threads=8
innodb_read_io_threads=8
innodb_buffer_pool_size = 64G
innodb_log_file_size = 1G
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
default-storage-engine = INNODB
innodb_file_format = barracuda
innodb_file_format_max = barracuda

secure-file-priv=/var/import/

ignore-db-dir=lost+found

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[mysqlhotcopy]
interactive-timeout

[sst]
streamfmt= xbstream

[xtrabackup]
parallel = 8
rebuild-threads = 8

And also a screenshot of mytop while the problem exists:

[screenshot of mytop]

What you are experiencing during high writes is probably flow control engaging.

You can add the following parameters to ‘wsrep_provider_options’ to try and mitigate flow control:

gcs.fc_limit=500;gcs.fc_factor=0.9
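
A minimal sketch of how this could be applied (assuming the gcache.size=1G already in your config should be kept; gcs.fc_limit and gcs.fc_factor are dynamic, so they can also be set at runtime):

SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=500; gcs.fc_factor=0.9';
-- only the named parameters change; other provider options keep their values

-- to persist across restarts, extend the existing my.cnf line:
-- wsrep_provider_options = "gcache.size=1G; gcs.fc_limit=500; gcs.fc_factor=0.9"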

I also suggest you set up Percona Monitoring and Management as well to monitor flow control. There are several graphs specific to this.
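
If you want a quick check without PMM, the relevant counters can also be read directly (a sketch; pausing shows up in the wsrep_flow_control_* status variables):

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';
-- wsrep_flow_control_paused: fraction of time replication was paused
-- wsrep_flow_control_sent:   flow control events this node has sent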

Thanks for the tip. We looked into it and checked with SHOW STATUS LIKE 'wsrep%' and the myq_status tool whether flow control was causing the problem.
The thing is, flow control doesn't kick in when the problem starts, but a lot later, when the problem already exists and has been getting worse for a few minutes. We haven't changed any settings regarding flow control yet, because it doesn't seem to be causing the problem.

We also found out in our testing that the replication stall comes down to a single query: one query gets stuck and all the queries after it wait behind it. When we manually kill the stuck query, everything clears up instantly. But it's not always the same query, and they are often really simple (e.g. INSERT INTO `table` VALUES (five values or so)).
We also found out that the problem exists even when we don't go through the ProxySQL servers, although it happens less often then.
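
For completeness, this is roughly how we locate and kill the stuck statement (the thread ID below is just an example):

SHOW FULL PROCESSLIST;  -- find the INSERT that has been stuck the longest
KILL 12345;             -- example thread id taken from the processlist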

About replication: the way we test it is with a mysqldump piped into a mysql command, both accessing the same instance.

This leads me to believe you have a locking issue. Killing the query releases any locks which allows other queries to finish that were waiting on the locks. What’s the schema of this table? Always just a simple INSERT?
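
Next time it hangs, it is worth looking at what the stuck query is actually waiting on. A sketch for MySQL/PXC 5.7 (assuming the sys schema is installed):

SELECT * FROM sys.innodb_lock_waits\G                 -- who is blocking whom

SELECT * FROM information_schema.INNODB_LOCK_WAITS;   -- raw 5.7 tables

SHOW ENGINE INNODB STATUS\G   -- the TRANSACTIONS section also shows lock waits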

wsrep_slave_threads = 128

That is extremely high. I’ve never configured that above 32, on even the beefiest of servers.

Why are you in PERMISSIVE PXC mode?

Hi, sorry for the late answer.
We did some tests and had to wait for devs.
We reduced wsrep_slave_threads to 32 like you suggested.

We are operating with pxc_strict_mode=PERMISSIVE because the TYPO3 backends our customer uses were not compatible. That has changed now, but it is creating new issues. In a separate sync process we use pt-table-sync to sync between two databases. The tool now fails, saying that it needs to change binlog_format to a format that isn't supported by pxc_strict_mode. To fix this we were thinking of setting the mode to PERMISSIVE while syncing and setting it back to ENFORCING after the sync is done.
This would work, but when we set pxc_strict_mode at runtime, it only changes on the node we are on, despite it being a global variable. Is this supposed to happen, or is it a bug? Wouldn't that allow the cluster to get into an inconsistent state?

We can't access the backend hosts directly from the sync servers for security reasons. We can only access the database through ProxySQL, and that only reaches one arbitrary node.

The pxc_strict_mode variable is specific to each node.

DISABLED: Do not perform strict mode validations and run as normal.
PERMISSIVE: If a validation fails, log a warning and continue running as normal.
ENFORCING: If validation fails during startup, halt the server and throw an error. If validation fails during runtime, deny the operation and throw an error.
MASTER: The same as ENFORCING except that the validation of explicit table locking is not performed. This mode can be used with clusters in which write operations are isolated to a single node.

If you have one of the nodes acting as an async master to an async replica, then you should be fine setting pxc_strict_mode=PERMISSIVE only on that async-master PXC node. That should solve the pt-table-sync problem for you.

production-db1 [RW] 
    \___ production-db2[RW] 
             \___ production-replica [R]

You have to run pt-table-checksum/pt-table-sync from the async master, which in this diagram is db2.

I am assuming the error you were getting was something like:

This tool requires binlog_format=STATEMENT, but the current binlog_format is set to ROW and an error occurred while attempting to change it.  If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege.  You will need to manually set binlog_format to 'STATEMENT' before running this tool.

I would suggest getting access to that async master node and running

set global pxc_strict_mode='PERMISSIVE';

before the table-sync, and setting it back to its original value once the sync is done.
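
Spelled out as a sketch (capture the current value first so you can restore it afterwards):

SELECT @@GLOBAL.pxc_strict_mode;            -- note the current value
SET GLOBAL pxc_strict_mode = 'PERMISSIVE';
-- ... run pt-table-sync here ...
SET GLOBAL pxc_strict_mode = 'ENFORCING';   -- restore the noted value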

Hi,
we don't have a replica; we have a three-node Galera cluster where no node is a master or replica/slave.
The thing is, the server from which the sync is triggered and executed has no direct MySQL access to the DB cluster for security reasons. Everything goes through a ProxySQL instance. Thus we can't control on which node the pt-table-sync commands and the SET GLOBAL statement will be executed (or if we can, we haven't found a way). Is there any way to set the variable on all nodes with one query? Or is there a ProxySQL setting that may help?

How about writing a procedure and calling it before doing the sync? Let's say, create procedure1 containing

set global pxc_strict_mode='PERMISSIVE';

and procedure2 containing

set global pxc_strict_mode='OLD-value';

and grant the ProxySQL user EXECUTE permission on the procedures, so you can call them before and after executing the sync.
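
A minimal sketch of that idea (all names are placeholders; the procedures run with the definer's privileges, so the definer needs a privilege that allows SET GLOBAL, e.g. SUPER — and bear in mind, as discussed above, the SET GLOBAL still only affects the node the call lands on):

DELIMITER //
CREATE DEFINER = 'admin'@'localhost' PROCEDURE util.pxc_permissive()
SQL SECURITY DEFINER
BEGIN
  SET GLOBAL pxc_strict_mode = 'PERMISSIVE';
END //
CREATE DEFINER = 'admin'@'localhost' PROCEDURE util.pxc_enforcing()
SQL SECURITY DEFINER
BEGIN
  SET GLOBAL pxc_strict_mode = 'ENFORCING';
END //
DELIMITER ;

GRANT EXECUTE ON PROCEDURE util.pxc_permissive TO 'proxysql_user'@'%';
GRANT EXECUTE ON PROCEDURE util.pxc_enforcing TO 'proxysql_user'@'%';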

@Atlas pt-table-sync is designed to synchronize tables in async setups (source->replica). You are using PXC, which is not async replication. I do not believe pt-t-s will work for you here. Also, I don't see why you need it, since PXC maintains sync state with all data at all times. If you have discovered an inconsistency on node2, then you should stop that node, erase the data, and force an SST to get a correct copy of the data from another node. Or, go to the correct node, make a copy of the table (CREATE TABLE … LIKE …), insert the data from old to new (INSERT INTO … SELECT FROM …), drop the old, and rename new to old. That will “sync” the data between all nodes.
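
As a sketch of that rebuild (table name hypothetical; the statements replicate to every node like any other transaction/DDL):

CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new SELECT * FROM mytable;
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
DROP TABLE mytable_old;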

Ah yes, to clarify this: we misuse pt-table-sync a bit. We use it not to synchronize replication, but to synchronize two different databases, for example from a test environment to stage. We use it because mysqldump would produce a small downtime, which is not acceptable for the customer. This is not perfect, but it works. Now I see why these two features don't work well together.

@Atlas I encourage you to open a feature request/bug report on https://jira.percona.com for the Percona Toolkit project to let pt-table-sync ignore the binlog format. When syncing two servers that are not connected via replication, there is no binlog involved, so you should be able to disable that check, but I don't see such an option in the doc pages. Seems like a small oversight.

Why would mysqldump produce downtime? If you just want to dump one table and you use --single-transaction, there should be no downtime, as both writers and readers will be able to run while the dump is in progress.
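
For illustration (host and database names are placeholders; --single-transaction takes a consistent InnoDB snapshot instead of locking the tables):

mysqldump --single-transaction --quick -h test-host dbname tablename \
    | mysql -h stage-host dbname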

Have you looked at mydumper (https://github.com/mydumper/mydumper) instead?

Thanks for the tip, I created a ticket: PT-2046 ("Option to ignore Binlog Format for pt-table-sync") in the Percona JIRA.
By downtime I mean downtime of the web application, because it can't access the tables while they are being imported, either because of locking or index rebuilding or something.

I now tried to set pxc_strict_mode with a trigger:

DELIMITER //
CREATE TRIGGER update_pxc_strict_mode
  AFTER UPDATE
  ON pxc_strict_mode_table
  FOR EACH ROW
  BEGIN
    IF NEW.setting = "PERMISSIVE" THEN
      SET GLOBAL pxc_strict_mode = "PERMISSIVE";
    ELSEIF NEW.setting = "ENFORCING" THEN
      SET GLOBAL pxc_strict_mode = "ENFORCING";
    END IF;
  END //
DELIMITER ;

But this still won't work across the cluster. When I run UPDATE pxc_strict_mode_table SET setting = 'PERMISSIVE'; the variable is set on the node I am connected to, so the trigger works, but not on the other nodes: the table is updated there, but the system variable doesn't change.
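
For reference, this can be checked per node (a quick sketch, run on each node directly):

SELECT @@GLOBAL.pxc_strict_mode;  -- still unchanged on the other nodes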

I also tried using a procedure like @mughees52 suggested:

DELIMITER //
CREATE PROCEDURE update_pxc_strict_mode_permissive()
  BEGIN
    SET GLOBAL pxc_strict_mode = "PERMISSIVE";
  END //
DELIMITER ;

When executing CALL update_pxc_strict_mode_permissive(); the same thing happens: the local system variable changes, but the variables on the other nodes stay the same.