Not the answer you need?
Register and ask your own question!

Replicating into cluster is slow

Tom-slTom-sl EntrantCurrent User Role Contributor
edited November 11 in Percona XtraDB Cluster 5.x

I am migrating from a system running master-slave replication with a master and 5 slave servers. I have set up a cluster with 6 nodes and am replicating from the original master into the new cluster in preparation for switching over to the new cluster. The problem I have is the replication into the cluster seems to be slow and it is difficult to catch up when there are more than 4 nodes running. When I had the 5th node it gets behind when the server is added and then doesn't seem to catch up. Is there something that can be done to improve this performance so that I can actually contemplate adding the 6th node?

The master-slave system is running MySQL 5.7 and the new cluster is running XtraDB 5.7.

Answers

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    Hello @Tom-sl , I would advise against 6 nodes due to split-brain issues. Keep the number of nodes 'odd', either 3 or 5. Understand that the fastest transaction speed of PXC is limited by the slowest member of your cluster. If one of your nodes is in a different data center, all other nodes must wait for that node to acknowledge replication events.

    Make sure all PXC nodes are the same specification (CPU/RAM/DISK). Have you set up Percona Monitoring? You should do this so you can identify where the slowdown is happening.

    Are you using ROW-based replication? Do you have parallel replication enabled?

  • Tom-slTom-sl Entrant Current User Role Contributor
    edited November 17

    All the servers are in the same datacenter and are the same spec. I made a few config changes which seemed to improve things a bit, but when I try switching applications to read from the servers, the replication slows down again. PMM is set up, though I'm not an expert so am not 100% sure what to look for with this particular issue. It doesn't seem to indicate a problem with resources as everything looks fine and doesn't look like the servers are breaking a sweat. CPU usage increases when I make the applications read from the cluster, but no higher than on the current servers.

    The servers are virtual servers and spec is as follows:

    CPU 4 vCPUs

    RAM 15 GB

    System Disk 40 GB

    Data Disk 150 GB (data and logs are all on this disk)

    MySQL settings are:

    [mysqld]

    server-id=1

    datadir=/data/var/lib/mysql

    socket=/var/run/mysqld/mysqld.sock

    log-error=/data/var/log/mysql/mysqld.log

    pid-file=/var/run/mysqld/mysqld.pid

    log_bin= /data/var/log/mysql/mysql-bin.log

    log_slave_updates

    expire_logs_days=2


    bind-address  = 0.0.0.0

    sql-mode    = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'


    ssl_ca=/data/var/lib/mysql/ca.pem

    ssl_cert=/data/var/lib/mysql/server-cert.pem

    ssl_key=/data/var/lib/mysql/server-key.pem


    lower_case_table_names = 1

    max_connections    = 800

    skip-name-resolve

    innodb_file_per_table

    innodb_buffer_pool_size = 10G

    innodb_buffer_pool_instances = 16

    innodb_io_capacity_max=8000

    innodb_io_capacity=4000

    innodb_log_file_size=256M

    innodb_flush_log_at_trx_commit = 2

    sync_binlog = 0


    wsrep_slave_threads = 32


    slave-skip-errors = 1062,1047


    max_binlog_size  = 100M


    #PMM Logging

    log_output=file

    slow_query_log=ON

    long_query_time=0

    log_slow_rate_limit=100

    log_slow_rate_type=query

    log_slow_verbosity=full

    log_slow_admin_statements=ON

    log_slow_slave_statements=ON

    slow_query_log_always_write_time=1

    slow_query_log_use_global_control=all

    innodb_monitor_enable=all

    userstat=1


    pxc-encrypt-cluster-traffic=ON


    early-plugin-load=keyring_file.so

    keyring-file-data = /var/lib/mysql-keyring/keyring

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    @Tom-sl Your innodb_io_capacity is tuned too high. Please lower that to 400. Also, userstat=1 can CPU intensive, you should disable that too. You only have 4 vCPUs so wsrep_slave_threads should not be larger than 4. Additionally, 16 buffer pool instances for only 10GB of buffer pool is too many. I would reduce that down to 2 instances. Lastly, I would increase innodb_log_file_size to 1G You can make those changes and simply restart MySQL to take affect.

    Are all of your tables using the InnoDB engine?

    Check PMM and look for various things like Disk IO maxing out, or CPU saturation maxing a single core. Look at the replication graph, are you seeing large spikes in lag? You can send a few screenshots here.

    It's very difficult to diagnose an issue like this over forums, but I'll try to help out.

  • Tom-slTom-sl Entrant Current User Role Contributor
    edited November 17

    I made those changes but it seems to be slightly worse now. I will leave it running for a bit and then take some screenshots of PMM. Yes, all InnoDB tables.

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    Looking at the PMM graph for replication lag, you are clearly seeing lag go up and down?

  • Tom-slTom-sl Entrant Current User Role Contributor

    Yes, here is the graph:

    I have spent most of the day adding things reading from the cluster and then removing them again (the long line of nothing was me going out for lunch).

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    That graph is from the async master correct? It is receiving replication events from PXC? And you confirm you are running with ROW-based replication?

  • Tom-slTom-sl Entrant Current User Role Contributor

    This graph is from the PXC server that is connected as a slave from the current MySQL master - We are migrating from standard mysql master-slave to xtradb. The replication on the master is set to mixed format, the new cluster is set to row.

    Here are a few PMM screenshots from one of the pxc nodes.


  • matthewbmatthewb Senior [email protected] Percona Staff Role

    The replication on the master is set to mixed format

    This is probably your issue. You need to change the current master in your master/slave to use ROW. Mixed mode always uses STATEMENT unless MySQL determines the txn is non-deterministic then switches to ROW.

    Change this on your master and restart mysql.

    Also, on your PXC slave (the one receiving from master) set slave_parallel_workers = 4 and slave_parallel_type = LOGICAL_CLOCK

  • Tom-slTom-sl Entrant Current User Role Contributor

    I'm not making it replicate from a slave which is using row based logs (I'm unable to change the actual master at the moment). Would this be expected to produce the same result? Due to using up all the disk it stopped replicating for a while, and is taking a very long time to catch up, far longer than I expect.

    I cannot set the slave_parallel_type as it says this is unsupported.

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    @Tom-sl Are you on 5.7? This parameter is absolutely supported. https://dev.mysql.com/doc/refman/5.7/en/replication-options-replica.html#sysvar_slave_parallel_type

    ROW-based replication is far more efficient than STATEMENT/MIXED and should help with the lag.

  • Tom-slTom-sl Entrant Current User Role Contributor
    edited November 24

    That should have said "I'm making it replicate from a slave which is using row based logs" - not sure why I put the not in the there. I just was unable to get it to go over around 550 writes per second.

    It might be supported in MySQL 5.7, but XtraDB says it is unsupported:

    ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits enabling slave_preserve_commit_order as it conflicts with galera multi-master commit order semantics

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    Interesting limitation from PXC. I was unaware.

    Just so I can understand, is this the setup? M -- MIXED --> S --- ROW ---> PXC1

    And PXC has 3 nodes? And you are watching slave lag on PXC1? What happens if you stop PXC2 and PXC3? Does lag go down?

Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.