
Donor replication queue overflow during SST

victor
We recently upgraded from Percona XtraDB Cluster 5.6 to 5.7. After the upgrade, however, we noticed an issue when bringing up a node with SST: the donor node's replication queue backs up during the SST. Watching the process list on the donor node, it slowly fills up with "" entries. Eventually the donor hits max connections and things really start to fall apart. We did not see this behavior with 5.6.
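
For reference, while the SST is running, the donor's state can be watched with the standard Galera status counters -- something along these lines (just a sketch using stock wsrep/MySQL status variables):

-- Run on the donor while it is serving the SST.
-- wsrep_local_recv_queue grows if the applier threads on this node fall behind,
-- and wsrep_local_state_comment reads 'Donor/Desynced' for the duration of the SST.
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'Threads_connected';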

Here is a screenshot from 'PXC/Galera Graphs' on the donor node:

[screenshot attachment]

Here is a screenshot from 'PXC/Galera Graphs' on the second database node (not the donor):

[screenshot attachment]

Here is a screenshot from 'PXC/Galera Cluster Overview' :

[screenshot attachment]

Here are the contents of innobackup.backup.log:
xtrabackup: recognized server arguments: --server-id=2 --datadir=/var/lib/mysql --log_bin=mysql-bin --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_flush_method=O_DIRECT --innodb_log_file_size=50G --innodb_log_files_in_group=2 --innodb_log_buffer_size=512M --innodb_buffer_pool_size=433182M --innodb_read_io_threads=32 --innodb_write_io_threads=32 --innodb_io_capacity=5000 --tmpdir=/dev/shm --defaults_group=mysqld --parallel=4
xtrabackup: recognized client arguments: --server-id=2 --datadir=/var/lib/mysql --log_bin=mysql-bin --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_flush_method=O_DIRECT --innodb_log_file_size=50G --innodb_log_files_in_group=2 --innodb_log_buffer_size=512M --innodb_buffer_pool_size=433182M --innodb_read_io_threads=32 --innodb_write_io_threads=32 --innodb_io_capacity=5000 --tmpdir=/dev/shm --defaults_group=mysqld --parallel=4 --target-dir=/backup --socket=/var/lib/mysql/mysql.sock --binlog-info=LOCKLESS --user=cluster_username --password=* --socket=/var/lib/mysql/mysql.sock --lock-ddl=1 --backup=1 --galera-info=1 --binlog-info=ON --stream=xbstream --target-dir=/dev/shm/pxc_sst_BGX1/donor_xb_FzlZ
181026 10:04:33  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'cluster_username'  (using password: YES).
181026 10:04:33  version_check Connected to MySQL server
181026 10:04:33  version_check Executing a version check against the server...
181026 10:04:33  version_check Done.
181026 10:04:33 Connecting to MySQL server host: localhost, user: cluster_username, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Using server version 5.7.23-23-57-log
181026 10:04:33 Executing LOCK TABLES FOR BACKUP...
xtrabackup version 2.4.12 based on MySQL server 5.7.19 Linux (x86_64) (revision id: 170eb8c)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 0, set to 32011
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 53687091200
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error: write to logfile failed
xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xtrabackup: Error: xtrabackup_copy_logfile() failed.

Here are the contents of our main MySQL configuration file:
[client]
socket=/var/lib/mysql/mysql.sock

[mysqld]
server-id=2
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin=mysql-bin
log_slave_updates
expire_logs_days=2
symbolic-links=0
user=mysql
sql_mode = ONLY_FULL_GROUP_BY,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
max_allowed_packet=512M
innodb_flush_log_at_trx_commit=2
max_connections=2000
table_open_cache=15000
query_cache_size=0
query_cache_type=0
innodb_file_per_table
innodb_file_format=barracuda
innodb_flush_method=O_DIRECT
innodb_log_file_size=50G
innodb_log_files_in_group=2
innodb_log_buffer_size=512M
innodb_buffer_pool_size=433182M
innodb_buffer_pool_instances=64
innodb_read_io_threads=32
innodb_write_io_threads=32
innodb_io_capacity = 5000
innodb_thread_concurrency=0
max_connect_errors=4294967295
max_heap_table_size=67108864
tmp_table_size=67108864
max_heap_table_size=67108864
key_buffer_size=7735M
tmpdir=/dev/shm
slow_query_log_file=/var/log/mysql-slow.log
log-queries-not-using-indexes
log_output=file
slow_query_log=ON
long_query_time=0
log_slow_rate_limit=100
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1
slow_query_log_use_global_control=all
innodb_monitor_enable=all
userstat=1
performance_schema=ON
skip_name_resolve

Here are the contents of our mysqld_safe configuration:
[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket   = /var/lib/mysql/mysql.sock
nice     = 0

Here are the contents of our wsrep configuration:
[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.100.236,192.168.100.237,192.168.100.238
binlog_format=ROW
default_storage_engine=InnoDB
wsrep_slave_threads=128
wsrep_provider_options="gcs.fc_limit=640;"
innodb_autoinc_lock_mode=2
wsrep_node_address=192.168.100.236
wsrep_cluster_name=prod
wsrep_node_name=d1
pxc_strict_mode=ENFORCING
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="cluster_username:password"

[sst]
rlimit=80m
inno-backup-opts="--binlog-info=LOCKLESS"
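
For completeness, the provider options actually in effect on the donor (including gcs.fc_limit) and the flow-control counters can be checked with something like the following (a minimal sketch using standard wsrep variables):

-- Effective Galera provider options and flow-control activity on the donor.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';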

We are a bit stumped as to why the donor node was fine during SST in 5.6 but is having problems now in 5.7. Any ideas?

Comments

  • lorraine.pocklington  Percona Community Manager
    Hello Victor, thanks for your question ... are you still struggling with this one?

    Could you let me have a few pieces of info and then I will see if I can get someone to take a look.
    • What OS are you running on (version etc).
    • Are there any error logs being written? (apart from the errors to the innobackup.backup.log)
    • What sub-versions of Percona XtraDB Cluster 5.6 and 5.7 were you upgrading from and to, please?
    Although these are different versions, I found a past post that seems to be of some relevance... don't act on it, though, unless you are sure that it matches your scenario. Perhaps just check it out first and report back? https://www.percona.com/forums/quest...abackup-2-2-10

    Also, please be aware that on the open source forum we're only really able to help with 'generic' type issues. If something's really peculiar to your install and environment, then it might need our professional engagement to resolve the issue.

    But let's see where we get to with the generalist review first. THANKS!

    PS Apologies if I ask for info that's obvious from what you've sent, I'm kind of "mildly" tech, so might have overlooked something :)
  • victor
    • We are running CentOS 7.4
    • There are no errors in any logs that I can see. The errors in innobackup.backup.log are only there because we killed the SST process before it finished.
    • I'm not entirely sure what version of 5.6 we were on before (how can we check?). We upgraded to 5.7.23-23.
  • lorraine.pocklington  Percona Community Manager
    Hello victor, the team have taken a look and have a couple of things for you to feed back on, if you could, please.

    The first of these is that the 'broken pipe' message would suggest a network issue. That wouldn't explain why it was working on 5.6 and not on 5.7, but it's a first line of enquiry from here. It might also be, just as you say, because you interrupted the process.

    The next line of enquiry is that Percona XtraBackup may have died on the receiver side. If you could maybe batch up all logs including the XtraBackup logs from the receiver node, that might shine a light on the issue.

    Could you check these things out please?
  • victor
    The 'Broken pipe' error is from when we manually killed the SST process.

    XtraBackup is fine on the receiver side as well I believe.

    The SST process works. After about 1.5 hours our third database node did come online -- it's just that the donor node could not do anything while it was acting as the donor.
  • przemek  Percona Support Engineer
    Hello Victor,

    Did you capture more info from the donor while it was stuck during SST? In particular, I am interested in the output of SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS\G.
    Can you upload the error logs from both the donor and the joiner covering the SST time?
  • victor
    Przemysław,

    Thank you for your response, here is the information you requested:

    Donor



    The majority of the processes are like this:
    Id: 23
             User: system user
             Host: 
               db: NULL
          Command: Sleep
             Time: 456
            State: wsrep: committing write set (1979129617)
             Info: NULL
        Rows_sent: 0
    Rows_examined: 0
    

    One of the more interesting ones is this:
    Id: 118
             User: system user
             Host: 
               db: database_main
          Command: Sleep
             Time: 457
            State: Waiting for backup lock
             Info: CREATE TABLE IF NOT EXISTS tmp_table_1541710320 (
            item_tid INT UNSIGNED NOT NULL DEFAULT '0',
            num INT UNSIGNED NOT NULL DEFAULT '0',
            PRIMARY KEY (item_id)
        ) ENGINE = InnoDB
        Rows_sent: 0
    Rows_examined: 0
    

    I've attached the full process list in the zip file as .
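
    (In case it is useful to anyone reading later: a quick way to summarise a processlist in this state is to group the threads by State; the State strings are the same ones shown above. A sketch:)

    -- Run on the donor: counts how many threads are stuck in each state,
    -- e.g. 'wsrep: committing write set' or 'Waiting for backup lock'.
    SELECT state, COUNT(*) AS threads
      FROM information_schema.processlist
     GROUP BY state
     ORDER BY threads DESC;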



    Attached in zip file as .



    Attached in zip file as .

    Joiner



    Attached in zip file as .
  • victor
    This turned out to be a bug in PXC 5.7. The issue is being tracked here: https://jira.percona.com/browse/PXC-2365
  • lorraine.pocklington  Percona Community Manager
    Thanks for that update Victor, appreciated.
  • tucj7
    Have you had any success in solving this? We are experiencing exactly the same thing when creating a temp table during SST. It's impossible to get the nodes back up. Will have to change the application to prevent temp table creation while this is going on. The strange thing is we have not upgraded any of the packages, but this is happening now for the first time.
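
    If it helps, one way an application could defer DDL while the node is donating is to check the node state first; a minimal sketch using the standard wsrep status variable:

    -- If this returns 'Donor/Desynced', the node is currently serving an SST and
    -- DDL such as CREATE TABLE may end up waiting for the backup lock.
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';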
