Hi
we upgraded to MySQL 8, and since then, every time we back up the slave server we see replication lag.
This did not happen when we were on version 5.7.
All tables are InnoDB.
The server is not busy at all, with only a small amount of IO.
thanks
Yuda
Hi @yehudaf1, welcome to the Percona Forum!
Could you please share the command you’re using?
#full backup
2022-12-17T04:00:02.238295-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --user=root --password=* --no-server-version-check=1 --backup=1 --target-dir=/db_backup/backup_percona/latest/full --kill-long-queries-timeout=60 --kill-long-query-type=select
#incremental backup
2022-12-17T21:10:06.946681-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --user=root --password=* --no-server-version-check=1 --backup=1 --target-dir=/db_backup/backup_percona/latest/inc_2022-Dec-17-21-10-06 --incremental-basedir=/db_backup/backup_percona/latest/full --kill-long-queries-timeout=60 --kill-long-query-type=select
Hi
Any idea why it may be happening?
During the backup, replication falls behind, and therefore our reporting application doesn't have valid data.
thanks
yehuda
Hi
I would like to bump this up; I have the same problem.
My command is:
xtrabackup --defaults-extra-file=/root/.my.cnf --backup --register-redo-log-consumer --parallel=5 --stream=xbstream
And here is a screenshot of the replication status:
Hi @julienarcin
I had a look at your command and noted the option --register-redo-log-consumer. This is not something I recall using, and then I read that it was “recently” introduced. The documentation says:
--register-redo-log-consumer
The --register-redo-log-consumer parameter is disabled by default. When enabled, this parameter lets Percona XtraBackup register as a redo log consumer at the start of the backup. The server does not remove a redo log that Percona XtraBackup (the consumer) has not yet copied. The consumer reads the redo log and manually advances the log sequence number (LSN). The server blocks the writes during the process. Based on the redo log consumption, the server determines when it can purge the log.
What stands out is “The server blocks the writes during the process.” Without going into too much detail, can you try the same backup without the --register-redo-log-consumer option?
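For example, something along these lines (illustrative only; keep your own credentials, and the output redirect is just a placeholder since --stream writes to stdout):

# same backup as before, just without --register-redo-log-consumer
xtrabackup --defaults-extra-file=/root/.my.cnf --backup --parallel=5 --stream=xbstream > /path/to/backup.xbstream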
Thanks,
K
Hi,
we have the same problem.
We are not using the option you referred to above.
xtrabackup --defaults-file=my.cnf --backup --socket=/tmp/mysqld.sock --user=mysqld --stream=xbstream --extra-lsndir=/backup/MYBACKUP/config --slave-info --target-dir=/backup/MYBACKUP/database --read-buffer-size=400M --no-server-version-check --skip-strict
@Marcelo_Altmann @kedarpercona Pls help.
thank you
Xtrabackup uses Lock Tables For Backup, which will block the SQL thread in case it is processing a DDL, or DML on non-transactional tables. I would suggest you monitor what the SQL thread is doing, or blocked on, during the backup.
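As a rough sketch of how that could be checked while the backup runs (plain MySQL statements, nothing Percona-specific; adjust connection options to your setup):

# what is the replication SQL (applier) thread doing right now?
mysql -e "SELECT NAME, PROCESSLIST_STATE, PROCESSLIST_INFO FROM performance_schema.threads WHERE NAME LIKE '%replica_sql%' OR NAME LIKE '%slave_sql%';"

# is any session stuck waiting on a metadata lock (for example, the backup lock)?
mysql -e "SELECT OBJECT_TYPE, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, OWNER_THREAD_ID FROM performance_schema.metadata_locks WHERE LOCK_STATUS = 'PENDING';"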
@Marcelo_Altmann Thank you for responding!
I will monitor the SQL thread. Could you please confirm that the options I am using are good?
“xtrabackup --defaults-file=my.cnf --backup --socket=/tmp/mysqld.sock --user=mysqld --stream=xbstream --extra-lsndir=/backup/MYBACKUP/config --slave-info --target-dir=/backup/MYBACKUP/database --read-buffer-size=400M --no-server-version-check --skip-strict”
Also, is there any way to ensure a consistent backup while not blocking DDL/DML?
@Chanakya I would probably increase the number of parallel copy threads there, via --parallel=X.
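Something along these lines, keeping the rest of your options as they are (the stream destination is omitted here, as in your original command, and 4 is only a starting point to tune):

xtrabackup --defaults-file=my.cnf --backup --socket=/tmp/mysqld.sock --user=mysqld --stream=xbstream --extra-lsndir=/backup/MYBACKUP/config --slave-info --target-dir=/backup/MYBACKUP/database --read-buffer-size=400M --no-server-version-check --skip-strict --parallel=4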
Also, is there any way to ensure a consistent backup while not blocking DDL/DML?
Just to clarify, DML on InnoDB is NOT blocked. We have some research and work in progress to reduce the time the instance remains under LTFB/LIFB and to allow DDL. At the current release, the answer to your question is no; we need this lightweight lock to ensure consistent backups.
Acknowledged on the parallel threads. Does this number have to be the same for the xbstream threads?
Thank you for the confirmation. What are LTFB/LIFB?
For xbstream you will probably be capped at the network level, but this is something you will have to experiment with to find the sweet spot.
Thank you for the confirmation. What are LTFB/LIFB?
Those are the MDL locks taken to ensure a consistent backup.
Lock Tables For Backup (LTFB) - Percona Server Only. Lighter than LIFB
Lock Instance For Backup (LIFB) - MySQL
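If you want to see them in action, both can be taken and released manually from a client session (the first statement exists only in Percona Server, the second is stock MySQL 8.0; both need the BACKUP_ADMIN privilege on 8.0):

-- Percona Server: blocks DDL and writes to non-transactional tables; InnoDB DML keeps running
LOCK TABLES FOR BACKUP;
UNLOCK TABLES;

-- MySQL 8.0: instance-wide backup lock with similar semantics
LOCK INSTANCE FOR BACKUP;
UNLOCK INSTANCE;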
Thanks much for the answers Marcelo
Do we have any further update on this issue? We are still facing it in all 8.x versions.
I wanted to weigh in here. I’ve found a few reports of this bug, and they all seem to dead-end at the same place. We’ve been chasing this for about six months within our enterprise. It came up on our largest database server. As far as we could tell, other servers weren’t being impacted, but the large one would have replication set back for hours every night when backups ran. For the longest time I rejected the notion that backups were causing it, because they never had before Percona 8.x.
Anyway, eventually I started randomizing the time our backups ran to correlate the replication delay, and was able to prove that whenever our idle standby ran its backup… that was when replication fell behind. This was counterintuitive to me because that particular host isn’t doing anything EXCEPT replication and backups… so I was surprised it was suffering when the primary database host didn’t seem to have any issues at all.
I suspect maybe this is because we run row-based replication, so the load patterns on the standby are different from the load patterns on the master, but I can’t prove that and would be curious to know whether others experiencing this problem are also using row-based replication.
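(For anyone comparing notes, the format is quick to check on the source; ROW means the replica applies row images instead of re-executing the statements:)

mysql -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format';"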
In any event, we’ve found a workaround, though I’m not sure if it’s ideal. Still, I wanted to share it with others who may be in this same situation.
By adding the flags "--compress --compress-threads=2 --parallel=4" we no longer experience replication delay while the backup is running (I tried higher thread counts, but they didn’t seem to matter in a substantial way; 2/4 seems pretty comfortable for any server and gets you 90% of the performance gains, at least for our workloads. YMMV). I can’t explain why doing inline compression prevents replication from falling behind, but for our purposes this is a sufficient workaround for now. Hopefully this helps others who feel stuck in this situation.
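For reference, the invocation now looks roughly like this (the target directory is a placeholder and connection options are omitted; the compression and parallel flags are the ones that mattered for us):

xtrabackup --backup --compress --compress-threads=2 --parallel=4 --target-dir=/db_backup/latest/full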