Performance of the xtrabackup process

Hi again community.
I’m getting disk I/O overload alerts from Zabbix because our xtrabackup process is taking more and more time, and we want to know if our settings are OK or if there is some tuning we can do to avoid this issue.

Here is what we have:
1 Percona master server - 4 CPU, 6 GB RAM, disk on RAID 5 SSD over SAN
1 Percona slave - 8 CPU, 10 GB RAM, disk on RAID 5 SSD over SAN

Our slave RAM right now is:
              total        used        free      shared  buff/cache   available
Mem:           9866        5481        2604         392        1780        3831
Swap:          3967         109        3858
- Our application runs only against the Percona master server.
- The slave is, of course, for replication and creating backups.
The Percona slave configuration is:
-------
# Percona Custom server configuration
#
# Ansible managed

[client]
socket=/mysql/mysql.sock

[mysqld]
innodb_undo_directory=/mysql/innodb/ibundologs
innodb_log_files_in_group=5
innodb_log_group_home_dir=/mysql/innodb/ibredologs
collation-server=utf8mb4_unicode_520_ci
innodb_write_io_threads=8
innodb_fast_shutdown=0
character_set_server=utf8mb4
max_connections=1500
innodb_buffer_pool_size=3G
innodb_flush_method=O_DIRECT
lower_case_table_names=1
max_allowed_packet=1G
log_error_verbosity=3
slow_query_log_file=/mysql/log/slow_query_log.txt
innodb_data_home_dir=/mysql/innodb/ibdata
slow-query-log=0
explicit_defaults_for_timestamp=OFF
innodb_log_file_size=250M
log_timestamps=SYSTEM
innodb_autoextend_increment=50
sql_mode=ANSI,NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES
general-log=0
wait_timeout=300
innodb_flush_log_at_trx_commit=1
innodb_data_file_path=data1:100M:autoextend
innodb_buffer_pool_instances=8
general-log-file=/mysql/log/general_query_log.txt
innodb_read_io_threads=8
innodb_file_per_table=ON

# Percona 5x specific config
innodb_temp_data_file_path=…/ibgtemp/gtemp:50M:autoextend:max:500M

[mysqldump]
max_allowed_packet=1G

[mysql]
max_allowed_packet=1G
default-character-set=utf8mb4
no-auto-rehash
socket=/mysql/mysql.sock
-----
Our backup script runs with these values: about 80% of the CPUs for the parallel, compression, and encryption processes, i.e. 7 CPUs.

Logs from the script using xtrabackup:

----

xtrabackup: recognized server arguments: --log_bin=/mysql/binlogs/binlog_node2 --server-id=2 --innodb_undo_directory=/mysql/innodb/ibundologs --innodb_log_files_in_group=5 --innodb_log_group_home_dir=/mysql/innodb/ibredologs --innodb_write_io_threads=8 --innodb_buffer_pool_size=3G --innodb_flush_method=O_DIRECT --innodb_data_home_dir=/mysql/innodb/ibdata --innodb_log_file_size=250M --innodb_autoextend_increment=50 --innodb_flush_log_at_trx_commit=1 --innodb_data_file_path=data1:100M:autoextend --innodb_read_io_threads=8 --innodb_file_per_table=1 --datadir=/mysql/data --parallel=4

xtrabackup: recognized client arguments: --socket=/mysql/mysql.sock --backup=1 --user=xbuser --password=* --stream=xbstream --lock-ddl=1 --slave-info=1 --compress --compress-threads=4 --encrypt=AES256 --encrypt-key=* --encrypt-threads=4

----

Step 1 - Calculate the LSN to start the incremental from. We back up every 30 minutes: one full backup at midnight, and every 30 minutes an incremental backup so we can recover more recent data.

This action creates disk reads of 40-60 MB/s, with 90-99% I/O wait, for 3 minutes.
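For context, a common way to get that starting LSN is from the xtrabackup_checkpoints file of the previous backup; a minimal sketch, assuming that file is kept on disk (the path is illustrative):

----
# Read the to_lsn of the previous backup; the next incremental can then be
# started from that point by passing --incremental-lsn=$LAST_LSN to xtrabackup.
LAST_LSN=$(awk '/^to_lsn/ {print $3}' /backups/last/xtrabackup_checkpoints)
echo "next incremental starts from LSN $LAST_LSN"
----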


Step 2 - Run xtrabackup with xbstream on the Percona slave, encrypting and compressing, and send the stream to the repository server over SSH.
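Put together, the step 2 pipeline looks roughly like the sketch below, reconstructed from the recognized arguments shown above; the repository host, destination path, password and encryption key are placeholders:

----
# Backup, compress and encrypt on the slave, stream over SSH to the repository
xtrabackup --backup --socket=/mysql/mysql.sock --user=xbuser --password='***' \
  --parallel=4 --compress --compress-threads=4 \
  --encrypt=AES256 --encrypt-key='***' --encrypt-threads=4 \
  --lock-ddl --slave-info --stream=xbstream \
  | ssh backup@repository "cat > /backups/node2_$(date +%F_%H%M).xbstream"
----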

Doing these steps, this is what we see with iotop:

7 processes reading from 2 to 20 MB/s, with disk reads sometimes dropping to 1.2-2 MB/s and sometimes reaching 80 MB/s; we think this is the reading for encryption and compression.

So, with 7 CPU threads reading ~1.5 MB/s each, we see roughly 103-190 MB/s of total disk reads, with I/O wait from 5 to 50%, sometimes reaching 15-70%.
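Besides iotop, the device-level totals can be cross-checked with iostat from sysstat, for example:

----
# Extended device statistics in MB, refreshed every 5 seconds;
# rMB/s and wMB/s give aggregate throughput, %util shows device saturation.
iostat -xm 5
----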

Step 3 - Transmission

xbstream sends at 500-600 KB/s with 6-13% I/O wait (a low of 2%, and sometimes a high of 38%); across the 7 CPUs that adds up to 42-91% I/O wait (depending on the database at that moment).

On the repository server we receive the xbstream output (the encrypted and compressed xtrabackup files) at 500-600 KB/s from the Percona slave, with 10-23% I/O wait; sometimes it reaches 900 KB/s, 1.5 MB/s, or 2 MB/s.

On that MySQL server we have 37 databases, 77 GB in total.

The repository server is on the same RAID SSD volume on the SAN for speed, but we are still getting this disk overload.

I know it is possible to have multiple IOPS bottlenecks (disk, network, CPU, VM, VMware, etc.). I'm trying to figure out which it could be. If you need more info, feel free to ask.

Is there some other Percona tool to analyze the xtrabackup process? I have tried to share as many values as I could to help with fixing or tuning this better.


Hi, that can be many things; obviously the storage is the bottleneck given the iowait you have. At less than 1 MB/s compressed, that means a raw throughput of maybe 2 MB/s, which is small. Off the top of my head:
- a degraded RAID-5 array
- use of InnoDB transparent page compression with punch hole

But in your case, you perform incremental backups, and those are normally not fast. Percona Server can track modified pages, which helps in some cases, but if too many pages are modified, it may be faster to just scan (i.e. not use the feature). For details about the page tracking, see:
https://www.percona.com/doc/percona-server/5.7/management/changed_page_tracking.html
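If the tracking is worth trying, it is enabled in my.cnf on the server you back up from; a minimal sketch per the linked doc (a server restart is required):

----
[mysqld]
innodb_track_changed_pages = ON
----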
Instead of doing an incremental backup every 30 min, have you considered streaming the binlogs? This is “THE” way of getting true PITR with MySQL. Essentially you use the mysqlbinlog tool to connect to a remote slave and stream the binlogs in real time. I wrote a script to manage that (there are many others…):
https://github.com/y-trudeau/Mysql-tools/blob/master/backup/binlog-streamer.sh

It should be called by cron regularly; it checks the mysqlbinlog process, restarts it if it died, and compresses binlog files that are complete.
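For reference, the core of the idea is a single mysqlbinlog command; a sketch where the host, user and starting binlog file are placeholders (the script adds the monitoring and compression around it):

----
# Stream binlogs from the slave in real time; --raw writes the files as-is,
# --stop-never keeps the connection open waiting for new events.
mysqlbinlog --read-from-remote-server --host=slave_host --port=3306 \
  --user=repl_user --password --raw --stop-never binlog_node2.000001
----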