Xtrabackup hangs unexpectedly

I’m using Vitess in Kubernetes, and I’m trying to take a backup of one of my databases.

Vitess abstracts the backup process, but it essentially runs the following in a running container:

xtrabackup --defaults-file=/vt/vtdataroot/vt_2313663783/my.cnf --backup --socket=/vt/socket/mysql.sock --slave-info --user=vt_dba --target-dir=/vt/vtdataroot/vt_2313663783/tmp --extra-lsndir=/vt/vtdataroot/vt_2313663783/tmp --stream=xbstream --parallel=1

After tailing the logs for a while, I notice it just hangs. The backup was running from 07:07 to 08:24.

default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.615916       1 xtrabackupengine.go:173] Executing Backup at 2025-06-10 07:07:23.557566743 +0000 UTC m=+678.953075417 for keyspace/shard tenant01_partial/- on tablet zone1-2313663783, concurrency: 4, compress: true, incrementalFromPos: 
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.616964       1 xtrabackupengine.go:219] Detected MySQL flavor: MySQL56
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.616997       1 xtrabackupengine.go:222] backup file name: backup.xbstream.gz
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.617009       1 xtrabackupengine.go:230] Starting backup with 8 stripe(s)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620836       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-000
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620872       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-001
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620885       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-002
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620896       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-003
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620907       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-004
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620923       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-005
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620932       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-006
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.620941       1 xtrabackupengine.go:815] Opening backup stripe file backup.xbstream.gz-007
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.625805       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.626101       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.626193       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.626301       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.626463       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.627464       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.627602       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.627811       1 compression.go:259] Compressing backup using engine "pargzip"
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.642845       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:23.642709-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized server arguments: --datadir=/vt/vtdataroot/vt_2313663783/data --innodb_data_home_dir=/vt/vtdataroot/vt_2313663783/innodb/data --innodb_log_group_home_dir=/vt/vtdataroot/vt_2313663783/innodb/logs --log_bin=/vt/vtdataroot/vt_2313663783/bin-logs/vt-2313663783-bin --server-id=1040751242 --tmpdir=/vt/vtdataroot/vt_2313663783/tmp --innodb_buffer_pool_size=12884901888 
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.642917       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:23.642845-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --backup=1 --socket=/vt/socket/mysql.sock --slave-info=1 --user=vt_dba --target-dir=/vt/vtdataroot/vt_2313663783/tmp --extra-lsndir=/vt/vtdataroot/vt_2313663783/tmp --stream=xbstream --parallel=1 
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.642951       1 xtrabackupengine.go:416] xtrabackup stderr: xtrabackup version 8.0.35-32 based on MySQL server 8.0.35 Linux (x86_64) (revision id: c339924a)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.715812       1 xtrabackupengine.go:416] xtrabackup stderr: 250610 07:07:23  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/vt/socket/mysql.sock' as 'vt_dba'  (using password: NO).
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.716798       1 xtrabackupengine.go:416] xtrabackup stderr: 250610 07:07:23  version_check Connected to MySQL server
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:23.717004       1 xtrabackupengine.go:416] xtrabackup stderr: 250610 07:07:23  version_check Executing a version check against the server...
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.751473       1 xtrabackupengine.go:416] xtrabackup stderr: 
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.751542       1 xtrabackupengine.go:416] xtrabackup stderr: # A software update is available:
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.751759       1 xtrabackupengine.go:416] xtrabackup stderr: 250610 07:07:24  version_check Done.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.758526       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.758458-00:00 0 [Note] [MY-011825] [Xtrabackup] Connecting to MySQL server host: localhost, user: vt_dba, password: not set, port: not set, socket: /vt/socket/mysql.sock
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.763124       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.763025-00:00 0 [Note] [MY-011825] [Xtrabackup] Using server version 8.0.30
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.767895       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.767810-00:00 0 [Note] [MY-011825] [Xtrabackup] Executing LOCK INSTANCE FOR BACKUP ...
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.774904       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.774774-00:00 0 [Note] [MY-011825] [Xtrabackup] uses posix_fadvise().
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.774985       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.774813-00:00 0 [Note] [MY-011825] [Xtrabackup] cd to /vt/vtdataroot/vt_2313663783/data
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:24.774999       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:24.774825-00:00 0 [Note] [MY-011825] [Xtrabackup] open files limit requested 0, set to 1048576
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325106       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325030-00:00 0 [Note] [MY-011825] [Xtrabackup] using the following InnoDB configuration:
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325168       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325061-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_data_home_dir = /vt/vtdataroot/vt_2313663783/innodb/data
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325179       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325066-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_data_file_path = ibdata1:12M:autoextend
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325187       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325091-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_group_home_dir = /vt/vtdataroot/vt_2313663783/innodb/logs
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325199       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325095-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_files_in_group = 2
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.325207       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.325103-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_file_size = 50331648
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.326967       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.326942-00:00 0 [Note] [MY-011825] [Xtrabackup] inititialize_service_handles suceeded
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.380232       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.380149-00:00 0 [Note] [MY-011825] [Xtrabackup] Connecting to MySQL server host: localhost, user: vt_dba, password: not set, port: not set, socket: /vt/socket/mysql.sock
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.383516       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.383467-00:00 0 [Note] [MY-011825] [Xtrabackup] Redo Log Archiving is not set up.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.484039       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.483976-00:00 0 [Note] [MY-011825] [Xtrabackup] Starting to parse redo log at lsn = 845849279797
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.485272       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.485208-00:00 0 [Note] [MY-012564] [InnoDB] Recovery parsing buffer extended to 4194304.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.486723       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.486648-00:00 0 [Note] [MY-012564] [InnoDB] Recovery parsing buffer extended to 8388608.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.555047       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.554991-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (845859832516)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.580800       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.580729-00:00 0 [Note] [MY-012953] [InnoDB] Disabling background ibuf IO read threads.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.781968       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.781883-00:00 0 [Note] [MY-011825] [Xtrabackup] Generating a list of tablespaces
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.782102       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.782017-00:00 0 [Note] [MY-012204] [InnoDB] Scanning './'
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.786904       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.786831-00:00 0 [Note] [MY-012204] [InnoDB] Scanning '/vt/vtdataroot/vt_2313663783/innodb/data/'
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.787144       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.787040-00:00 0 [Note] [MY-012208] [InnoDB] Completed space ID check of 2 files.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:25.791025       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:25.790933-00:00 0 [Warning] [MY-012091] [InnoDB] Allocated tablespace ID 1 for sys/sys_config, old maximum was 0
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:26.555493       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:26.555415-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (845859857740)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:27.555973       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:27.555892-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (845859881660)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:27.818977       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:27.818903-00:00 0 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:27.820374       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:27.820334-00:00 0 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:27.822360       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:27.822304-00:00 0 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:27.822710       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:27.822664-00:00 2 [Note] [MY-011825] [Xtrabackup] Streaming ./vt_tenant01_partial/client_vault_assets.ibd
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:28.556470       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:28.556386-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (845859908254)

It looks like it’s making progress for about an hour as I see new log entries, including entries like this:

default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 07:07:41.759728       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T07:07:41.759717-00:00 2 [Note] [MY-011825] [Xtrabackup] Done: Streaming ./vt_tenant01_partial/similar_products.ibd

But eventually, the log entries stop. The last log entries are:

default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:08.435026       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:08.434935-00:00 1 [Warning] [MY-011825] [Xtrabackup] Log block checksum mismatch (block no 0 at lsn 846358902784): expected 0, calculated checksum 3965168067 block epoch no: 0
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:08.435110       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:08.434962-00:00 1 [Warning] [MY-011825] [Xtrabackup] this is possible when the log block has not been fully written by the server, will retry later.
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:08.435187       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:08.435131-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846358902784)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:09.435527       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:09.435392-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846358904539)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:10.435710       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:10.435626-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846358905286)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:11.436343       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:11.436279-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846359465472)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:12.454434       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:12.454388-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846361495410)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:13.454788       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:13.454657-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846361496335)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:14.469174       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:14.469115-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846361515702)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:15.470594       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:15.470478-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846362255475)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:16.470934       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:16.470845-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846362272802)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:17.474612       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:17.474568-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846362289847)
default-vttablet-zone1-2313663783-c6a83947 vttablet I0610 08:24:18.479927       1 xtrabackupengine.go:416] xtrabackup stderr: 2025-06-10T08:24:18.479812-00:00 1 [Note] [MY-011825] [Xtrabackup] >> log scanned up to (846362336847)

It looks like the last thing it was backing up is a 55G table, so not one of our biggest. And I as far as I understand, because it’s based on binary logs, actual table performance shouldn’t matter (besides the fact that bigger table = generally longer to back up).

xtrabackup version 8.0.35-32 based on MySQL server 8.0.35 
/usr/sbin/mysqld  Ver 8.0.30 for Linux on x86_64 (MySQL Community Server - GPL)

If I ps aux in the container over a period of a few minutes, I can see the TIME and RSS haven’t changed, so it looks like the process hasn’t done anything at all.

I have re-tried a couple of time, and restarted the MySQL server, but it seems to hang after roughly an hour and change (I haven’t checked if it’s the same table). I realise there’s not much to do if a process is hung, but I’m wondering if there’s any settings I can tweak to ensure it stays alive?

I did notice that the IO queue is quite high during this time, but it always is when we do a backup or restore.

So I’ve just adjusted some of our metrics as it looks like we were calculating instance memory incorrectly (oops), so this might have been caused by memory exhaustion. Will report back if it succeeds this time as it has been running for a few hours this time :crossed_fingers:

You should also adjust --parallel 4 because your current setting of 1 disables all parallelism. You’ll get better performance, and faster backup times with parallel threads.

Thanks!

I actually just asked if anyone used that setting in the Vitess Slack, so it’s good to hear it directly from the horse’s mouth!