I'm running a 3-node cluster, and one node has been stuck trying to recover for several days.
The issue seems to be that, after detecting a crash and starting recovery, the process times out before completion, at which point the container restarts and the whole process starts over again:
2022-01-10T06:02:42.486736Z 0 [Note] mysqld (mysqld 5.7.26-29-log) starting as process 1 ...
2022-01-10T06:02:42.492015Z 0 [Note] InnoDB: PUNCH HOLE support available
2022-01-10T06:02:42.492035Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-01-10T06:02:42.492039Z 0 [Note] InnoDB: Uses event mutexes
2022-01-10T06:02:42.492042Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2022-01-10T06:02:42.492045Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2022-01-10T06:02:42.492047Z 0 [Note] InnoDB: Using Linux native AIO
2022-01-10T06:02:42.492437Z 0 [Note] InnoDB: Number of pools: 1
2022-01-10T06:02:42.492556Z 0 [Note] InnoDB: Using CPU crc32 instructions
2022-01-10T06:02:42.495415Z 0 [Note] InnoDB: Initializing buffer pool, total size = 36G, instances = 8, chunk size = 128M
2022-01-10T06:02:43.808145Z 0 [Note] InnoDB: Completed initialization of buffer pool
2022-01-10T06:02:44.263310Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-01-10T06:02:44.286881Z 0 [Note] InnoDB: Recovering partial pages from the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2022-01-10T06:02:44.303990Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2022-01-10T06:02:44.639242Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 31019052325599
2022-01-10T06:02:44.846025Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 31019057568256
2022-01-10T06:02:45.059919Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 31019062811136
2022-01-10T06:02:45.264177Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 31019068054016
...
2022-01-10T06:03:42.684408Z 0 [Note] InnoDB: Database was not shutdown normally!
2022-01-10T06:03:42.684425Z 0 [Note] InnoDB: Starting crash recovery.
2022-01-10T06:03:42.783023Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 31457280 bytes
2022-01-10T06:03:43.014656Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
stream closed
Sometimes the process doesn't get as far as 63%; e.g. it might only reach 59%, but it never gets further than about 65%.
Is it likely that Kubernetes is responsible for terminating the container, or is this something internal to Percona?
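
For reference, these are the checks I'd run to confirm whether Kubernetes is killing the container (a minimal sketch; the pod name and namespace are placeholders):

# Check the last container state for the termination reason and exit code
# (Reason: OOMKilled points at the kernel OOM killer; a liveness-probe kill
# instead shows up as "Liveness probe failed" / "Killing" events)
kubectl describe pod <pxc-pod-name> -n <namespace>

# Recent events for the pod, sorted by time
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pxc-pod-name> \
  --sort-by=.lastTimestamp

# Current liveness-probe settings on the container
kubectl get pod <pxc-pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].livenessProbe}'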