Description:
Backup seem to be randomly failing
Steps to Reproduce:
Install operator with S3 backup storage, example backup section from the Helm chart.
# https://docs.percona.com/percona-operator-for-mysql/pxc/backups-storage.html
backup:
enabled: true
# image:
# tag: 1.15.0-pxc8.0-backup-pxb8.0.35-debug
pitr:
# PITR backups don't work yet, incorrect behaviour, so disabled for now
enabled: false
storageName: binlogs
# time in seconds between uploads
timeBetweenUploads: 300
storages:
# the normal backups
sql:
type: s3
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/redacted-s3-backup-role
s3:
bucket: redacted/sql/voicecore/
credentialsSecret: redacted
region: redacted
# the pitr binlogs for quick restore/replay
binlogs:
type: s3
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/redacted-s3-backup-role
s3:
bucket: redacted/sql/voicecore-binlogs/
credentialsSecret: redacted
region: redacted
schedule:
- name: every-six-hours
schedule: "0 */6 * * *"
keep: 360
storageName: sql
Version:
Chart: pxc-operator-1.15.0
Chart: pxc-db-1.15.0
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.28.13-eks-a737599
WARNING: version difference between client (1.31) and server (1.28) exceeds the supported minor version skew of +/-1
Logs:
[If applicable, include any relevant log files or error messages]
Finishing logs
2024-10-09 12:01:43.769 INFO: PC protocol downgrade 1 -> 0
2024-10-09 12:01:43.769 INFO: Current view of cluster as seen by this node
view ((empty))
2024-10-09 12:01:43.769 INFO: gcomm: closed
2024-10-09 12:01:43.769 INFO: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2024-10-09 12:01:43.769 INFO: Flow-control interval: [9999999, 9999999]
2024-10-09 12:01:43.769 INFO: Received NON-PRIMARY.
2024-10-09 12:01:43.769 INFO: Shifting JOINER -> OPEN (TO: 2847)
2024-10-09 12:01:43.769 INFO: New SELF-LEAVE.
2024-10-09 12:01:43.769 INFO: Flow-control interval: [9999999, 9999999]
2024-10-09 12:01:43.769 INFO: Received SELF-LEAVE. Closing connection.
2024-10-09 12:01:43.769 INFO: Shifting OPEN -> CLOSED (TO: 2847)
2024-10-09 12:01:43.769 INFO: RECV thread exiting 0: Success
2024-10-09 12:01:43.769 INFO: SST script already terminated
2024-10-09 12:01:43.769 INFO: recv_thread() joined.
2024-10-09 12:01:43.769 INFO: Closing replication queue.
2024-10-09 12:01:43.769 INFO: Closing slave action queue.
2024-10-09 12:01:43.769 INFO: Exiting main loop
+ EXID_CODE=4
+ '[' -f /tmp/backup-is-completed ']'
+ log ERROR 'Backup was finished unsuccessfull'
Terminating processProcess completed with error: /usr/bin/run_backup.sh: 4 (Interrupted system call)2024-10-09 12:01:43 [ERROR] Backup was finished unsuccessfull
+ exit 4
Expected Result:
Every backup job is completed on first run.
Actual Result:
The outputs of the jobs
xb-cron-mysql-pxc-db-sql-20241076034-2jo18-bxjrb 0/1 Completed 0 14d
xb-cron-mysql-pxc-db-sql-20241080034-2jo18-7xnbv 0/1 Completed 0 13d
xb-cron-mysql-pxc-db-sql-202410812034-2jo18-gdx4b 0/1 Error 0 13d
xb-cron-mysql-pxc-db-sql-202410812034-2jo18-hcnpx 0/1 Error 0 13d
xb-cron-mysql-pxc-db-sql-202410812034-2jo18-k96zb 0/1 Error 0 13d
xb-cron-mysql-pxc-db-sql-202410812034-2jo18-w5q87 0/1 Error 0 13d
xb-cron-mysql-pxc-db-sql-202410812034-2jo18-zlnnr 0/1 Completed 0 13d
xb-cron-mysql-pxc-db-sql-202410818034-2jo18-c26p6 0/1 Completed 0 13d
xb-cron-mysql-pxc-db-sql-20241086034-2jo18-wkm9t 0/1 Completed 0 13d
xb-cron-mysql-pxc-db-sql-20241090034-2jo18-rjhrl 0/1 Completed 0 12d
xb-cron-mysql-pxc-db-sql-202410912034-2jo18-fkxzx 0/1 Error 0 12d
xb-cron-mysql-pxc-db-sql-202410912034-2jo18-rl8b7 0/1 Error 0 12d
xb-cron-mysql-pxc-db-sql-202410912034-2jo18-t29dn 0/1 Completed 0 12d
xb-cron-mysql-pxc-db-sql-202410918034-2jo18-m8csz 0/1 Completed 0 12d
Additional Information:
What additional information would be needed to figure out what is going on here.