Description:
I am running a Percona Server for MySQL cluster (NOT XtraDB) with operator version 0.8.0 and mysql-server 8.0.35. After updating to operator 0.9.0 everything still runs smoothly. But when I try to upgrade the server to any version higher than 8.0.35, the newly created pod fails to join the cluster, logging messages like this:
2025-05-28T06:46:30.775676Z 48 [ERROR] [MY-010557] [Repl] Error reading packet from server for channel 'group_replication_recovery': Cannot replicate because the source purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new replica from backup. Consider increasing the source's binary log expiration period. The GTID sets and the missing purged transactions are too long to print in this message. For more information, please see the source's error log or the manual for GTID_SUBTRACT (server_errno=1236)
2025-05-28T06:46:30.775685Z 48 [ERROR] [MY-013114] [Repl] Replica I/O for channel 'group_replication_recovery': Got fatal error 1236 from source when reading data from binary log: 'Cannot replicate because the source purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new replica from backup. Consider increasing the source's binary log expiration period. The GTID sets and the missing purged transactions are too long to print in this message. For more information, please see the source's error log or the manual for GTID_SUBTRACT', Error_code: MY-013114
2025-05-28T06:46:30.775690Z 48 [Note] [MY-010570] [Repl] Replica I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2025-05-28T06:46:30.775708Z 41 [Note] [MY-011585] [Repl] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2025-05-28T06:46:30.775737Z 49 [Note] [MY-010596] [Repl] Error reading relay log event for channel 'group_replication_recovery': replica SQL thread was killed
2025-05-28T06:46:30.775818Z 49 [Note] [MY-010587] [Repl] Replica SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2025-05-28T06:46:30.776351Z 41 [Note] [MY-011577] [Repl] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
I tried different approaches (staying on operator 0.8.0 and only changing the image to 8.0.41, fixing the replication user before upgrading (as this seems to have changed in 0.9.0), …), and nothing worked EXCEPT one: taking a backup of the 8.0.35 cluster, purging the whole cluster from Kubernetes, installing a fresh setup directly with 0.9.0/8.0.41, and then restoring the backup into that cluster - that worked.
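For reference, the workaround looked roughly like this; a minimal sketch assuming the PerconaServerMySQLBackup/PerconaServerMySQLRestore custom resources of the ps operator, with placeholder names for the cluster, the backup, and the storage:

```yaml
# 1. Take a backup of the old 0.8.0/8.0.35 cluster (names are placeholders).
apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQLBackup
metadata:
  name: pre-upgrade-backup
spec:
  clusterName: my-cluster          # existing 8.0.35 cluster
  storageName: backup-storage      # storage defined in the cluster CR
---
# 2. After deleting the old cluster and installing a fresh 0.9.0/8.0.41 setup,
#    restore that backup into the new cluster.
apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQLRestore
metadata:
  name: restore-into-new-cluster
spec:
  clusterName: my-cluster          # freshly installed 8.0.41 cluster
  backupName: pre-upgrade-backup   # backup taken in step 1
```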
For smaller instances this is not a real problem - for bigger databases it is a show-stopper.
Steps to Reproduce:
- Install the ps operator in version 0.8.0 with the mysql-server image 8.0.35 on Kubernetes
- Put in some data
- Change the manifest so the operator is 0.9.0 and wait for it to be rolled out
- Change the manifest so the mysql image is 8.0.41 (see the manifest sketch after this list)
- Two pods stay in a working state; the one pod being upgraded never joins the cluster
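To be explicit about what "change the manifest" means: this is a minimal sketch of the relevant cr.yaml fields, assuming the usual ps operator custom resource layout; the cluster name and exact image tag are placeholders from my setup, not something prescribed by the operator.

```yaml
# Sketch of the relevant PerconaServerMySQL custom resource fields only;
# all other fields stay as they were for the 0.8.0/8.0.35 installation.
apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQL
metadata:
  name: my-cluster                          # placeholder cluster name
spec:
  crVersion: 0.9.0                          # step 3: bump the operator/CR version first
  mysql:
    image: percona/percona-server:8.0.41    # step 4: then bump the mysql-server image (tag is a placeholder)
```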
Version:
operator 0.8.0 → 0.9.0
mysql 8.0.35 → 8.0.36+
Logs:
see above
Expected Result:
All cluster pods are upgraded to the desired mysql version without data loss.
Actual Result:
The cluster is down to its minimum number of pods and no longer failure resilient, and the mysql-server version is not upgraded.
THANKS for any help!