Description:
Hello, I am using Percona MySQL Operator 0.9.0 with 2 MySQL instances and async replication. I noticed that the version upgrade sometimes fails. I am testing an upgrade from 8.0.37-29 to 8.0.39-30. Sometimes it works as expected, but sometimes Orchestrator (correct me if it is not the component responsible for this) fails to promote the replica to the new primary, and as a result I end up with 2 read-only instances. In the Orchestrator UI I see them as two co-masters with constantly growing replication lag.
The cause seems to be replication lag that occurs during the upgrade procedure (it was 0 right before the upgrade). However, the successful recovery shows an even larger "promoted replica lag seconds" value and still reports "promoted replica sql thread up-to-date".
Logs:
Here are the operator logs and the recovery-step logs for one successful upgrade and one failed upgrade. The source and target versions are the same in both cases.
operator.txt (6.2 KB)
failed-recovery.txt (1.9 KB)
successful-recovery.txt (4.7 KB)