Description:
We run a read-only replica on k8s (OKD/Openshift) cluster with async replication (Multi-cluster and multi-region deployment - Percona Operator for MySQL or docs.percona.com/percona-operator-for-mysql/pxc/backups-move-from-external-db.html
)
All works well until i believe there is temporary network blip and replicating pod (the one with label percona.com/replicationPod=true
) can’t talk to its peers. The network partitioning changes wsrep_local_state_comment
from “Synced” to something else making wsrep unready. The async replication is not wsrep aware and tries to write into unready wsrep and gets errored out.
The only way out is to run start replica for channel XYZ
manually.
Is there a way we can prevent this from happenning? Or maybe operator enhancement like checking periodically if replication has not errored out.
Steps to Reproduce:
- Spin pxc cluster
- Setup async replication from external source
- Break comm between pod with label
percona.com/replicationPod=true
other pxc pods atleast on port 4567 effectively creating network partition (I am not sure how to do) - While that replication pod state is not
Synced
, write on external source so thats its copied to that pod via async repl. - See it erroring out and not recovering without manual intervention.
Version:
I have seen this in atleast v1.13.0 till latest (v1.16.1)
Logs:
Full logs - pxc operator log - Pastebin.com
prod-pxc-0 logs 02-08 17:44:25 {"log":"2025-02-08T17:44:23.885776Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.64276S), skipping check\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:32 {"log":"2025-02-08T17:44:25.742123Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection to peer 80860039-b723 with addr ssl://10.50.7.245:4567 timed out, no messages seen in PT3S, socket stats: rtt: 7468 rttvar: 10959 rto: 208000 lost: 0 last_data_recv: 182 cwnd: 10 last_queued_since: 1338267463 last_delivered_since: 4707667117 send_queue_length: 3 send_queue_bytes: 216 segment: 0 messages: 3 (gmcast.peer_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:32 {"log":"2025-02-08T17:44:26.400995Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection to peer c657a5f2-be89 with addr ssl://10.50.8.36:4567 timed out, no messages seen in PT3S, socket stats: rtt: 9113 rttvar: 12785 rto: 210000 lost: 0 last_data_recv: 1397 cwnd: 10 last_queued_since: 28177666 last_delivered_since: 4918820083 send_queue_length: 4 send_queue_bytes: 428 segment: 0 messages: 4 (gmcast.peer_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:34 {"log":"2025-02-08T17:44:28.038487Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.50.7.245:4567 ssl://10.50.8.36:4567 \n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:36 {"log":"2025-02-08T17:44:32.426952Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') reconnecting to 80860039-b723 (ssl://10.50.7.245:4567), attempt 0\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:41 {"log":"2025-02-08T17:44:35.788457Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') reconnecting to c657a5f2-be89 (ssl://10.50.8.36:4567), attempt 0\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:44:57 {"log":"2025-02-08T17:44:42.763698Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT17.2336S), skipping check\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:46:22 {"log":"2025-02-08T17:46:10.888927Z 0 [Note] [MY-000000] [Galera] Failed to establish connection: End of file\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:01 {"log":"2025-02-08T17:46:22.899995Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://10.50.7.245:4567 timed out, no messages seen in PT3S, socket stats: rtt: 59752 rttvar: 118571 rto: 535000 lost: 0 last_data_recv: 35256 cwnd: 10 last_queued_since: 6520995083366014 last_delivered_since: 6520995083366014 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:01 {"log":"2025-02-08T17:46:43.056845Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection to peer 00000000-0000 with addr ssl://10.50.8.36:4567 timed out, no messages seen in PT3S, socket stats: rtt: 140149 rttvar: 57147 rto: 415000 lost: 0 last_data_recv: 38143 cwnd: 10 last_queued_since: 6521006060605922 last_delivered_since: 6521006060605922 send_queue_length: 0 send_queue_bytes: 0 (gmcast.peer_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:01.748598Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3M20.8192S), skipping check\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:01.860390Z 0 [Note] [MY-000000] [Galera] Failed to establish connection: Operation aborted.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:01.973680Z 0 [Note] [MY-000000] [Galera] Failed to establish connection: Operation aborted.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.049924Z 0 [Note] [MY-000000] [Galera] Failed to establish connection: End of file\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.114521Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection established to c657a5f2-be89 ssl://10.50.8.36:4567\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.124313Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') connection established to 80860039-b723 ssl://10.50.7.245:4567\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.354293Z 0 [Note] [MY-000000] [Galera] declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.442080Z 0 [Note] [MY-000000] [Galera] declaring node with index 0 inactive (evs.inactive_timeout) \n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.442518Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.449341Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 inactive (evs.inactive_timeout) \n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:02 {"log":"2025-02-08T17:48:02.450147Z 0 [Note] [MY-000000] [Galera] evs::proto(824d48c1-9250, OPERATIONAL, view_id(REG,80860039-b723,23)) detected inactive node: 80860039-b723\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:02.451813Z 0 [Note] [MY-000000] [Galera] evs::proto(824d48c1-9250, OPERATIONAL, view_id(REG,80860039-b723,23)) detected inactive node: c657a5f2-be89\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.457564Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview (view_id(NON_PRIM,80860039-b723,23)\nmemb {\n\t824d48c1-9250,0\n\t}\njoined {\n\t}\nleft {\n\t}\npartitioned {\n\t80860039-b723,0\n\tc657a5f2-be89,0\n\t}\n)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.461645Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview (view_id(NON_PRIM,824d48c1-9250,24)\nmemb {\n\t824d48c1-9250,0\n\t}\njoined {\n\t}\nleft {\n\t}\npartitioned {\n\t80860039-b723,0\n\tc657a5f2-be89,0\n\t}\n)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.461841Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474156Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474264Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474282Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 4616706)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474656Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474747Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.474766Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.475453Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from 4616706 upto current CC event 4616706 upto:4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.475977Z 2 [Note] [MY-000000] [Galera] Drain monitors from 4616706 up to 4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.478204Z 2 [Note] [MY-000000] [Galera] ================================================\nView:\n id: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616706\n status: non-primary\n protocol_version: 4\n capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO\n final: no\n own_index: 0\n members(1):\n\t0: 824d48c1-e4dc-11ef-9250-d23a5995ec9e, prod-pxc-0\n=================================================\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.478546Z 2 [Note] [MY-000000] [Galera] Non-primary view\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.478660Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> connected\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.479344Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.479683Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.479959Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from 4616706 upto current CC event 4616706 upto:4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.479975Z 2 [Note] [MY-000000] [Galera] Drain monitors from 4616706 up to 4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.479997Z 2 [Note] [MY-000000] [Galera] ================================================\nView:\n id: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616706\n status: non-primary\n protocol_version: 4\n capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO\n final: no\n own_index: 0\n members(1):\n\t0: 824d48c1-e4dc-11ef-9250-d23a5995ec9e, prod-pxc-0\n=================================================\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.480008Z 2 [Note] [MY-000000] [Galera] Non-primary view\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.480018Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> connected\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:03 {"log":"2025-02-08T17:48:03.480027Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.482269Z 0 [Note] [MY-000000] [Galera] declaring 80860039-b723 at ssl://10.50.7.245:4567 stable\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.482436Z 0 [Note] [MY-000000] [Galera] declaring c657a5f2-be89 at ssl://10.50.8.36:4567 stable\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.485068Z 0 [Note] [MY-000000] [Galera] Node 80860039-b723 state primary\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.489522Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview (view_id(PRIM,80860039-b723,25)\nmemb {\n\t80860039-b723,0\n\t824d48c1-9250,0\n\tc657a5f2-be89,0\n\t}\njoined {\n\t}\nleft {\n\t}\npartitioned {\n\t}\n)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.490740Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.501133Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.501296Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.503605Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: d905e79e-e644-11ef-a6ec-d297982e38f5\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.506091Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: d905e79e-e644-11ef-a6ec-d297982e38f5 from 0 (prod-pxc-1)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.508607Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: d905e79e-e644-11ef-a6ec-d297982e38f5 from 1 (prod-pxc-0)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.511670Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: d905e79e-e644-11ef-a6ec-d297982e38f5 from 2 (prod-pxc-2)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.514757Z 0 [Note] [MY-000000] [Galera] Quorum results:\n\tversion = 6,\n\tcomponent = PRIMARY,\n\tconf_id = 22,\n\tmembers = 2/3 (primary/total),\n\tact_id = 4616707,\n\tlast_appl. = 4616692,\n\tprotocols = 4/11/4 (gcs/repl/appl),\n\tvote policy= 0,\n\tgroup UUID = 58d61cd4-d899-11ef-815b-cabbdc4fe69e\n","file":"/var/lib/mysql/mysqld-error.log"}
----------- errored async replication
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.515880Z 42 [ERROR] [MY-010584] [Repl] Replica SQL for channel 'prod_external': Worker 1 failed executing transaction '3d5f89c2-1c37-11ee-81d5-0050568ff6c2:155380565' at source log binlog.000044, end_log_pos 876499626; Error 'WSREP has not yet prepared node for application use' on query. Default database: 'db18'. Query: 'BEGIN', Error_code: MY-001047\n","file":"/var/lib/mysql/mysqld-error.log"}
----------- errored async replication end
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.516490Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.518959Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 4616708)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.519137Z 2 [Note] [MY-000000] [Galera] ####### processing CC 4616708, local, ordered\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.519224Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from 4616706 upto current CC event 4616708 upto:4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.519251Z 2 [Note] [MY-000000] [Galera] Drain monitors from 4616706 up to 4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.519271Z 2 [Note] [MY-000000] [Galera] ####### My UUID: 824d48c1-e4dc-11ef-9250-d23a5995ec9e\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.519291Z 2 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 11), state transfer needed: yes\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.522457Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.525350Z 2 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527007Z 2 [Note] [MY-000000] [Galera] State transfer required: \n\tGroup state: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616708\n\tLocal state: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616706\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527061Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> joiner\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527076Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527181Z 2 [Note] [MY-000000] [WSREP] WSREP: You have configured 'xtrabackup-v2' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527228Z 2 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527262Z 2 [Note] [MY-000000] [Galera] ####### IST uuid:58d61cd4-d899-11ef-815b-cabbdc4fe69e f: 4616707, l: 4616708, STRv: 3\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527867Z 2 [Note] [MY-000000] [Galera] IST receiver addr using ssl://10.50.35.96:4568\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.527966Z 2 [Note] [MY-000000] [Galera] IST receiver using ssl\n","file":"/var/lib/mysql/mysqld-error.log"}
----------- errored async replication cont.
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.525874Z 41 [ERROR] [MY-010586] [Repl] Error running query, replica SQL thread aborted. Fix the problem, and restart the replica SQL thread with \"START REPLICA\". We stopped at log 'binlog.000044' position 876499468\n","file":"/var/lib/mysql/mysqld-error.log"}
----------- errored async replication cont. end
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.531486Z 2 [Note] [MY-000000] [Galera] Prepared IST receiver for 4616707-4616708, listening at: ssl://10.50.35.96:4568\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.536798Z 0 [Note] [MY-000000] [Galera] Member 1.0 (prod-pxc-0) requested state transfer from 'prod-pxc-1,'. Selected 0.0 (prod-pxc-1)(SYNCED) as donor.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.536855Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 4616708)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:04 {"log":"2025-02-08T17:48:04.537135Z 2 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.537215Z 2 [Note] [MY-000000] [Galera] Receiving IST: 2 writesets, seqnos 4616707-4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.570377Z 0 [Note] [MY-000000] [Galera] 0.0 (prod-pxc-1): State transfer to 1.0 (prod-pxc-0) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.571148Z 0 [Note] [MY-000000] [Galera] ####### IST applying starts with 4616707\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.573050Z 0 [Note] [MY-000000] [Galera] ####### IST current seqno initialized to 4616693\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.575002Z 0 [Note] [MY-000000] [Galera] Receiving IST... 0.0% (0/16 events) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.575488Z 0 [Note] [MY-000000] [Galera] IST preload starting at 4616693\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.577034Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.577349Z 0 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:4616692, protocol version: 6\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.577407Z 0 [Note] [MY-000000] [Galera] Member 0.0 (prod-pxc-1) synced with group.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.578701Z 0 [Note] [MY-000000] [Galera] ####### Passing IST CC 4616707, must_apply: 1, preload: true\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.578783Z 0 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.578801Z 0 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 4616706 -> 4616707\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.578969Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.579328Z 0 [Note] [MY-000000] [Galera] Recording CC from ist: 4616707\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.579370Z 0 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from ist: 4616693\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.579384Z 0 [Note] [MY-000000] [Galera] Min available from gcache for CC from ist: 4512326\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.579558Z 0 [Note] [MY-000000] [Galera] ####### Passing IST CC 4616708, must_apply: 1, preload: true\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.582579Z 11 [Note] [MY-000000] [Galera] ================================================\nView:\n id: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616707\n status: primary\n protocol_version: 4\n capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO\n final: no\n own_index: -1\n members(2):\n\t0: 80860039-d899-11ef-b723-87e32c8c2f58, prod-pxc-1\n\t1: c657a5f2-d89a-11ef-be89-166795724667, prod-pxc-2\n=================================================\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.587345Z 11 [Note] [MY-000000] [WSREP] Server status change joiner -> joined\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.587829Z 11 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.589214Z 11 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617184Z 0 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617343Z 0 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 4616707 -> 4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617517Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617734Z 0 [Note] [MY-000000] [Galera] Recording CC from ist: 4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617947Z 0 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from ist: 4616693\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.617982Z 0 [Note] [MY-000000] [Galera] Min available from gcache for CC from ist: 4512326\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.618342Z 2 [Note] [MY-000000] [Galera] ================================================\nView:\n id: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616708\n status: primary\n protocol_version: 4\n capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO\n final: no\n own_index: 1\n members(3):\n\t0: 80860039-d899-11ef-b723-87e32c8c2f58, prod-pxc-1\n\t1: 824d48c1-e4dc-11ef-9250-d23a5995ec9e, prod-pxc-0\n\t2: c657a5f2-d89a-11ef-be89-166795724667, prod-pxc-2\n=================================================\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.618685Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.619444Z 0 [Note] [MY-000000] [Galera] Receiving IST... 100.0% (16/16 events) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.630696Z 2 [Note] [MY-000000] [Galera] Draining apply monitors after IST up to 4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.633461Z 2 [Note] [MY-000000] [Galera] IST received: 58d61cd4-d899-11ef-815b-cabbdc4fe69e:4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.636396Z 2 [Note] [MY-000000] [Galera] Recording CC from sst: 4616708\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.636474Z 2 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from sst: 4616693\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.636502Z 2 [Note] [MY-000000] [Galera] Min available from gcache for CC from sst: 4512326\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.640244Z 0 [Note] [MY-000000] [Galera] 1.0 (prod-pxc-0): State transfer from 0.0 (prod-pxc-1) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.640356Z 0 [Note] [MY-000000] [Galera] SST leaving flow control\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.640382Z 0 [Note] [MY-000000] [Galera] Shifting JOINER -> JOINED (TO: 4616708)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.640510Z 0 [Note] [MY-000000] [Galera] Processing event queue:... -nan% (0/0 events) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.645372Z 0 [Note] [MY-000000] [Galera] Member 1.0 (prod-pxc-0) synced with group.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.645462Z 0 [Note] [MY-000000] [Galera] Processing event queue:... 100.0% (1/1 events) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.645496Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 4616708)\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.645596Z 2 [Note] [MY-000000] [Galera] Server prod-pxc-0 synced with group\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.647388Z 2 [Note] [MY-000000] [WSREP] Server status change joined -> synced\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.647430Z 2 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:04.647448Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.\n","file":"/var/lib/mysql/mysqld-error.log"}
prod-pxc-0 logs 02-08 17:48:05 {"log":"2025-02-08T17:48:05.460095Z 0 [Note] [MY-000000] [Galera] (824d48c1-9250, 'ssl://0.0.0.0:4567') turning message relay requesting off\n","file":"/var/lib/mysql/mysqld-error.log"}
Expected Result:
Maybe operator periodically runs replication check and starts it if sql error code is MY-001047 or whatever appropriate.
Actual Result:
Replication is errored out with
Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3d5f89c2-1c37-11ee-81d5-0050568ff6c2:155380565' at source log binlog.000044, end_log_pos 876499626. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Additional Information:
Expected result may look like feature request but I am open to suggestion on other ways to fix.