PXC cluster, 3rd pod stuck in CrashLoopBackOff

I’m having an identical issue on two independent K8s clusters (let’s call them A and B) using the PXC operator. On both, the third pod (pod/database-pxc-db-pxc-2) is stuck in CrashLoopBackOff with the error “failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)”. At first I suspected a networking issue - either failing to resolve the hostnames or failing to connect to the IP/port - but I can exec into the container and connect to the other nodes just fine, in both clusters.

I deployed PXC using the helm chart.

helm upgrade --install --namespace db --create-namespace pxc-operator percona/pxc-operator
helm upgrade --install --namespace db db percona/pxc-db --values values.yaml

values.yaml is pretty vanilla:

pxc:
  resources:
    requests:
      cpu: 2000m
      memory: 4Gi
  persistence:
    size: 50Gi

Cluster status:

$ kubectl -n db get pxc
NAME              ENDPOINT                     STATUS         PXC   PROXYSQL   HAPROXY   AGE
database-pxc-db   database-pxc-db-haproxy.db   initializing                    3         7d20h

$ kubectl -n db get all
NAME                                READY   STATUS             RESTARTS           AGE
pod/database-pxc-db-haproxy-0       2/2     Running            0                  7d20h
pod/database-pxc-db-haproxy-1       2/2     Running            0                  7d20h
pod/database-pxc-db-haproxy-2       2/2     Running            0                  7d20h
pod/database-pxc-db-pxc-0           3/3     Running            0                  7d20h
pod/database-pxc-db-pxc-1           3/3     Running            0                  7d20h
pod/database-pxc-db-pxc-2           2/3     CrashLoopBackOff   1469 (2m53s ago)   5d23h
pod/pxc-operator-77bfdc95bb-7l47b   1/1     Running            1 (5d2h ago)       7d20h

NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                 AGE
service/database-pxc-db-haproxy            ClusterIP   10.43.113.242   <none>        3306/TCP,3309/TCP,33062/TCP,33060/TCP   7d20h
service/database-pxc-db-haproxy-replicas   ClusterIP   10.43.42.84     <none>        3306/TCP                                7d20h
service/database-pxc-db-pxc                ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP            7d20h
service/database-pxc-db-pxc-unready        ClusterIP   None            <none>        3306/TCP,33062/TCP,33060/TCP            7d20h
service/percona-xtradb-cluster-operator    ClusterIP   10.43.149.113   <none>        443/TCP                                 7d20h

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pxc-operator   1/1     1            1           7d20h

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/pxc-operator-77bfdc95bb   1         1         1       7d20h

NAME                                       READY   AGE
statefulset.apps/database-pxc-db-haproxy   3/3     7d20h
statefulset.apps/database-pxc-db-pxc       2/3     7d20h

The thing is, cluster A had been running for a couple of days. And initially the kubectl get pxc status must have been “ready” - otherwise the deployment of the cluster wouldn’t have finished, since part of the deployment code waits for that status and only then creates some databases and users.
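For context, the wait step is roughly the following (a sketch - the .status.state field comes from the pxc CRD; the timeout is arbitrary):

kubectl -n db wait pxc/database-pxc-db --for=jsonpath='{.status.state}'=ready --timeout=20m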

Today I bootstrapped a second K8s cluster B, and it immediately got stuck in this state: the first two pods Running, the third pod in CrashLoopBackOff. When I then checked cluster A, I saw that it had the same issue.

The logs of the third pod are nearly identical in both clusters:

{"log":"2023-09-20T12:51:23.174679Z 0 [Warning] [MY-011068] [Server] The syntax '--skip-host-cache' is deprecated and will be removed in a future release. Please use SET GLOBAL host_cache_size=0 instead.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.174702Z 0 [Warning] [MY-011068] [Server] The syntax 'wsrep_slave_threads' is deprecated and will be removed in a future release. Please use wsrep_applier_threads instead.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.175645Z 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-log-path: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.176142Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.32-24.2) starting as process 1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192100Z 0 [Warning] [MY-010068] [Server] CA certificate /etc/mysql/ssl-internal/ca.crt is self signed.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192151Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192161Z 0 [Note] [MY-000000] [WSREP] New joining cluster node configured to use specified SSL artifacts\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192190Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib64/galera4/libgalera_smm.so initial position: 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2:44\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192202Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib64/galera4/libgalera_smm.so'\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192788Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.14(779b689) by Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>) loaded successfully.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192822Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.192998Z 0 [Warning] [MY-000000] [Galera] SSL compression is not effective. The option socket.ssl_compression is deprecated and will be removed in future releases.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193021Z 0 [Warning] [MY-000000] [Galera] Parameter 'socket.ssl_compression' is deprecated and will be removed in future versions\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193388Z 0 [Note] [MY-000000] [Galera] Found saved state: 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2:44, safe_to_bootstrap: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193479Z 0 [Note] [MY-000000] [Galera] GCache DEBUG: opened preamble:\nVersion: 2\nUUID: 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2\nSeqno: 24 - 44\nOffset: 1992\nSynced: 1\nEncVersion: 1\nEncrypted: 0\nMasterKeyConst UUID: 6a6686f8-5187-11ee-9d6c-5a5addbae25a\nMasterKey UUID: 00000000-0000-0000-0000-000000000000\nMasterKey ID: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193492Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: version: 2, UUID: 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2, offset: 1992\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193561Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193605Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193614Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found gapless sequence 24-44\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193633Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan...  0.0% (   0/6088 bytes) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193660Z 0 [Note] [MY-000000] [Galera] GCache::RingBuffer unused buffers scan...100.0% (6088/6088 bytes) complete.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193667Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: found 0/21 locked buffers\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.193672Z 0 [Note] [MY-000000] [Galera] Recovering GCache ring buffer: free space: 134211640/134217728\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.197189Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: allocator.disk_pages_encryption = no; allocator.encryption_cache_page_size = 32K; allocator.encryption_cache_size = 16777216; base_dir = /var/lib/mysql/; base_host = 10.244.2.15; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.encryption = no; gcache.encryption_cache_page_size = 32K; gcache.encryption_cache_size = 16777216; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 10; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; socket.ssl = YES; socket.ssl_ca = /etc/mysql/ssl-internal/ca.crt; socket.ssl_cert = /etc/mysql/ssl-internal/tls.crt; socket.ssl_cipher = ; socket.ssl_compression = YES; socket.ssl_key = /etc/mysql/ssl-internal/tls.key; socket.ssl_reload = 1; \n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205207Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205326Z 0 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2:44, protocol version: -1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205427Z 0 [Note] [MY-000000] [WSREP] Starting replication\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205443Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205453Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 16c5c2b8-5187-11ee-a24e-cfd1d23b37a2:44\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205511Z 0 [Note] [MY-000000] [Galera] protonet asio version 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205966Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.205997Z 0 [Note] [MY-000000] [Galera] backend: asio\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.206071Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0 \n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.206182Z 0 [Note] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.206194Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.206308Z 0 [Note] [MY-000000] [Galera] GMCast version 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.208412Z 0 [Note] [MY-000000] [Galera] (67310e9b-bce3, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.208442Z 0 [Note] [MY-000000] [Galera] (67310e9b-bce3, 'ssl://0.0.0.0:4567') multicast: , ttl: 1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.208717Z 0 [Note] [MY-000000] [Galera] EVS version 1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:23.208814Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'database-pxc-db-pxc', peer 'database-pxc-db-pxc-0.database-pxc-db-pxc:,database-pxc-db-pxc-1.database-pxc-db-pxc:'\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.210696Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.210789Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.210804Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.210822Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.210845Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview (view_id(NON_PRIM,67310e9b-bce3,1)\nmemb {\n\t67310e9b-bce3,0\n\t}\njoined {\n\t}\nleft {\n\t}\npartitioned {\n\t}\n)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:26.711233Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.5025S), skipping check\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:56.221805Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:56.221878Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node\nview ((empty))\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:56.221962Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)\n\t at gcomm/src/pc.cpp:connect():161\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:56.221981Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222226Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222316Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222496Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1811: Failed to open channel 'database-pxc-db-pxc' at 'gcomm://database-pxc-db-pxc-0.database-pxc-db-pxc,database-pxc-db-pxc-1.database-pxc-db-pxc': -110 (Connection timed out)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222519Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222531Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://database-pxc-db-pxc-0.database-pxc-db-pxc,database-pxc-db-pxc-1.database-pxc-db-pxc) failed to establish connection with cluster (reason: 7)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222542Z 0 [ERROR] [MY-010119] [Server] Aborting\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.222876Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.32-24.2)  Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.223536Z 0 [ERROR] [MY-010065] [Server] Failed to shutdown components infrastructure.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.223699Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.223724Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.225371Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.226881Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228272Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228295Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228303Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228309Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228320Z 0 [Note] [MY-000000] [Galera] avg deps dist 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228328Z 0 [Note] [MY-000000] [Galera] avg cert interval 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228334Z 0 [Note] [MY-000000] [Galera] cert index size 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228445Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228539Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228557Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.228691Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> DESTROYED (TO: 0)\n","file":"/var/lib/mysql/mysqld-error.log"}
{"log":"2023-09-20T12:51:57.231614Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk...\n","file":"/var/lib/mysql/mysqld-error.log"}

Now, if I time it right - just after the pod has been scheduled and before it crashes again - I can exec into it, manually resolve the host it tries to connect to, and also open a connection to it.

$ kubectl -n db exec -it --container pxc pod/database-pxc-db-pxc-2 -- sh
sh-4.4$ getent hosts database-pxc-db-pxc-0.database-pxc-db-pxc
10.244.10.11    database-pxc-db-pxc-0.database-pxc-db-pxc.db.svc.cluster.local
sh-4.4$ cat < /dev/tcp/database-pxc-db-pxc-0.database-pxc-db-pxc/3306
O
8.0.32-24.29H\2+iCIv�����9LZhnJL>caching_sha2_password

^C
sh-4.4$

As you can see, DNS and the network seem to work, so I’m at a loss as to the “Connection timed out” message. Are there any other ports I should test?
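From the logs, the node listens for group communication on 4567, and as far as I know Galera also uses 4568 (IST) and 4444 (SST) - though those two are typically only bound during an actual transfer. Probing them on a healthy peer the same way would be something like:

sh-4.4$ for port in 4567 4568 4444; do (echo > /dev/tcp/database-pxc-db-pxc-0.database-pxc-db-pxc/$port) && echo "$port open" || echo "$port closed"; done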

I’ve read a lot about how to recover a failed PXC cluster manually, but since I’m just using the operator with a pretty vanilla config and the networking looks fine, I figured I shouldn’t mess with anything by hand.

Sorry this has been such a long post. I’d appreciate any ideas and input!

I would suggest that if you are familiar with this, go ahead and attempt it. Perhaps doing so will shine some light on where the real issue might be.
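For example, the least invasive attempt would be to delete the stuck pod so the statefulset recreates it, and if it still fails to join, delete its PVC as well so the node discards its state and rejoins via SST. A sketch (untested against your setup; the PVC name assumes the operator’s default datadir volume claim template):

kubectl -n db delete pod/database-pxc-db-pxc-2
kubectl -n db delete pvc/datadir-database-pxc-db-pxc-2 pod/database-pxc-db-pxc-2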