Hi, some help please. Over the last couple of days, we’ve been having issues with our cluster. When a node goes down for some reason and rejoins via SST, any “CREATE TABLE” or “DROP TABLE” query that gets executed hangs with the status “Waiting for backup lock”. Both are executed on InnoDB tables.
The query stays locked until the node is completely back up, which can take up to an hour, and this renders the whole cluster unavailable during that time. Any ideas what is causing this? I’ve been running this cluster for 4+ years and this is the first time I’ve seen this. Thanks!
Server version: 5.7.26-29-57 Percona XtraDB Cluster (GPL), Release rel29, Revision 03540a3, WSREP version 31.37, wsrep_31.37
xtrabackup version 2.4.15 based on MySQL server 5.7.19 Linux (x86_64) (revision id: 544842a)
It looks like it could be related to this: https://jira.percona.com/browse/PXC-2365
Can someone confirm that this is the same issue? Is there a resolution to this? Or, at least, a workaround?
Hi @tucj7, yes, you are right. My understanding was that this issue only affected older versions, but as you have shared, that is not the case for you. Looking at the Jira ticket above, it appears to be a bug. Maybe someone can recommend an interim workaround, if there is one.
Thanks for the feedback, that makes sense. The trouble I have is that when this happens, it locks up the DB for the entire SST process, which currently runs around 1.5 hours. We have a bunch of jobs (and client-initiated jobs) that run regularly and create/drop temp tables, and it’s impossible to know when a node will go down and trigger an SST. It turns out this was likely happening due to a memory issue on one node. Do you have any recommendations on how to anticipate an SST and then push a change to crons/etc. to prevent DDL during this process? Or is there a better way to get around this? My constant fear is that this happens at critical times or after hours and causes considerable problems.
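One possible workaround, sketched here under assumptions and not as a fix for PXC-2365, is to have each cron job check that every node reports `wsrep_local_state_comment` = `Synced` before it runs any CREATE/DROP, and to skip or retry the job otherwise. A node that is donating or receiving an SST reports a different state (for example `Donor/Desynced` or `Joining`), and a node that has just crashed is unreachable, so both cases are treated as unsafe. The helper names and host names are hypothetical, and the sketch assumes the `mysql` client is on PATH with credentials in an option file such as `~/.my.cnf`. It only narrows the window, because a node can still fail right after the check passes.

```python
from __future__ import annotations

import subprocess
import sys

# Only a fully caught-up node reports this value in wsrep_local_state_comment.
SAFE_STATE = "Synced"


def node_state(host: str) -> str:
    """Return wsrep_local_state_comment for one node via the mysql CLI.

    Assumption: credentials come from an option file (e.g. ~/.my.cnf).
    """
    out = subprocess.run(
        ["mysql", "-h", host, "-N", "-B", "-e",
         "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    # Batch mode without column names prints "wsrep_local_state_comment\t<value>".
    parts = out.strip().split("\t")
    return parts[1] if len(parts) == 2 else ""


def cluster_states(hosts: list[str]) -> dict[str, str]:
    """Collect the state of every node; a failed query counts as 'unreachable'."""
    states: dict[str, str] = {}
    for host in hosts:
        try:
            states[host] = node_state(host)
        except (subprocess.SubprocessError, OSError):
            states[host] = "unreachable"
    return states


def ddl_is_safe(states: dict[str, str]) -> bool:
    """True only if at least one node was checked and every node is Synced."""
    return bool(states) and all(s == SAFE_STATE for s in states.values())


def guard(hosts: list[str]) -> bool:
    """Hypothetical pre-DDL check for a cron job: log and return False if unsafe."""
    states = cluster_states(hosts)
    if not ddl_is_safe(states):
        print(f"Skipping DDL, cluster not fully synced: {states}", file=sys.stderr)
        return False
    return True
```

A job would call something like `guard(["node1", "node2", "node3"])` (hypothetical host names) and skip its CREATE/DROP step when it returns False. Treating an unreachable node as unsafe is deliberate: in the scenario above, the node that just went down is the one about to request an SST.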