Unusual (?) amount of deadlocks after moving to clustering

To preface this, I am aware of the optimistic locking scheme used between nodes in the cluster, and that what is being reported back as a deadlock is not truly a deadlock, but a notification of a conflict on row updates between nodes (I think I have stated that correctly). Having said that, I am seeing deadlocks reported as errors in our app more often than I would expect, and I'm seeing them reported on SELECT statements, which doesn't make sense to me. The SELECT and the ActiveRecord error message are below. The tables in the query are very rarely updated in the app, so I doubt we'd see multiple updaters on either table. The only thing I can think of is that the statement is part of a transaction, and other SQL in the transaction could cause a multi-node update conflict. But if that were the case, why flag the SELECT as deadlocked?

Deadlocks on SELECTs are perfectly possible in Galera – this is the result of a Brute Force Abort. That is, any transaction being applied via replication on the node your SELECT is on has the ability to roll back ANY and ALL transactions currently open on that node if they have any locking conflicts with the transaction being applied. In Galera, the first commit always wins, and any other conflicting transactions are blown away.

The example I give in this blog post ([url]http://www.mysqlperformanceblog.com/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/[/url]), where I get the deadlock on COMMIT, would also happen if I ran a SELECT there instead.
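
Something like this is all it takes (the table and values here are made up just for illustration, but it has the same shape as the example in that post):

[code]
-- On node2: open a transaction that touches a row
node2> BEGIN;
node2> UPDATE t SET col = 'b' WHERE id = 1;

-- On node1: update the same row and commit first; the writeset
-- replicates to node2 and brute-force aborts the open transaction there
node1> UPDATE t SET col = 'a' WHERE id = 1;

-- Back on node2, the next statement in the aborted transaction reports
-- the conflict, even though it is only a SELECT
-- (the exact error text can vary slightly by version)
node2> SELECT col FROM t WHERE id = 1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
[/code]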

I describe brute force aborts in more detail here: [url]http://www.mysqlperformanceblog.com/2012/11/20/understanding-multi-node-writing-conflict-metrics-in-percona-xtradb-cluster-and-galera/[/url]

I understand how a Brute Force Abort would cause a SELECT to fail if that SELECT were part of a larger transaction that was rolled back. I thought the SELECTs that were failing were in smaller transactions, but perhaps that isn't the case. While I was investigating, I saw that wsrep_slave_threads was still set to its default of 1, and I've adjusted that to 8, per a recommendation to have at least 2 per core. I also adjusted wsrep_retry_autocommit up from its default to 5. I'll run with the new config for a couple of days to see if the number of deadlocks goes down. It could be that having only a single applier thread was causing replicated writes to queue up. Is there a way to see if that was the case?
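
For reference, the two settings I changed look roughly like this in my.cnf (sketch only; the rest of the file is unchanged):

[code]
# my.cnf on each node
[mysqld]
wsrep_slave_threads    = 8   # raised from the default of 1, ~2 per core
wsrep_retry_autocommit = 5   # raised from the default of 1; retries autocommit
                             # statements after a replication conflict
[/code]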

Realize that increasing wsrep_retry_autocommit like that will potentially make your autocommit writes very slow from the client’s perspective.

The number of nodes in the cluster shouldn't affect replication time that much; it's the largest RTT between any two nodes in the cluster that affects replication/commit time.

However, a local certification failure (deadlock on COMMIT) will happen on a node where we're trying to update a row that is modified by a transaction still in the apply queue. You can monitor wsrep_local_recv_queue, but generally this should be 0 unless you have increased gcs.fc_limit a lot. It's flow control's job to keep that queue small.
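
For example, you can check the apply queue and the flow control settings with something like:

[code]
-- apply queue depth; should normally be at or near 0
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';

-- gcs.fc_limit lives inside the provider options string
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
[/code]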

I realize that bumping up wsrep_retry_autocommit might make the autocommit writes slower; that's why I didn't bump it up by an obscene amount and set it at 5.

I think I may not have been clear on a point: I don't think it's the number of nodes in the cluster that was the issue, but that I had wsrep_slave_threads set too low. I think I may have had writes queueing up behind a long-running update process on the single applier thread that I had configured. wsrep_local_recv_queue is sitting at 0 at the moment, and I haven't seen it go above that, although wsrep_local_recv_queue_avg on a couple of nodes is above 0, so there must have been a queue at some point.

wsrep_local_cert_failures and wsrep_local_bf_aborts are moving up at a combined pace of about 20-30 across the nodes. Given an average of ~7K calls per minute, I don't think that's too far out of line, although I'd like to see it move closer to 0.
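
For what it's worth, these are the counters I've been sampling and diffing over time on each node:

[code]
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_cert_failures', 'wsrep_local_bf_aborts');
[/code]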

I’ll keep looking into it. I think we might be able to isolate the area of the app where the majority of the deadlocks are occurring and get it to do a better job of retrying the call.