I have a theoretical question about the behavior of PXC that a client of mine asked me, and I haven’t been able to find a good answer in the documentation or by Googling.
Suppose we have a three-node XtraDB Cluster, and our application sends an insert query to node 1, say something like:
insert into tbl_items(sku) values(1234);
My understanding of the PXC architecture is that the local node notifies the other two nodes about this event so that they can certify it and then either commit or send back a failure message. (For the purposes of this example, let's assume that there are no pending transactions on nodes 2 and 3, so certification succeeds and those nodes are clear to commit the transaction.) Node 1 then certifies the transaction locally before committing it (or rolling it back) and replying to the application.
So what happens if node 1 fails after sending the transaction out to the other two nodes, but before it commits it locally, and more importantly, before it replies back to the application that the transaction was successful?
It seems like the application would likely time out waiting for a reply (or I suppose it might receive a RST or something similar at the TCP level; either way, it would have to assume that the transaction failed), but nodes 2 and 3 would have successfully committed the data. This seems like an inconsistency that could lead the application to re-insert the data (leaving extra, bogus rows in the table), on the assumption that the timed-out query was not successful.
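To make the scenario concrete, here is a minimal sketch of what I am worried about. It does not talk to PXC at all: an in-memory SQLite database stands in for the surviving nodes, and the table schema, the LostReply exception, and both helper functions are hypothetical names I made up for illustration. It only shows that a commit followed by a lost reply, combined with a naive retry, leaves two rows behind.

```python
import sqlite3


class LostReply(Exception):
    """Hypothetical stand-in for a client timeout or TCP reset."""


def insert_item(conn, sku, lose_reply=False):
    # The row is committed durably (as on nodes 2 and 3 in my example)...
    conn.execute("INSERT INTO tbl_items (sku) VALUES (?)", (sku,))
    conn.commit()
    # ...but the acknowledgement never reaches the application.
    if lose_reply:
        raise LostReply("no reply from node 1")


def insert_with_naive_retry(conn, sku, fail_first=True):
    try:
        insert_item(conn, sku, lose_reply=fail_first)
    except LostReply:
        # The application assumes the insert failed and simply retries.
        insert_item(conn, sku)


# Assumed schema: an auto-generated id and no unique constraint on sku.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_items (id INTEGER PRIMARY KEY AUTOINCREMENT, sku INTEGER)"
)

insert_with_naive_retry(conn, 1234)
rows = conn.execute(
    "SELECT COUNT(*) FROM tbl_items WHERE sku = 1234"
).fetchone()[0]
print(rows)  # prints 2: the retried insert duplicated the row
```

Whether PXC can actually end up in the "committed, but reply lost" state is exactly what I am asking; the sketch only illustrates the consequence if it can.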
Is my understanding correct?
Perhaps more importantly, is this a situation that the Percona developers have considered, or is it treated as an application-level concern?