Percona XtraDB Cluster 5.7 nodes have different bin log information

I lost a table yesterday due to some user error, but fortunately we have nightly backups using xtrabackup to S3, so I was able to restore as of midnight, yesterday.

Also fortunately I have bin logs enabled! And so lies my problems.

I have 3 nodes, one of which I turned off just in case I messed up my restore I didn’t permanently delete everything. I just turned it back on today, and it looks like the SST wiped out the bin log files in the data directory? Ouch, so now I have two servers with the binlog files.

I re-ran the transactions of one of my nodes bin logs on a separate database with my restored data on it with a command like this

mysqlbinlog --start-datetime="2022-08-24 09:20:00" --stop-datetime="2022-08-24 09:25:00" /home/brian/bin-logs/db1/ip-172-31-24-62-bin.[0-9]*

And this seems to work correctly, except it seems to have missed a bunch of things? We have email records for example of our order confirmations, so we have invoice numbers to things that no longer exist in our db. I couldn’t find them at all. So I took the bin logs from my second node and ran those ones on a fresh restore, to see if maybe it would get everything instead. But it seems to have restored a subset of the missing data? Almost like the bin logs on all 3 nodes only have one third the data each? Not sure at all why that’d be the case, I sure definitely hope not. I have one node as the master/write node where all write transactions come through (using HA proxy), so I was hoping it would have all the transactions.

Is this correct? Does each node only have it’s “share” of transactions in the bin log? Have I lost one third of my things because of the one node with no more bin logs?

1 Like

This is solved.

What was happening is that I was only running the bin log files against one restored table and I was ignoring the errors about the other tables in the output.

Turns out those bin log statements work in a transaction way including multiple changes at once, and if it failed, the entire transaction failed, losing some of my inserts on my target table.

The reason it was behaving different on different server’s bin log files is because the queries in the bin logs happened in slightly different order and with slightly different transaction groupings, so on one server an insert might succeed because of how it happened to be grouped, but on another set of bin log files it would fail.

Woo! I have all my data :slight_smile:

1 Like