rpm -qa | grep -i pacemaker
rpm -qa | grep -i corosync
rpm -qa | grep -i percona
A 2-node Pacemaker cluster in a Master/Slave setup, with STONITH enabled, running:
Apache resource - web service
DRBD resource - Magento flat files for the web service
MySQL resource - database with replication
Failover between the servers works as expected. However, we recently had an
incident where database replication had broken and the Master node then
failed over to the out-of-date Slave, which resulted in data loss.
I have found these settings for the MySQL resource in the CIB:
<nvpair id="p_mysql-instance_attributes-max_slave_lag" name="max_slave_lag" value="60"/>
<nvpair id="p_mysql-instance_attributes-evict_outdated_slaves" name="evict_outdated_slaves" value="false"/>
If I set evict_outdated_slaves to "true", it works as intended: the cluster
won't fail over to the Slave if the Slave's replication falls more than 60
seconds (max_slave_lag) behind the Master.
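For reference, the same two parameters can be changed from the command line instead of editing the CIB XML by hand. This is a minimal sketch, assuming the MySQL primitive is named p_mysql (inferred from the nvpair ids above) and using Pacemaker's crm_resource tool; the set_lag_params wrapper is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: set evict_outdated_slaves and max_slave_lag on a
# MySQL primitive via crm_resource (resource name passed as $1).
set_lag_params() {
    rsc="$1" evict="$2" lag="$3"
    crm_resource --resource "$rsc" --set-parameter evict_outdated_slaves --parameter-value "$evict" &&
    crm_resource --resource "$rsc" --set-parameter max_slave_lag --parameter-value "$lag"
}

# Usage on a cluster node (p_mysql is an assumed resource name):
#   set_lag_params p_mysql true 60
#   crm_resource --resource p_mysql --get-parameter max_slave_lag
```

Changing a resource's parameters can make Pacemaker restart that resource unless the agent supports reloading them, so this is worth trying on a test cluster first.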
However, if I then need to rebuild the Slave from a backup of the Master, I
will have to set max_slave_lag high enough that I can finish rebuilding the
Slave before it fails because of that limit.
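The "has the rebuilt Slave caught up yet?" step can be scripted, so that max_slave_lag only has to be raised for the duration of the rebuild. This is a minimal sketch, assuming the mysql client on the Slave can log in without extra options (e.g. via ~/.my.cnf); the function names, the 10-second poll interval and the temporary value 86400 are all hypothetical:

```shell
#!/bin/sh
# Print Seconds_Behind_Master from `SHOW SLAVE STATUS\G` output read on stdin.
# Returns non-zero if the field is missing or NULL (replication not running).
slave_lag() {
    awk '$1 == "Seconds_Behind_Master:" { v = $2 }
         END { if (v == "" || v == "NULL") exit 1; print v }'
}

# Block until the local Slave reports a lag of at most $1 seconds,
# polling every 10 seconds. Assumes `mysql` can log in without arguments.
wait_for_catchup() {
    limit="$1"
    while :; do
        if lag=$(mysql -e 'SHOW SLAVE STATUS\G' | slave_lag) && [ "$lag" -le "$limit" ]; then
            return 0
        fi
        sleep 10
    done
}

# Hypothetical rebuild sequence (p_mysql and 86400 are placeholders):
#   crm_resource --resource p_mysql --set-parameter max_slave_lag --parameter-value 86400
#   ...restore the backup on the Slave and start replication...
#   wait_for_catchup 60
#   crm_resource --resource p_mysql --set-parameter max_slave_lag --parameter-value 60
```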
Does anyone know of a better way to handle this? We patch the servers
monthly, so we need to be able to fail the Pacemaker cluster over from the
Master to the Slave.
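For the monthly patching, one option is to gate the planned failover on a replication health check, so that a broken or lagging Slave is never promoted on purpose. This is a minimal sketch, assuming it runs on the Slave node, that the mysql client can log in without extra options, and that putting the Master node into standby with crm_attribute is an acceptable way to move the resources; replication_ok, planned_failover and the 60-second default are hypothetical:

```shell
#!/bin/sh
# Succeed only if `SHOW SLAVE STATUS\G` output on stdin shows both replication
# threads running and Seconds_Behind_Master <= $1.
replication_ok() {
    awk -v limit="$1" '
        $1 == "Slave_IO_Running:"      { io = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            if (io != "Yes" || sql != "Yes") exit 1
            if (lag == "" || lag == "NULL") exit 1
            if (lag + 0 > limit + 0) exit 1
            exit 0
        }'
}

# Hypothetical planned failover, run on the Slave node: put the current
# Master node ($1) into standby only if replication is healthy.
planned_failover() {
    master_node="$1" max_lag="${2:-60}"
    if mysql -e 'SHOW SLAVE STATUS\G' | replication_ok "$max_lag"; then
        crm_attribute --node "$master_node" --name standby --update on
    else
        echo "Replication on this Slave is not healthy; not failing over." >&2
        return 1
    fi
}
```

After patching, `crm_attribute --node <node> --name standby --update off` takes the node out of standby again.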