I have installed a two-node cluster following the Percona and ClusterLabs manuals.
I configured two resources: ClusterIP and a Master/Slave set for MariaDB 10.
I also enabled replication using GTID.
The resources have this configuration:
[root@centos-web02 percona]# pcs constraint colocation add master ms_MySQL with ClusterIP
[root@centos-web02 percona]# pcs resource op add p_mysql monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1"
[root@centos-web02 percona]# pcs resource op add p_mysql monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"
[root@centos-web02 ~]# pcs status resources ms_MySQL
Master: ms_MySQL
Resource: p_mysql (class=ocf provider=percona type=mysql)
Attributes: config=/etc/my.cnf pid=/var/run/mariadb/mariadb.pid socket=/var/lib/mysql/mysql.sock replication_user=replication replication_passwd=**** max_slave_lag=60 evict_outdated_slaves=false binary=/usr/sbin/mysqld test_user=test_user test_passwd=**** master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false target-role=Master is-managed=true
Operations: start interval=0s timeout=120 (p_mysql-start-interval-0s)
stop interval=0s timeout=120 (p_mysql-stop-interval-0s)
monitor interval=20 timeout=30 (p_mysql-monitor-interval-20)
monitor interval=10 role=Master timeout=30 (p_mysql-monitor-interval-10)
monitor interval=30 role=Slave timeout=30 (p_mysql-monitor-interval-30)
promote interval=0s timeout=120 (p_mysql-promote-interval-0s)
demote interval=0s timeout=120 (p_mysql-demote-interval-0s)
monitor interval=5s role=Master OCF_CHECK_LEVEL=1 (p_mysql-monitor-interval-5s)
monitor interval=2s role=Slave OCF_CHECK_LEVEL=1 (p_mysql-monitor-interval-2s)
[root@centos-web02 ~]# pcs status resources ms_MySQL [p_mysql]
I tested switchover with this command:
[root@centos-web02 ~]# pcs resource move ClusterIP centos-web03
The resources moved and replication kept working.
After switching back, everything worked fine too.
[root@centos-web02 ~]# pcs resource move ClusterIP centos-web02
But when I stopped the cluster on centos-web02, which was the Master, and then started it again, something went wrong.
[root@centos-web02 ~]# pcs cluster stop centos-web02
centos-web02: Stopping Cluster (pacemaker)...
centos-web02: Stopping Cluster (corosync)...
[root@centos-web02 ~]# pcs cluster start centos-web02
centos-web02: Starting Cluster...
[root@centos-web02 ~]# pcs resource
ClusterIP (ocf::heartbeat:IPaddr2): Started centos-web02
Master/Slave Set: ms_MySQL [p_mysql]
Masters: [ centos-web02 ]
Slaves: [ centos-web03 ]
mysqldbcompare now reports “# Database consistency check failed”, and replication has stopped with an error.
#On centos-web02
MariaDB [mysql]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: binlog.000015
Position: 56096
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
#On centos-web03
MariaDB [mysql]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: centos-web02.simfy.arkadium.com
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000015
Read_Master_Log_Pos: 56096
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 534
Relay_Master_Log_File: binlog.000015
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows_v1 event on table arkadium.wp_options; Duplicate entry '28980' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000015, end_log_pos 557
Skip_Counter: 0
Exec_Master_Log_Pos: 356
Relay_Log_Space: 56572
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows_v1 event on table arkadium.wp_options; Duplicate entry '28980' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000015, end_log_pos 557
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
1 row in set (0.00 sec)
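The fields that matter in the output above are Slave_IO_Running, Slave_SQL_Running, Last_SQL_Errno and Using_Gtid (which reports No even though GTID replication was meant to be enabled). When repeating the stop/start test, it can help to reduce the slave status to just those fields. The following is a minimal sketch, not part of the Percona agent or of pcs: summarize_slave_status is a hypothetical helper name, and it assumes the usual "Field: value" layout that the mysql client prints for \G output.

```shell
#!/bin/sh
# summarize_slave_status (hypothetical helper): read SHOW SLAVE STATUS\G
# output on stdin and print the four fields relevant to this failure.
summarize_slave_status() {
    awk -F': ' '
        # Strip leading whitespace so indented client output also matches.
        { sub(/^[ \t]+/, "") }
        $1 == "Slave_IO_Running"  { io = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        $1 == "Using_Gtid"        { gtid = $2 }
        $1 == "Last_SQL_Errno"    { errno = $2 }
        END {
            printf "io=%s sql=%s gtid=%s last_sql_errno=%s\n", io, sql, gtid, errno
        }
    '
}

# Sample input copied from the slave status shown above.
summarize_slave_status <<'EOF'
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1062
Using_Gtid: No
EOF
# prints: io=Yes sql=No gtid=No last_sql_errno=1062
```

On the slave, the same function could be fed from the live server by piping the output of the mysql client running SHOW SLAVE STATUS\G into it, assuming the client can log in without prompting.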
Am I right that the cluster should not break in this scenario? Have I done something wrong?
My goal is automatic failover.