High availability is not working in Percona XtraDB Cluster

Hi All,
Recently we installed Percona XtraDB Cluster 5.7 on CentOS 7 (3 nodes) and bootstrapped the first node. The cluster was working fine for 10 days, but after that the 1st and 2nd nodes crashed due to lack of space in the “/” partition (binary logs occupied the space), while the 3rd node kept running. Even so, it could not serve reads/writes for the clients.
After cleaning up/creating some space on the server, I attempted to start the MySQL service on one of the failed nodes, but it could not start. I tried the other failed node as well, but was still unlucky. Then I realized that the 3rd node, which was up and running, had not been bootstrapped, so I bootstrapped it. After this I tried to start the failed nodes again, and this time it was successful and all 3 nodes are in sync.
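For reference, the recovery roughly boiled down to the commands below (a sketch assuming the standard PXC 5.7 systemd units on CentOS 7, i.e. mysql and mysql@bootstrap.service; adjust if your installation differs):

# On the surviving 3rd node: bootstrap it so it can form a new Primary component
systemctl stop mysql
systemctl start mysql@bootstrap.service

# On each failed node, after freeing disk space: start it normally so it rejoins and syncs
systemctl start mysql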
This raises a lot of questions for me…

  1. If bootstrapping was the reason the first attempt failed, why did the cluster not automatically shift to the 3rd node, which was live at that point in time?
  2. When the 2 nodes were down, for whatever reason, why were reads/writes not permitted on the 3rd live node?

I suspect some configuration changes may be needed… Any help here would be highly appreciated…

Thanks

Tulasi

Hello, for our team to help you out here, please send us some more information:
  1. The specific version(s) of CentOS
  2. The specific version of Percona XtraDB Cluster, and if it’s not the latest they will probably advise in any case to update the software.
  3. The config files (each node please)
  4. Information about the configuration topology from the point of view of network etc.
  5. Any error files that were written during the time when you were having problems

Certainly they would need the configuration files if you’d like feedback on configuration changes. 🙂

If you can get this information to us then I’ll ask one of the Percona XtraDB Cluster team to take a look. Thanks!

Hi Lorraine,

Thanks for the reply…

Here are the details of our cluster…

  1. The version of CentOS:
    CentOS Linux release 7.5.1804 (Core)

  2. DB versions:
    +-------------------------+---------------------------------------------------------------------------------------------------+
    | Variable_name           | Value                                                                                             |
    +-------------------------+---------------------------------------------------------------------------------------------------+
    | innodb_version          | 5.7.22-22                                                                                         |
    | protocol_version        | 10                                                                                                |
    | slave_type_conversions  |                                                                                                   |
    | tls_version             | TLSv1,TLSv1.1,TLSv1.2                                                                             |
    | version                 | 5.7.22-22-57-log                                                                                  |
    | version_comment         | Percona XtraDB Cluster (GPL), Release rel22, Revision da86071, WSREP version 29.26, wsrep_29.26  |
    | version_compile_machine | x86_64                                                                                            |
    | version_compile_os      | Linux                                                                                             |
    | version_suffix          | -57-log                                                                                           |
    +-------------------------+---------------------------------------------------------------------------------------------------+

  3. Attached the configuration files to this thread.

  4. All 3 servers are on the same LAN, connected through private IPs.

Thanks

Tulasi

mysql cnf files for forum.txt (4.12 KB)

Hi Tulasi,
Please let me answer your questions here.

This actually works by design in PXC (Percona XtraDB Cluster). When you have 3 nodes in the cluster and 2 nodes go down due to some issue, the 3rd node will not get enough quorum votes to promote itself to Primary, so it goes into Non-Primary status. You then need to make it Primary forcefully by bootstrapping it.

But this is not the case when the other 2 servers are shut down gracefully. In that case, the third node becomes Primary automatically, because the cluster knows that the other two servers left gracefully.
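As a sketch (assuming the standard PXC 5.7 systemd units, and that the surviving node is still running), you can bring that node back to Primary either by restarting it in bootstrap mode or by promoting it on the fly:

# Option 1: restart the node in bootstrap mode
systemctl stop mysql
systemctl start mysql@bootstrap.service

# Option 2: force the running node to form a new Primary component
mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=true';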

The node must be in Primary status to accept reads/writes. In your case, the third node went into Non-Primary status. You can check the status of the node as below:

mysql> show global status like 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
1 row in set (0.00 sec)

Otherwise, you will receive an error like the one below when querying the database:

mysql> show global status like 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
1 row in set (0.00 sec)
mysql> use vinodh
Database changed, 1 warning
mysql> show warnings;
+-------+------+-----------------------------------------------------+
| Level | Code | Message                                              |
+-------+------+-----------------------------------------------------+
| Error | 1047 | WSREP has not yet prepared node for application use |
+-------+------+-----------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

Please check this blog post - https://www.percona.com/blog/2015/06/12/percona-xtradb-cluster-quorum-availability-cluster/ - which explains this in detail.

Best Regards,
Vinodh Krishnaswamy,


Hi Vinodh,

Thanks for your valuable reply.

Regards

Tulasi

Hi Vinodh,

Last week we had a problem with Percona. We observed that when we bring a non-Primary node back up to repair it, it cleans/wipes all data on that node and then syncs up with the Primary, bringing all that data over again (typically via rsync). Only then does it open the MySQL instance for use.

This type of behavior works well for small instances, but it will be a problem for larger datasets.

I wanted to know whether this is the default behaviour or whether there is any workaround for it. In my opinion, it should ideally read the MySQL binary logs on the node being repaired and sync with the Primary from that point onward.
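From what I have read so far, whether a rejoining node does a full copy (SST) or only an incremental catch-up (IST) depends on whether the missing write-sets are still held in the donor's gcache. Would something like the following in my.cnf help here? (The values below are just examples, not something we have tested.)

[mysqld]
# Keep more write-sets on the donor so a node that was down for longer
# can still catch up via IST instead of a full SST (size is only an example)
wsrep_provider_options = "gcache.size=2G"
# SST method used when a full copy is unavoidable
wsrep_sst_method = xtrabackup-v2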

Regards

Tulasi