I have a PXC (Percona XtraDB Cluster) setup in an Amazon VPC. All three nodes are in the same region, but one of them is in a different availability zone. At some point one node failed without any meaningful output in the logs:
2014-04-03 01:30:38 8514 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.68236S), skipping check
2014-04-03 01:30:40 8514 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.53889S), skipping check
140403 03:30:39 mysqld_safe Number of processes running now: 0
140403 03:30:39 mysqld_safe WSREP: not restarting wsrep node automatically
140403 03:30:39 mysqld_safe mysqld from pid file /var/lib/mysql/ip-10-1-7-180.pid ended
The node that failed is the one in the other availability zone.
This is its my.cnf file:
[mysqld]
datadir=/var/lib/mysql
user=mysql
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_address=gcomm://10.1.7.180,10.1.8.159,10.1.8.16
binlog_format=ROW
default_storage_engine=InnoDB
innodb_locks_unsafe_for_binlog=1
innodb_buffer_pool_size = 5632M
innodb_log_buffer_size = 4M
max_connect_errors = 10000
key_buffer_size = 2048M
max_allowed_packet = 50M
table_open_cache = 1024
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 80M
myisam_sort_buffer_size = 64M
thread_cache_size = 32
query_cache_size = 32M
innodb_thread_concurrency = 8
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_autoinc_lock_mode=2
wsrep_node_address=10.1.7.180
wsrep_sst_method=xtrabackup
wsrep_cluster_name=my_centos_cluster
wsrep_sst_auth="sstuser:s3cret"
max_connections = 4000

[mysql]
prompt=\\u@\\h [\\d]>\\_
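In case it is relevant, here is a rough sketch that adds up the memory this configuration could ask for. It uses a common rule of thumb (global buffers plus per-connection buffers times the number of connections), which is only an approximation and not an exact account of how mysqld allocates memory. The values are copied from the my.cnf above; the function name `estimate_memory_mb` is just an illustrative helper, and I am not claiming memory is the actual cause of the failure.

```python
# Rough, assumed rule of thumb: global buffers + N * per-connection buffers.
# All sizes are in MB and copied from the my.cnf above.

GLOBAL_BUFFERS_MB = {
    "innodb_buffer_pool_size": 5632,
    "innodb_log_buffer_size": 4,
    "key_buffer_size": 2048,
    "query_cache_size": 32,
}

# Buffers that may be allocated per connection (only those set in my.cnf).
PER_CONNECTION_BUFFERS_MB = {
    "sort_buffer_size": 2,
    "read_buffer_size": 2,
    "read_rnd_buffer_size": 80,
}

MAX_CONNECTIONS = 4000


def estimate_memory_mb(connections):
    """Return the approximate upper bound in MB for the given connection count."""
    global_total = sum(GLOBAL_BUFFERS_MB.values())
    per_connection = sum(PER_CONNECTION_BUFFERS_MB.values())
    return global_total + connections * per_connection


if __name__ == "__main__":
    for n in (0, 100, MAX_CONNECTIONS):
        print(f"{n:>5} connections: ~{estimate_memory_mb(n)} MB")
```

Connections rarely allocate all of their buffers at the same time, so the figure for max_connections is a theoretical ceiling rather than expected usage.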
How can I investigate the root cause of the failure? Also, another question: what happens if an UPDATE query arrives at a node that is in the “Joining: receiving State Transfer” state?
Thank you in advance.