XtraDB Cluster loads, but no PID file; MySQL won't start.

Hi there, I recently set up Percona XtraDB Cluster on CentOS 6.6 following the documentation (http://www.percona.com/doc/percona-xtradb-cluster/5.6/howtos/cenots_howto.html). However, when I reboot the servers (virtual machines with 1 CPU and 512 MB RAM), they take a long time to boot, and once booted, MySQL fails to start or restart with the following error:

service mysql restart

Shutting down MySQL (Percona XtraDB Cluster) ERROR! MySQL (Percona XtraDB Cluster) PID file could not be found!
ERROR! MySQL (Percona XtraDB Cluster) is running but PID file could not be found
ERROR! Failed to restart server.
[root@www2 ~]# service mysql start
ERROR! MySQL (Percona XtraDB Cluster) is running but PID file could not be found
[root@www2 ~]# service mysql stop
Shutting down MySQL (Percona XtraDB Cluster) ERROR! MySQL (Percona XtraDB Cluster) PID file could not be found!

When I try to connect to the MySQL database, I am told it cannot connect through the socket.

That is expected, because MySQL isn't actually running. But I can't even start it without Percona spitting out the "PID file could not be found" error.

Here is my config file (as per the instructions I followed at the link above):

datadir=/var/lib/mysql
user=mysql

# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://192.168.5.101,192.168.5.102

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=192.168.5.101

# SST method
wsrep_sst_method=xtrabackup-v2

# Cluster name
wsrep_cluster_name=my_centos_cluster

# Authentication for SST method
wsrep_sst_auth="sstuser:s3cret"

I also added the following lines to the bottom of the file, to no avail:

# My Additions

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/lib/mysql/mysql.sock

My error logs show this:

150221 14:52:11 mysqld_safe Skipping wsrep-recover for 2a6bf596-b940-11e4-8a4a-6f644114be5a:2 pair
150221 14:52:11 mysqld_safe Assigning 2a6bf596-b940-11e4-8a4a-6f644114be5a:2 to wsrep_start_position
2015-02-21 14:52:15 0 [Note] WSREP: wsrep_start_position var submitted: '2a6bf596-b940-11e4-8a4a-6f644114be5a:2'
2015-02-21 14:52:15 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-02-21 14:52:16 1544 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-02-21 14:52:16 1544 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2015-02-21 14:52:16 1544 [Note] WSREP: wsrep_load(): Galera 3.8(rf6147dd) by Codership Oy <info@codership.com> loaded successfully.
2015-02-21 14:52:16 1544 [Note] WSREP: CRC-32C: using hardware acceleration.
2015-02-21 14:52:16 1544 [Note] WSREP: Found saved state: 2a6bf596-b940-11e4-8a4a-6f644114be5a:2
2015-02-21 14:52:16 1544 [Note] WSREP: Passing config to GCS: base_host = 192.168.5.101; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recove
2015-02-21 14:52:17 1544 [Note] WSREP: Service thread queue flushed.
2015-02-21 14:52:17 1544 [Note] WSREP: Assign initial position for certification: 2, protocol version: -1
2015-02-21 14:52:17 1544 [Note] WSREP: wsrep_sst_grab()
2015-02-21 14:52:17 1544 [Note] WSREP: Start replication
2015-02-21 14:52:17 1544 [Note] WSREP: Setting initial position to 2a6bf596-b940-11e4-8a4a-6f644114be5a:2
2015-02-21 14:52:17 1544 [Note] WSREP: protonet asio version 0
2015-02-21 14:52:17 1544 [Note] WSREP: Using CRC-32C for message checksums.
2015-02-21 14:52:17 1544 [Note] WSREP: backend: asio
2015-02-21 14:52:17 1544 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2015-02-21 14:52:17 1544 [Note] WSREP: restore pc from disk failed
2015-02-21 14:52:17 1544 [Note] WSREP: GMCast version 0
2015-02-21 14:52:17 1544 [Note] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2015-02-21 14:52:17 1544 [Note] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2015-02-21 14:52:17 1544 [Note] WSREP: EVS version 0
2015-02-21 14:52:17 1544 [Note] WSREP: gcomm: connecting to group 'my_centos_cluster', peer '192.168.5.101:,192.168.5.102:'
2015-02-21 14:52:17 1544 [Warning] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.101:4567' points to own listening address, blacklisting
2015-02-21 14:52:17 1544 [Note] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.101:4567' pointing to uuid 23dfb830 is blacklisted, skipping
2015-02-21 14:52:17 1544 [Note] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-02-21 14:52:18 1544 [Note] WSREP: declaring 8fbe1659 at tcp://192.168.5.102:4567 stable
2015-02-21 14:52:18 1544 [Warning] WSREP: no nodes coming from prim view, prim not possible
2015-02-21 14:52:18 1544 [Note] WSREP: view(view_id(NON_PRIM,23dfb830,8) memb {
23dfb830,0
8fbe1659,0
} joined {
} left {
} partitioned {
a4d14ed5,0
bc924c22,0
eb831cca,0
})
2015-02-21 14:52:18 1544 [Note] WSREP: gcomm: connected
2015-02-21 14:52:18 1544 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-02-21 14:52:18 1544 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2015-02-21 14:52:18 1544 [Note] WSREP: Opened channel 'my_centos_cluster'
2015-02-21 14:52:18 1544 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
2015-02-21 14:52:18 1544 [Note] WSREP: Flow-control interval: [23, 23]
2015-02-21 14:52:18 1544 [Note] WSREP: Received NON-PRIMARY.
2015-02-21 14:52:18 1544 [Note] WSREP: Waiting for SST to complete.
2015-02-21 14:52:18 1544 [Note] WSREP: New cluster view: global state: 2a6bf596-b940-11e4-8a4a-6f644114be5a:2, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version -1
2015-02-21 14:52:18 1544 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-02-21 14:52:21 1544 [Note] WSREP: (23dfb830, 'tcp://0.0.0.0:4567') turning message relay requesting off

And it does not create a .pid file or a .sock file.

Any ideas?

Also, SELinux was disabled before the install and never re-enabled, so that doesn't seem to be the issue.

Here is a screen capture of the process trying to start during boot. It takes close to 15 minutes before it fails completely and the boot continues to the login prompt.

Did you bootstrap the first node as below?

node1$ /etc/init.d/mysql bootstrap-pxc

Also, please provide innobackup.backup.log so we can check what is causing the issue.

Hey Mirfan,

Yes, I did bootstrap the first node. Everything was running great until I shut down the virtual machines to go home. When I tried to boot them the next day, they just crashed after about 25 minutes. I have attached the log file, renamed, as you cannot upload .log files.

innobackup-backup.txt (4.09 KB)

When a node starts, it needs to connect to the primary component. Since you shut down all the virtual machines, no primary component exists anymore.
So when the nodes start, they have nothing to join, hence the error. If no primary component exists, you need to bootstrap one node first.
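To see which situation a node is in, the error log can be checked for the last "New COMPONENT: primary = ..." line, as in the log posted above. This is only a rough sketch: the function name last_primary_state is made up for illustration, and the log path is assumed to be the log-error value from the my.cnf above.

```shell
#!/bin/sh
# Print "yes" or "no" according to the last "New COMPONENT: primary = ..."
# line in a Galera error log; print "unknown" if the file is unreadable
# or contains no such line. (Hypothetical helper, not part of Percona.)
last_primary_state() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "unknown"
        return 0
    fi
    # Extract the word after "primary = " from every matching line,
    # then keep only the most recent one.
    state=$(sed -n 's/.*New COMPONENT: primary = \([a-z]*\).*/\1/p' "$log_file" | tail -n 1)
    echo "${state:-unknown}"
}

# Assumed log location (log-error in the my.cnf above).
if [ "$(last_primary_state /var/log/mysqld.log)" = "no" ]; then
    echo "Node is in a non-primary component; one node must be bootstrapped:"
    echo "  /etc/init.d/mysql bootstrap-pxc"
fi
```

If this prints "no" on every node, nothing is left to join, which matches the "Received NON-PRIMARY" lines in the log.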

Not sure what you mean. The first machine to get booted is the primary node, which was bootstrapped upon install… Do I need to bootstrap it again every time I boot it? The log was from the primary node.

Regards,
Koster

Just tried this on a few cloud servers over at Digital Ocean, and created a 1 GB swap file on both as the default memory sucks. Still the same issue… Upon reboot, the first node takes forever to boot and then fails, and the second node fails as well. I am running CentOS 6.6 and the latest Percona installed via the yum repositories.

Attached are the logs and my.cnf files for both nodes (I am only using 2 instead of 3 nodes). I am trying to find a viable, easy-to-use, reliable solution for syncing DBs. Percona is definitely easy to use, takes seconds to install, test, and use, and works great right out of the box, but it is definitely unreliable if I can't reboot my servers… I mean, if there is a crash or something, I need to know my servers can come back online… Any help would be much appreciated.

Note that the second node didn't have an error file, but it had more xtrabackup_* files; the log is included in node2.zip.
The first node was bootstrapped as per the instructions in the previously mentioned tutorial from your site.

Regards,
Koster

node1.zip (9.22 KB)

node2.zip (4.54 KB)

Note: after killing all MySQL processes and removing the lock file, I can try to start it again, but it fails just like it does at boot. It's not loading something, or something is missing from the config files.
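Before killing processes, the "is running but PID file could not be found" message can be narrowed down by comparing the PID file against the process table. A minimal sketch, assuming the pid-file path from the my.cnf additions above; the function name pid_file_state is made up for illustration:

```shell
#!/bin/sh
# Report whether a PID file is missing, points at a live process,
# or is stale. (Hypothetical helper. Run as root: kill -0 also fails
# for a live process owned by another user, which would show as "stale".)
pid_file_state() {
    pid_file="$1"
    # -s is false for a nonexistent or empty file
    if [ ! -s "$pid_file" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests that the process can be signalled
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Assumed path: the pid-file value from the my.cnf additions above.
pid_file_state /var/run/mysqld/mysqld.pid
```

A result of "missing" while mysqld processes still show up in the process list corresponds to the error the init script prints.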

OK, I know the issue: it's not bootstrapping on reboot. How do I force MySQL to boot with the bootstrap-pxc option?

SOLVED! In order to make it boot properly, you need to bootstrap. For this to work in my specific case, where the primary node will be the only node receiving changes, you need to delete the mysql entry (in my case S63mysql) from your rc.d/rc#.d directory (whichever is the default; in CentOS it's rc3.d), then edit /etc/rc.d/rc.local and add /etc/init.d/mysql bootstrap-pxc on a line of its own… Now it boots flawlessly.
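The rc.local change can be sketched roughly as below. The function name add_bootstrap_line is made up for illustration, and using chkconfig instead of deleting the S63mysql link by hand is an assumption on my part. This only makes sense on the one node that is meant to form the cluster at boot; a node that should rejoin an existing cluster must start normally, otherwise it comes up as a separate cluster.

```shell
#!/bin/sh
# Command taken from the post above.
BOOTSTRAP_CMD='/etc/init.d/mysql bootstrap-pxc'

# Append the bootstrap command to an rc.local-style file, but only if
# that exact line is not already present. (Hypothetical helper.)
add_bootstrap_line() {
    rc_file="$1"
    # -x matches the whole line, -F treats the command as a fixed string
    if ! grep -qxF "$BOOTSTRAP_CMD" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$BOOTSTRAP_CMD" >> "$rc_file"
    fi
}

# Intended use on the bootstrap node, as root (deliberately not run here):
#   chkconfig mysql off                      # assumption: replaces removing S63mysql manually
#   add_bootstrap_line /etc/rc.d/rc.local
```

Running the function twice leaves a single line in the file, so it is safe to repeat.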

Problems
Starting MySQL (Percona XtraDB Cluster) database server mysqld
The server quit without updating PID file [fail]