How to debug a WSREP problem keeping PXC from starting up

Hello,

I recently decided to start using Percona XtraDB Cluster (PXC) to power some site databases with multi-master replication. I have followed a few tutorials and tried to rule out security restrictions (I disabled SELinux and the iptables rules), but I always get the same problem.

I can start up the first node in my cluster just fine when the wsrep_cluster_name variable is set to gcomm://. Everything seems to be running smoothly, and I have an xinetd mysqlchk daemon that runs clustercheck. On the second and third servers, when wsrep_cluster_name is set to gcomm://, the server fails to start.

The only log output I seem to be left with is the following in /var/log/syslog:

Jan 2 03:40:44 precise64 /etc/init.d/mysql[8154]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Jan 2 03:40:44 precise64 /etc/init.d/mysql[8154]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Jan 2 03:40:44 precise64 /etc/init.d/mysql[8154]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)'
Jan 2 03:40:44 precise64 /etc/init.d/mysql[8154]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Jan 2 03:40:44 precise64 /etc/init.d/mysql[8154]:

Does anyone know what I can try to get to the bottom of this issue? Are there any useful debugging options that I can enable?

Any help is much appreciated!

Thanks,
Adrian

Do you mean wsrep_cluster_name or wsrep_cluster_address?

I believe wsrep_cluster_name must be the same on every server in the cluster; it is wsrep_node_name that must be unique for each server.

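I believe it is wsrep_cluster_address that contains gcomm:// or the actual cluster addresses used for initial cluster communications. An empty gcomm:// tells a node to bootstrap a new cluster, so only the first node should use it; the second and third nodes need gcomm:// followed by the address of at least one node that is already running.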
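
To make the difference between the three variables concrete, here is a rough my.cnf sketch of how I would expect them to be set. The cluster name, node names and IP addresses are made up for illustration, and this is only a fragment: a real configuration also needs the other wsrep settings from the tutorials (provider library, SST method and so on).

```
# --- Node 1 (the node that bootstraps the cluster) ---
[mysqld]
# Same value on all three nodes (hypothetical name)
wsrep_cluster_name    = my_pxc_cluster
# Unique per node (hypothetical name)
wsrep_node_name       = node1
# Empty address list: start a new cluster instead of joining one
wsrep_cluster_address = gcomm://

# --- Nodes 2 and 3 (joining the running cluster) ---
[mysqld]
wsrep_cluster_name    = my_pxc_cluster
# Use node3 on the third server
wsrep_node_name       = node2
# Address of at least one running node (192.0.2.11 is a placeholder for node 1)
wsrep_cluster_address = gcomm://192.0.2.11
```

If the joining nodes still fail to start with settings like these, the MySQL error log on the failing node usually says more than the init script messages in syslog, since that is where the wsrep provider writes its own messages.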