I need some help before I pull out the last of my hair.
I took the files from a running MariaDB 10.1 cluster (two nodes plus one Galera arbitrator) and used them to set up a new cluster with SSL enabled.
Both the old and new clusters run Debian 8.6 with the latest stable package from MariaDB’s Debian repository (10.1.17+maria-1~jessie), in case that matters.
I followed the one guide (or so) I could find (does nobody use SSL???) and, of course, changed everything specific to the new cluster (IP addresses, hostnames, passwords, etc.). I have also allowed the traffic through the firewalls on all three servers, and the firewall logs show no other blocked traffic that has anything to do with this.
Bootstrapping the cluster works as expected. Starting the Galera arbitrator also works, and its logs show SSL connections.
Starting the second node results in a split brain, with the joiner failing to read from the donor during the state snapshot transfer (SST):
On JOINER:
Sep 30 13:04:36 1 mysqld[8205]: 2016-09-30 13:04:36 140266241517312 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 0)
Sep 30 13:04:36 1 mysqld[8205]: 2016-09-30 13:04:36 140266751359744 [Note] WSREP: Requesting state transfer: success, donor: 2
Sep 30 13:04:38 1 mysqld[8205]: 2016-09-30 13:04:38 140266249910016 [Note] WSREP: (4a57971f, ‘ssl://0.0.0.0:4567’) turning message relay requesting off
Sep 30 13:06:05 1 systemd[1]: mariadb.service start operation timed out. Terminating.
Sep 30 13:06:16 1 -wsrep-sst-joiner: Possible timeout in receving first data from donor in gtid stage
Sep 30 13:06:16 1 -wsrep-sst-joiner: Cleanup after exit with status:32
Sep 30 13:06:16 1 mysqld[8205]: 2016-09-30 13:06:16 140266212161280 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role ‘joiner’ --address ‘2.2.2.2’ --datadir ‘/var/lib/mysql/’ --parent ‘8205’ ‘’ : 32 (Broken pipe)
Sep 30 13:06:16 1 mysqld[8205]: 2016-09-30 13:06:16 140266212161280 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
Sep 30 13:06:16 1 mysqld[8205]: 2016-09-30 13:06:16 140266751674304 [ERROR] WSREP: SST failed: 32 (Broken pipe)
Sep 30 13:06:16 1 mysqld[8205]: 2016-09-30 13:06:16 140266751674304 [ERROR] Aborting
Sep 30 13:06:24 1 mysqld[8205]: Error in my_thread_global_end(): 1 threads didn’t exit
Sep 30 13:06:24 1 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
On DONOR:
Sep 30 13:04:38 1 mysqld[7291]: WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20160930 13:04:38.516)
Sep 30 13:04:38 1 -wsrep-sst-donor: Streaming with xbstream
Sep 30 13:04:38 1 -wsrep-sst-donor: Using socat as streamer
Sep 30 13:04:38 1 -wsrep-sst-donor: Using openssl based encryption with socat: with key and crt
Sep 30 13:04:38 1 -wsrep-sst-donor: Encrypting with cert=/changed/to/protect/the/innocent/mariadb-cluster-server.pem, key=/changed/to/protect/the/innocent/mariadb-cluster-server.key, cafile=/changed/to/protect/the/innocent/ca.pem
Sep 30 13:04:38 1 mysqld[7291]: 2016-09-30 13:04:38 140600498190080 [Note] WSREP: (55cd54fd, ‘ssl://0.0.0.0:4567’) turning message relay requesting off
Sep 30 13:04:38 1 -wsrep-sst-donor: Using /tmp/tmp.KDynCi46tN as innobackupex temporary directory
Sep 30 13:04:38 1 -wsrep-sst-donor: Streaming GTID file before SST
Sep 30 13:04:38 1 -wsrep-sst-donor: Evaluating xbstream -c ${INFO_FILE} | socat -u stdio openssl-connect:protected.example.com:4444,cert=/changed/to/protect/the/innocent/mariadb-cluster-server.pem,key=/changed/to/protect/the/innocent/mariadb-cluster-server.key,cafile=/changed/to/protect/the/innocent/ca.pem; RC=( ${PIPESTATUS[@]} )
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: (55cd54fd, ‘ssl://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: ssl://212.224.76.133:4567
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: declaring 3708ca1b at ssl://1.1.1.1:4567 stable
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: forgetting 4a57971f (ssl://2.2.2.2:4567)
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: (55cd54fd, ‘ssl://0.0.0.0:4567’) turning message relay requesting off
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: Node 3708ca1b state prim
Sep 30 13:06:18 1 mysqld[7291]: 2016-09-30 13:06:18 140600498190080 [Note] WSREP: view(view_id(PRIM,3708ca1b,12) memb {
Sep 30 13:06:18 1 mysqld[7291]: 3708ca1b,0
Sep 30 13:06:18 1 mysqld[7291]: 55cd54fd,0
Sep 30 13:06:18 1 mysqld[7291]: } joined {
Sep 30 13:06:18 1 mysqld[7291]: } left {
Sep 30 13:06:18 1 mysqld[7291]: } partitioned {
Sep 30 13:06:18 1 mysqld[7291]: 4a57971f,0
Sep 30 13:06:18 1 mysqld[7291]: })
And so on. The logs have been pruned, since everything else in them is unrelated to this issue. They have also been sanitized, so don’t ask me whether those certificate directories or hosts exist: they do.
I am using wsrep_sst_method=xtrabackup-v2, and I have verified (twice) that all the other wsrep_* settings are correct.
The config also contains this:
[sst]
encrypt = 3
I also tried encrypt = 2 and encrypt = 1, with no luck.
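For reference, here is a rough sketch of how these settings fit together. It assumes the tkey, tcert and tca option names that the xtrabackup-v2 SST script reads from the [sst] section, and it reuses the sanitized paths from the donor log, so the real file names may differ. The donor’s "Encrypting with cert=..., key=..., cafile=..." line suggests that all three are set.

```ini
# Sketch only: option names assume the wsrep_sst_xtrabackup-v2 script's
# [sst] settings; paths are the sanitized ones from the donor log.
[mysqld]
wsrep_sst_method = xtrabackup-v2

[sst]
# encrypt = 3: socat/OpenSSL transport using a key and a certificate
encrypt = 3
tkey    = /changed/to/protect/the/innocent/mariadb-cluster-server.key
tcert   = /changed/to/protect/the/innocent/mariadb-cluster-server.pem
# When tca is set, the script also passes cafile= to socat
# (visible in the donor's openssl-connect command above)
tca     = /changed/to/protect/the/innocent/ca.pem
```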
Thanks in advance.