SSL problems with xtradb-cluster-full-57 and Ubuntu 16.04 OpenSSL 1.0.2

<< More information about the issue is in later post >>

The bootstrapping node remains error-free, but the connecting node fails and produces the following error log:

2018-01-19T01:15:25.005952Z mysqld_safe mysqld from pid file /var/lib/mysql/ubuntu-nyc1-001.pid ended
2018-01-19T01:15:29.919179Z mysqld_safe Logging to ‘/var/log/mysql/error.log’.
2018-01-19T01:15:29.932832Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2018-01-19T01:15:29.940328Z mysqld_safe Skipping wsrep-recover for e0b74010-fbf6-11e7-91b2-07237a0a9432:3 pair
2018-01-19T01:15:29.941208Z mysqld_safe Assigning e0b74010-fbf6-11e7-91b2-07237a0a9432:3 to wsrep_start_position
2018-01-19T01:15:30.129908Z 0 [Warning] option ‘log_warnings’: unsigned value 3 adjusted to 2
2018-01-19T01:15:30.129980Z 0 [Warning] The syntax ‘--log_warnings/-W’ is deprecated and will be removed in a future release. Please use ‘--log_error_verbosity’ instead.
2018-01-19T01:15:30.130043Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2018-01-19T01:15:30.131409Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57) starting as process 27379 …
2018-01-19T01:15:30.133498Z 0 [Note] WSREP: Skipping automatic SSL certificate generation (enabled only in bootstrap mode)
2018-01-19T01:15:30.133710Z 0 [Note] WSREP: Setting wsrep_ready to false
2018-01-19T01:15:30.133724Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2018-01-19T01:15:30.133729Z 0 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib/libgalera_smm.so’
2018-01-19T01:15:30.136381Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <info@codership.com> loaded successfully.
2018-01-19T01:15:30.136449Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2018-01-19T01:15:30.137240Z 0 [Note] WSREP: Found saved state: e0b74010-fbf6-11e7-91b2-07237a0a9432:3, safe_to_bootsrap: 0
2018-01-19T01:15:30.138628Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.70.62; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; socket.ssl_ca = /etc/mysql/certs/ca.pem; socket.ssl_cert = /etc/mysql/certs/server-cert.pem; socket.ssl_cipher = AES128-SHA; socket.ssl_compression = YES; socket.ssl_key = /etc/mysql/certs/server-key.pem;
2018-01-19T01:15:30.152884Z 0 [Note] WSREP: GCache history reset: e0b74010-fbf6-11e7-91b2-07237a0a9432:0 -> e0b74010-fbf6-11e7-91b2-07237a0a9432:3
2018-01-19T01:15:30.153408Z 0 [Note] WSREP: Assign initial position for certification: 3, protocol version: -1
2018-01-19T01:15:30.153428Z 0 [Note] WSREP: Preparing to initiate SST/IST
2018-01-19T01:15:30.153432Z 0 [Note] WSREP: Starting replication
2018-01-19T01:15:30.153442Z 0 [Note] WSREP: Setting initial position to e0b74010-fbf6-11e7-91b2-07237a0a9432:3
2018-01-19T01:15:30.153623Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2018-01-19T01:15:30.153657Z 0 [Note] WSREP: initializing ssl context
2018-01-19T01:15:30.153960Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-01-19T01:15:30.154054Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2018-01-19T01:15:30.154062Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2018-01-19T01:15:30.154561Z 0 [Note] WSREP: GMCast version 0
2018-01-19T01:15:30.154770Z 0 [Note] WSREP: (3d5867c9, ‘ssl://0.0.0.0:4567’) listening at ssl://0.0.0.0:4567
2018-01-19T01:15:30.154778Z 0 [Note] WSREP: (3d5867c9, ‘ssl://0.0.0.0:4567’) multicast: , ttl: 1
2018-01-19T01:15:30.155125Z 0 [Note] WSREP: EVS version 0
2018-01-19T01:15:30.155211Z 0 [Note] WSREP: gcomm: connecting to group ‘my_ubuntu_cluster’, peer ‘192.168.70.61:,192.168.70.62:,192.168.70.63:’
2018-01-19T01:15:30.159654Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.70.62:53024 local endpoint ssl://192.168.70.62:4567 cipher: AES128-SHA compression:
2018-01-19T01:15:30.160107Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.70.62:4567 local endpoint ssl://192.168.70.62:53024 cipher: AES128-SHA compression:
2018-01-19T01:15:30.160284Z 0 [Note] WSREP: (3d5867c9, ‘ssl://0.0.0.0:4567’) connection established to 3d5867c9 ssl://192.168.70.62:4567
2018-01-19T01:15:30.160298Z 0 [Warning] WSREP: (3d5867c9, ‘ssl://0.0.0.0:4567’) address ‘ssl://192.168.70.62:4567’ points to own listening address, blacklisting
2018-01-19T01:15:30.294939Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-01-19T01:15:31.795678Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-01-19T01:15:33.157074Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-01-19T01:15:33.157113Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,3d5867c9,1)
memb {
3d5867c9,0
}
joined {
}
left {
}
partitioned {
}
)
2018-01-19T01:15:33.283211Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-01-19T01:15:33.656436Z 0 [Note] WSREP: (3d5867c9, ‘ssl://0.0.0.0:4567’) connection to peer 3d5867c9 with addr ssl://192.168.70.62:4567 timed out, no messages seen in PT3S
2018-01-19T01:15:33.657560Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50245S), skipping check
2018-01-19T01:15:34.796137Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-01-19T01:15:36.310567Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-01-19T01:15:37.811054Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.70.61:4567 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)

Has anyone seen this before?
After some googling I thought it might be a dhparams issue, but regenerating /var/lib/mysql/dhparams.pem didn’t help.
I am starting to run out of ideas. Any help would be greatly appreciated.
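
A note on reading the log above: the `unknown protocol` error from `SSL23_GET_SERVER_HELLO` generally means the joiner sent a TLS hello and got back something that was not TLS, which would fit a peer that listens on port 4567 without SSL. As a rough sketch (the function name `handshake_failures` is made up for illustration, and the log path in the example is simply the one from this post), the failures can be tallied per remote endpoint like this:

```shell
# Hypothetical helper: count failed SSL handshakes per remote endpoint
# in a PXC/Galera error log.
handshake_failures() {
    # $1: path to the error log
    grep 'handshake with remote endpoint' "$1" \
        | sed -E 's/.*remote endpoint (ssl:\/\/[^ ]+) failed.*/\1/' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'   # print "endpoint count"
}

# Example:
# handshake_failures /var/log/mysql/error.log
```

If every failure points at a single address (here 192.168.70.61), running `openssl s_client -connect 192.168.70.61:4567 </dev/null` from the joiner is a quick way to see whether that port answers with TLS at all.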

Hello everyone,

I am having a problem getting xtradb to work at all with SSL between the nodes…
I followed the docs here: https://www.percona.com/doc/percona-…l/apt.html#apt
and https://www.percona.com/doc/percona-…ntu_howto.html
using these install steps:

wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-cache search percona
apt-get remove apparmor
apt install percona-xtradb-cluster-full-57
sudo service mysql stop
ufw allow 22,3306,4444,4567,4568/tcp
ufw enable

After copying the supplied /etc/mysql/my.cnf templates, the cluster boots and runs with no problem…

Then I generated keys as per
https://www.percona.com/doc/percona-…ate-keys-certs

$ openssl genrsa 2048 > ca-key.pem
$ openssl req -new -x509 -nodes -days 3600 -key ca-key.pem -out ca.pem
$ openssl req -newkey rsa:2048 -days 3600 -nodes -keyout server-key.pem -out server-req.pem
$ openssl rsa -in server-key.pem -out server-key.pem
$ openssl x509 -req -in server-req.pem -days 3600 -CA ca.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem
$ openssl req -newkey rsa:2048 -days 3600 -nodes -keyout client-key.pem -out client-req.pem
$ openssl rsa -in client-key.pem -out client-key.pem
$ openssl x509 -req -in client-req.pem -days 3600 -CA ca.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem

and all goes as expected.

Updating the provided my.cnf example:

[mysqld]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/server-cert.pem
ssl-key=/etc/mysql/certs/server-key.pem
datadir=/var/lib/mysql
user=mysql

# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://192.168.70.61,192.168.70.62,192.168.70.63

wsrep_provider_options="socket.ssl=yes;socket.ssl_ca=/etc/mysql/certs/ca.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_key=/etc/mysql/certs/server-key.pem"
early-plugin-load=keyring_file.so
keyring-file-data=/etc/mysql/keyring

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This InnoDB autoincrement locking mode is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=192.168.70.61

# SST method
wsrep_sst_method=xtrabackup-v2

# Cluster name
wsrep_cluster_name=my_ubuntu_cluster

# Authentication for SST method
wsrep_sst_auth="sstuser:s3cretPass"

[client]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/client-cert.pem
ssl-key=/etc/mysql/certs/client-key.pem

[sst]
encrypt=4
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/server-cert.pem
ssl-key=/etc/mysql/certs/server-key.pem

This prevents the cluster from bootstrapping… Adding

log_error = /var/log/mysql/error.log
log_warnings = 3
log_error_verbosity = 3

to the [mysqld] block does not produce a log when the bootstrap fails…

Further, if I comment out the [client] block

#[client]
#ssl-ca=/etc/mysql/certs/ca.pem
#ssl-cert=/etc/mysql/certs/client-cert.pem
#ssl-key=/etc/mysql/certs/client-key.pem

the first node will bootstrap, but the second node cannot connect.

Posts 1 and 2 are inverted; I didn’t realize there would be a moderation delay.

chrisg931, can you run the following command and show the output?

openssl verify -CAfile /etc/mysql/certs/ca.pem /etc/mysql/certs/server-cert.pem

It should show just this:
server-cert.pem: OK

If there’s additional text, then you may have to set the Common Name field when creating the CA certificate file:

$ openssl req -new -x509 -nodes -days 3600 -key ca-key.pem -out ca.pem

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter ‘.’, the field will be left blank.

Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []: << ENTER SOME NAME HERE >>
Email Address []:
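
`openssl verify` only checks that the certificate chains to the CA; it says nothing about whether server-key.pem actually belongs to server-cert.pem. A minimal sketch for that extra check, assuming RSA keys like the ones generated earlier in this thread (`cert_matches_key` is a hypothetical helper name, not part of any tool):

```shell
# Hypothetical helper: succeed only if the certificate and the RSA private key
# share the same modulus, i.e. they belong together.
cert_matches_key() {
    # $1: certificate in PEM format, $2: RSA private key in PEM format
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$2") || return 2
    [ "$cert_mod" = "$key_mod" ]
}

# Example (paths are the ones used in this thread):
# cert_matches_key /etc/mysql/certs/server-cert.pem /etc/mysql/certs/server-key.pem \
#     && echo "key matches certificate" || echo "MISMATCH"
```

A mismatch here would be worth ruling out before looking further, since a chain that verifies OK does not exclude it.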

Hey kenn,

Thanks for the input.
It seems that if this is a certificate issue, openssl doesn’t catch it…

$ openssl verify -CAfile /etc/mysql/certs/ca.pem /etc/mysql/certs/server-cert.pem
just returns
/etc/mysql/certs/server-cert.pem: OK

Any other ideas? I have tried recreating the certs several times to no avail.

Has anyone been able to successfully implement SSL with their cluster?

You can check the link on how to set SSL for PXC.

https://www.percona.com/doc/percona-xtradb-cluster/LATEST/security/encrypt-traffic.html

It has all the steps, including an automated, simplified option and the manual way of configuring.

I followed the steps in your link but it did not work. Please see the error logs and repeatable steps above.
Do you have any other ideas, or a known good environment this can be used with?
Clearly it doesn’t work as expected with Ubuntu 16.04…

It seems like the joining node is unable to find the primary node, and so it fails:

2018-01-19T01:15:33.157074Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-01-19T01:15:33.157113Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,3d5867c9,1)
memb {
3d5867c9,0
}
joined {
}
left {
}
partitioned {
}
)

Are you restarting the cluster? Also, make sure the first node is bootstrapped and is in a PRIMARY state.

BTW, here is a reference to one such error, which again suggests an issue with the node restart sequence:
http://msutic.blogspot.in/2017/10/enable-ssl-encryption-for-mariadb.html

from my initial post

Considering the bootstrap fails when certs are provided, restarting isn’t an option.
Again, I am okay with switching environments if anyone can give me an example of a known good OS/OpenSSL/DB version combo. What were the docs tested with?
Something else seems to be going on here, and with the lack of logged errors I can’t figure out what…

I could reproduce the said problem. Here is my joiner error log

2018-02-08T05:30:40.057323Z 0 [Note] WSREP: GMCast version 0
2018-02-08T05:30:40.057533Z 0 [Note] WSREP: (33052bdf, ‘ssl://127.0.0.1:5030’) listening at ssl://127.0.0.1:5030
2018-02-08T05:30:40.057544Z 0 [Note] WSREP: (33052bdf, ‘ssl://127.0.0.1:5030’) multicast: , ttl: 1
2018-02-08T05:30:40.057955Z 0 [Note] WSREP: EVS version 0
2018-02-08T05:30:40.058091Z 0 [Note] WSREP: gcomm: connecting to group ‘pxc-cluster’, peer ‘127.0.0.1:5030,127.0.0.1:4030’
2018-02-08T05:30:40.059232Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:41.560531Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:43.060244Z 0 [Note] WSREP: announce period timed out (pc.announce_timeout)
2018-02-08T05:30:43.060685Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-02-08T05:30:43.060725Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,33052bdf,1)
memb {
33052bdf,0
}
joined {
}
left {
}
partitioned {
}
)
2018-02-08T05:30:43.060860Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:43.560950Z 0 [Warning] WSREP: last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50302S), skipping check
2018-02-08T05:30:44.560865Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:46.061237Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:47.562425Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
2018-02-08T05:30:49.063324Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)
^C2018-02-08T05:30:50.562697Z 0 [ERROR] WSREP: handshake with remote endpoint ssl://127.0.0.1:4030 failed: asio.ssl:336031996: ‘unknown protocol’ ( 336031996: ‘error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol’)

Looks quite similar to what you have been facing.


What did I do wrong to hit this issue?

  • My DONOR node is not configured to use SSL, but the JOINER node is.
    Simply adding pxc-encrypt-cluster-traffic=on to only the JOINER’s my.cnf is enough to hit the issue.

Please re-check your configuration and ensure that it is the same on all nodes (in all respects except resource limits such as buffer pool size, log file size, etc.) unless there is a reason for a difference.
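
One way to do that comparison, as a sketch only: copy each node's my.cnf to one machine and compare just the encryption-related lines. The helper names `ssl_settings` and `same_ssl_settings` are invented for this example, and it assumes `wsrep_provider_options` carries no node-specific values.

```shell
# Hypothetical helpers for comparing encryption-related settings between nodes.
ssl_settings() {
    # $1: path to a copy of a node's my.cnf
    # Keep only ssl-*, pxc-encrypt-cluster-traffic, wsrep_provider_options and
    # encrypt= lines, strip whitespace, and sort for a stable comparison.
    grep -E '^[[:space:]]*(ssl[-_]|pxc[-_]encrypt[-_]cluster[-_]traffic|wsrep_provider_options|encrypt[[:space:]]*=)' "$1" \
        | sed -E 's/[[:space:]]+//g' \
        | sort
}

same_ssl_settings() {
    # $1, $2: my.cnf copies from two different nodes
    [ "$(ssl_settings "$1")" = "$(ssl_settings "$2")" ]
}

# Example (file names are placeholders):
# same_ssl_settings node1.cnf node2.cnf || echo "SSL settings differ between nodes"
```

Commented-out lines are ignored, so a node where the settings are present but disabled shows up as different from one where they are active.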

I tried reproducing the scenario with generated certificates.

  1. I presume you have noticed this note: “The Common Name value used for the server and client keys and certificates must differ from that value used for the CA certificate.”
  2. The following command returned OK (without any error): “openssl verify -CAfile ca.pem server-cert.pem client-cert.pem”
  3. Please enable encryption for replication traffic too. Check the section “Encrypting Replication Traffic”.
  4. Enable SST encryption as documented in “Encrypting SST Traffic”.

One of these steps may be missing. I could get things working with the documented steps. We are also working on making the documentation easier to follow.

I forgot to mention that enabling encryption for replication traffic (see “Encrypting Replication Traffic”) is needed only if all the nodes are using it. If you have not set it on the DONOR node, then the JOINER doesn’t need it either.
Also, the cert files should be the same on all nodes.
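
To confirm that last point, a small sketch: print checksums of the three files on each node and compare the output by eye or with diff. `cert_fingerprints` is a hypothetical name, and the directory is the one used earlier in this thread.

```shell
# Hypothetical helper: print SHA-256 checksums of the CA, server certificate
# and server key, relative to the certificate directory so that the output
# is directly comparable across nodes.
cert_fingerprints() {
    # $1: certificate directory, e.g. /etc/mysql/certs
    ( cd "$1" && sha256sum ca.pem server-cert.pem server-key.pem )
}

# Example, run on every node:
# cert_fingerprints /etc/mysql/certs
```

If the files were regenerated on each node instead of copied from one place, the checksums will differ even though `openssl verify` reports OK on every node.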

I did notice this and created the certs accordingly…

Correct…

I found the docs easy to follow; I just didn’t get the expected result.
I can leave the SST and wsrep certs in place and still bootstrap.
These lines make it fail to bootstrap on the initial node:

[mysqld]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/server-cert.pem
ssl-key=/etc/mysql/certs/server-key.pem

If you are still facing the issue, I would ask you to remove all the existing SSL settings, introduce just pxc-encrypt-cluster-traffic=ON, and check whether things work. This will give us a baseline for where the problem could be.

I also tried that before opening the thread, but the results were the same.
Have you been able to reproduce this, or can you confirm what environment it does work with?

chrisg931 Hmmm… I’ve been able to get this working on Ubuntu Xenial 16.04 with PXC 5.7.20, using the configuration you sent in the original post. This is pretty strange. From the error, it looks like some kind of SSL version mismatch between the two nodes. Can you attach the config files and pem files (ca.pem, server-key.pem and server-cert.pem)? That way we can determine whether it’s a system configuration issue or a MySQL configuration issue.