Running Docker PXC container: joiner node always fails

I am trying to set this up without a discovery service, with just two nodes connected across the network. Can someone check my work and let me know if you have any ideas about what is going wrong? I do not have any InnoDB logs (I see those requested every time someone has an issue), and the mysql.log on the donor is all but empty. I am not sure whether the container is set up to save the logs elsewhere, but I have spent a good amount of time in that container looking for them.

Docker run command for the donor:

docker run \
-d \
--name="xtradb-cluster-master" \
--restart=unless-stopped \
-v /data/xtradb:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=secret \
-e CLUSTER_NAME=XtraDBCluster \
-p 3306:3306 \
-p 4567-4568:4567-4568 \
percona/percona-xtradb-cluster

Docker run command for the joiner:

docker run \
-i -t \
--name="xtradb-cluster-joiner" \
--restart=unless-stopped \
-v /data/xtradb:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=secret \
-e CLUSTER_NAME=XtraDBCluster \
-e CLUSTER_JOIN=10.21.1.34 \
-p 3306:3306 \
-p 4567-4568:4567-4568 \
percona/percona-xtradb-cluster

I run the joiner with -i -t so I can see the logs. Here is the log for that joiner (posted separately because of the forum's character limit):

2017-03-20T13:55:07.067549Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-03-20T13:55:07.069023Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
2017-03-20T13:55:07.071533Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-03-20T13:55:07.071549Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-03-20T13:55:07.083303Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-03-20T13:55:07.083892Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-03-20T13:55:07.084305Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 1
2017-03-20T13:55:07.086212Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-03-20T13:55:07.099685Z 0 [Note] WSREP: GCache history reset: old(da904b65-0b5a-11e7-99d0-b7fed22124dd:0) -> new(00000000-0000-0000-0000-000000000000:-1)
2017-03-20T13:55:07.100300Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-03-20T13:55:07.100317Z 0 [Note] WSREP: wsrep_sst_grab()
2017-03-20T13:55:07.100322Z 0 [Note] WSREP: Start replication
2017-03-20T13:55:07.100332Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-20T13:55:07.100404Z 0 [Note] WSREP: protonet asio version 0
2017-03-20T13:55:07.100503Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-03-20T13:55:07.100558Z 0 [Note] WSREP: backend: asio
2017-03-20T13:55:07.100628Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-03-20T13:55:07.100738Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-03-20T13:55:07.100745Z 0 [Note] WSREP: restore pc from disk failed
2017-03-20T13:55:07.101198Z 0 [Note] WSREP: GMCast version 0
2017-03-20T13:55:07.101377Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-03-20T13:55:07.101385Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-03-20T13:55:07.101750Z 0 [Note] WSREP: EVS version 0
2017-03-20T13:55:07.101843Z 0 [Note] WSREP: gcomm: connecting to group 'XtraDBCluster', peer '10.21.1.34:,10.20.1.35:'
2017-03-20T13:55:07.104095Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection established to d354cff5 tcp://172.17.0.1:4567
2017-03-20T13:55:07.122258Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection established to e0e3516f tcp://10.21.1.34:4567
2017-03-20T13:55:07.122447Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-03-20T13:55:07.610659Z 0 [Note] WSREP: declaring e0e3516f at tcp://10.21.1.34:4567 stable
2017-03-20T13:55:07.614763Z 0 [Note] WSREP: Node e0e3516f state prim
2017-03-20T13:55:07.618465Z 0 [Note] WSREP: view(view_id(PRIM,d354cff5,294) memb {
d354cff5,0
e0e3516f,0
} joined {
} left {
} partitioned {
})
2017-03-20T13:55:07.618490Z 0 [Note] WSREP: save pc into disk
2017-03-20T13:55:07.618767Z 0 [Note] WSREP: discarding pending addr without UUID: tcp://10.20.1.35:4567
2017-03-20T13:55:07.618778Z 0 [Note] WSREP: discarding pending addr proto entry 0x36ff300
2017-03-20T13:55:08.103116Z 0 [Note] WSREP: gcomm: connected
2017-03-20T13:55:08.103174Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-20T13:55:08.103249Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-20T13:55:08.103255Z 0 [Note] WSREP: Opened channel 'XtraDBCluster'
2017-03-20T13:55:08.103426Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-03-20T13:55:08.103642Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-20T13:55:08.104028Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: d3edcee8-0d74-11e7-9760-cb6538dddaa7
2017-03-20T13:55:08.107560Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7
2017-03-20T13:55:08.110874Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7 from 0 (dca214d836ed)
2017-03-20T13:55:08.114658Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7 from 1 (7c3c607f46b4)
2017-03-20T13:55:08.114671Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 15,
members = 1/2 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = da904b65-0b5a-11e7-99d0-b7fed22124dd
2017-03-20T13:55:08.114685Z 0 [Note] WSREP: Flow-control interval: [23, 23]
2017-03-20T13:55:08.114690Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2017-03-20T13:55:08.114765Z 1 [Note] WSREP: State transfer required:
Group state: da904b65-0b5a-11e7-99d0-b7fed22124dd:0
Local state: 00000000-0000-0000-0000-000000000000:-1
2017-03-20T13:55:08.114792Z 1 [Note] WSREP: New cluster view: global state: da904b65-0b5a-11e7-99d0-b7fed22124dd:0, view# 16: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-03-20T13:55:08.114798Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-03-20T13:55:08.115077Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.17.0.2' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '1' '' '
WSREP_SST: [INFO] The xtrabackup version is 2.4.6 (20170320 13:55:08.231)
WSREP_SST: [INFO] Streaming with xbstream (20170320 13:55:08.422)
WSREP_SST: [INFO] Using socat as streamer (20170320 13:55:08.424)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20170320 13:55:08.428)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20170320 13:55:08.467)
2017-03-20T13:55:08.674571Z 1 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.17.0.2:4444/xtrabackup_sst//1
2017-03-20T13:55:08.674615Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-03-20T13:55:08.674643Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-03-20T13:55:08.674653Z 1 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-03-20T13:55:08.674742Z 0 [Note] WSREP: Service thread queue flushed.
2017-03-20T13:55:08.674871Z 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (da904b65-0b5a-11e7-99d0-b7fed22124dd): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():535. IST will be unavailable.
2017-03-20T13:55:08.678612Z 0 [Note] WSREP: Member 0.0 (dca214d836ed) requested state transfer from '*any*'. Selected 1.0 (7c3c607f46b4)(SYNCED) as donor.
2017-03-20T13:55:08.678626Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
2017-03-20T13:55:08.678698Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-03-20T13:55:08.678713Z 1 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(da904b65-0b5a-11e7-99d0-b7fed22124dd:0)
2017-03-20T13:55:09.492715Z 0 [Warning] WSREP: 1.0 (7c3c607f46b4): State transfer to 0.0 (dca214d836ed) failed: -32 (Broken pipe)
2017-03-20T13:55:09.492742Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.
2017-03-20T13:55:09.492781Z 0 [Note] WSREP: gcomm: terminating thread
2017-03-20T13:55:09.492813Z 0 [Note] WSREP: gcomm: joining thread
2017-03-20T13:55:09.492917Z 0 [Note] WSREP: gcomm: closing backend
2017-03-20T13:55:10.603335Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-03-20T13:55:12.603363Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection to peer e0e3516f with addr tcp://10.21.1.34:4567 timed out, no messages seen in PT3S
2017-03-20T13:55:12.603507Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.21.1.34:4567
2017-03-20T13:55:14.103368Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') reconnecting to e0e3516f (tcp://10.21.1.34:4567), attempt 0
2017-03-20T13:55:14.993268Z 0 [Note] WSREP: evs::proto(d354cff5, LEAVING, view_id(REG,d354cff5,294)) suspecting node: e0e3516f
2017-03-20T13:55:14.993301Z 0 [Note] WSREP: evs::proto(d354cff5, LEAVING, view_id(REG,d354cff5,294)) suspected node without join message, declaring inactive
2017-03-20T13:55:14.993342Z 0 [Note] WSREP: view(view_id(NON_PRIM,d354cff5,294) memb {
d354cff5,0
} joined {
} left {
} partitioned {
e0e3516f,0
})
2017-03-20T13:55:14.993386Z 0 [Note] WSREP: view((empty))
2017-03-20T13:55:14.993725Z 0 [Note] WSREP: gcomm: closed
2017-03-20T13:55:14.993745Z 0 [Note] WSREP: mysqld: Terminated.
13:55:14 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=152
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60215 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
mysqld(my_print_stacktrace+0x2c)[0xebe56c]
mysqld(handle_fatal_signal+0x479)[0x7a4b89]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f07c458e890]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x232)[0x7f07c2515532]
/usr/lib/galera3/libgalera_smm.so(+0x77c2b)[0x7f07b7225c2b]
/usr/lib/galera3/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x626)[0x7f07b7360be6]
/usr/lib/galera3/libgalera_smm.so(+0x1b7904)[0x7f07b7365904]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f07c4587064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f07c25c762d]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.

YeeP,

Can you also provide the log from the donor?

You do not need to run the container with “-i -t” to see the log.

To get the log from a container running in the background, you can execute “docker logs -f <container_name>”.
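Because the log goes to the container's output (which is why docker logs shows it), it can be piped through a filter to find the lines that matter. A minimal sketch, assuming the log format quoted in this thread; wsrep_problems is a hypothetical helper name, not something shipped in the image. docker logs replays the container's stderr on stderr, hence the 2>&1.

```shell
#!/usr/bin/env bash
# Sketch: keep only [ERROR]/[Warning] lines (mysqld and WSREP_SST) plus
# socat errors, which is where SST failures usually show up.
# wsrep_problems is a hypothetical helper, not part of the PXC image.
wsrep_problems() {
  grep -E '\[(ERROR|Warning)\]|socat\[[0-9]+\] E '
}

# Usage (container names from the run commands in this thread):
#   docker logs xtradb-cluster-joiner 2>&1 | wsrep_problems
#   docker logs xtradb-cluster-master 2>&1 | wsrep_problems
```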

Vadimtk, thanks for the tip on the logs. Here is the donor log:

2017-03-20T20:04:06.726406Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-03-20T20:04:06.728941Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
2017-03-20T20:04:06.733653Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-03-20T20:04:06.733686Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-03-20T20:04:06.740248Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-03-20T20:04:06.741140Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-03-20T20:04:06.741693Z 0 [Note] WSREP: Found saved state: c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0, safe_to_bootsrap: 0
2017-03-20T20:04:06.743363Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-03-20T20:04:06.767899Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2017-03-20T20:04:06.767961Z 0 [Note] WSREP: wsrep_sst_grab()
2017-03-20T20:04:06.767969Z 0 [Note] WSREP: Start replication
2017-03-20T20:04:06.767985Z 0 [Note] WSREP: Setting initial position to c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0
2017-03-20T20:04:06.767992Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2017-03-20T20:04:06.767998Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2017-03-20T20:04:06.768001Z 0 [ERROR] Aborting
2017-03-20T20:04:06.768009Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-03-20T20:04:06.768019Z 0 [Note] WSREP: Service disconnected.
2017-03-20T20:04:09.768135Z 0 [Note] WSREP: Some threads may fail to exit.
2017-03-20T20:04:09.768199Z 0 [Note] Binlog end
2017-03-20T20:04:09.770835Z 0 [Note] mysqld: Shutdown complete
2017-03-20T20:04:14.965787Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-03-20T20:04:14.968036Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
2017-03-20T20:04:14.971438Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-03-20T20:04:14.971462Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-03-20T20:04:14.976203Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-03-20T20:04:14.976477Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-03-20T20:04:14.977023Z 0 [Note] WSREP: Found saved state: c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0, safe_to_bootsrap: 0
2017-03-20T20:04:14.978883Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-03-20T20:04:15.000769Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2017-03-20T20:04:15.000831Z 0 [Note] WSREP: wsrep_sst_grab()
2017-03-20T20:04:15.000838Z 0 [Note] WSREP: Start replication
2017-03-20T20:04:15.000854Z 0 [Note] WSREP: Setting initial position to c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0
2017-03-20T20:04:15.000862Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2017-03-20T20:04:15.000868Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2017-03-20T20:04:15.000872Z 0 [ERROR] Aborting
2017-03-20T20:04:15.000879Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-03-20T20:04:15.000890Z 0 [Note] WSREP: Service disconnected.
2017-03-20T20:04:18.001036Z 0 [Note] WSREP: Some threads may fail to exit.
2017-03-20T20:04:18.001107Z 0 [Note] Binlog end
2017-03-20T20:04:18.003658Z 0 [Note] mysqld: Shutdown complete

vadimtk: this one is a little better. I cannot delete the previous post, but I deleted the data directory and started over, because the donor had not been crashing before (I am still setting this up). This is the donor log from a brand-new instance started with the same commands:

2017-03-20T21:10:44.506085Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') connection established to ae6ea4bf tcp://10.20.1.35:4567
2017-03-20T21:10:44.509764Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-03-20T21:10:45.005918Z 0 [Note] WSREP: declaring ae6ea4bf at tcp://10.20.1.35:4567 stable
2017-03-20T21:10:45.009498Z 0 [Note] WSREP: Node 6673b6dc state prim
2017-03-20T21:10:45.012880Z 0 [Note] WSREP: view(view_id(PRIM,6673b6dc,12) memb {
6673b6dc,0
ae6ea4bf,0
} joined {
} left {
} partitioned {
})
2017-03-20T21:10:45.012910Z 0 [Note] WSREP: save pc into disk
2017-03-20T21:10:45.013390Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-03-20T21:10:45.013884Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: aebdc4ca-0db1-11e7-9c0f-db7e1711d337
2017-03-20T21:10:45.017229Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337
2017-03-20T21:10:45.020509Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337 from 0 (c8e314fdc5d4)
2017-03-20T21:10:45.504119Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337 from 1 (92495e38bf2d)
2017-03-20T21:10:45.504162Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 11,
members = 1/2 (joined/total),
act_id = 14,
last_appl. = 0,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 581a154c-0db1-11e7-9a69-ff24de2d16d2
2017-03-20T21:10:45.504175Z 0 [Note] WSREP: Flow-control interval: [23, 23]
2017-03-20T21:10:45.504421Z 4 [Note] WSREP: New cluster view: global state: 581a154c-0db1-11e7-9a69-ff24de2d16d2:14, view# 12: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-03-20T21:10:45.504444Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-03-20T21:10:45.504476Z 4 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-03-20T21:10:45.504488Z 4 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
2017-03-20T21:10:45.504511Z 0 [Note] WSREP: Service thread queue flushed.
2017-03-20T21:10:45.993086Z 0 [Note] WSREP: Member 1.0 (92495e38bf2d) requested state transfer from '*any*'. Selected 0.0 (c8e314fdc5d4)(SYNCED) as donor.
2017-03-20T21:10:45.993133Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 14)
2017-03-20T21:10:45.993306Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-03-20T21:10:45.993431Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14''
2017-03-20T21:10:45.994039Z 4 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] The xtrabackup version is 2.4.6 (20170320 21:10:46.038)
WSREP_SST: [INFO] Streaming with xbstream (20170320 21:10:46.229)
WSREP_SST: [INFO] Using socat as streamer (20170320 21:10:46.232)
WSREP_SST: [INFO] Using /tmp/tmp.eS5cYXKuIu as innobackupex temporary directory (20170320 21:10:46.245)
WSREP_SST: [INFO] Streaming GTID file before SST (20170320 21:10:46.250)
WSREP_SST: [INFO] Evaluating xbstream -c ${FILE_TO_STREAM} | socat -u stdio TCP:172.17.0.2:4444; RC=( ${PIPESTATUS[@]} ) (20170320 21:10:46.252)
2017/03/20 21:10:46 socat[2225] E connect(6, AF=2 172.17.0.2:4444, 16): Connection refused
WSREP_SST: [ERROR] Error while sending data to joiner node: exit codes: 141 1 (20170320 21:10:46.258)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170320 21:10:46.260)
WSREP_SST: [INFO] Cleaning up temporary directories (20170320 21:10:46.263)
2017-03-20T21:10:46.269260Z 0 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14'
2017-03-20T21:10:46.269308Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14': 32 (Broken pipe)
2017-03-20T21:10:46.269393Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14'
2017-03-20T21:10:46.273444Z 0 [Warning] WSREP: 0.0 (c8e314fdc5d4): State transfer to 1.0 (92495e38bf2d) failed: -32 (Broken pipe)
2017-03-20T21:10:46.273473Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 14)
2017-03-20T21:10:46.276951Z 0 [Note] WSREP: Member 0.0 (c8e314fdc5d4) synced with group.
2017-03-20T21:10:46.276965Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 14)
2017-03-20T21:10:46.277035Z 4 [Note] WSREP: Synchronized with group, ready for connections
2017-03-20T21:10:46.277050Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-03-20T21:10:47.278086Z 0 [Note] WSREP: forgetting ae6ea4bf (tcp://10.20.1.35:4567)
2017-03-20T21:10:47.278171Z 0 [Note] WSREP: Node 6673b6dc state prim
2017-03-20T21:10:47.278221Z 0 [Note] WSREP: view(view_id(PRIM,6673b6dc,13) memb {
6673b6dc,0
} joined {
} left {
} partitioned {
ae6ea4bf,0
})
2017-03-20T21:10:47.278237Z 0 [Note] WSREP: save pc into disk
2017-03-20T21:10:47.278757Z 0 [Note] WSREP: forgetting ae6ea4bf (tcp://10.20.1.35:4567)
2017-03-20T21:10:47.278776Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-20T21:10:47.279500Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b0176ff9-0db1-11e7-895f-d653150a4e73
2017-03-20T21:10:47.279531Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: b0176ff9-0db1-11e7-895f-d653150a4e73
2017-03-20T21:10:47.279540Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: b0176ff9-0db1-11e7-895f-d653150a4e73 from 0 (c8e314fdc5d4)
2017-03-20T21:10:47.279551Z 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 12,
members = 1/1 (joined/total),
act_id = 14,
last_appl. = 0,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 581a154c-0db1-11e7-9a69-ff24de2d16d2
2017-03-20T21:10:47.279558Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-20T21:10:47.279714Z 1 [Note] WSREP: New cluster view: global state: 581a154c-0db1-11e7-9a69-ff24de2d16d2:14, view# 13: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-03-20T21:10:47.279743Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-03-20T21:10:47.279756Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-03-20T21:10:47.279766Z 1 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
2017-03-20T21:10:47.279787Z 0 [Note] WSREP: Service thread queue flushed.
2017-03-20T21:10:47.761362Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-03-20T21:10:51.006660Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') connection established to ae6ea4bf tcp://10.20.1.35:4567
2017-03-20T21:10:51.006697Z 0 [Warning] WSREP: discarding established (time wait) ae6ea4bf (tcp://10.20.1.35:4567)
2017-03-20T21:10:52.762135Z 0 [Note] WSREP: cleaning up ae6ea4bf (tcp://10.20.1.35:4567)

It looks to me like Percona provides the config file(s) that define the wsrep settings inside the container, so I would not need to mount a volume with my own settings unless I wanted to change them. That being said, I just got a vanilla MariaDB 10.1 cluster to come up with the settings below. If anyone can tell me what the difference is and what I can do to get the XtraDB cluster working, I would much prefer that. Something I did notice: MariaDB has you pass in the IP address of the host that the Docker container is running on, whereas the percona-xtradb-cluster container uses the CLUSTER_JOIN variable, where (as I understand it) you define the address of the other nodes in the cluster to join.
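One way to compare the two setups is to check which address each PXC node actually advertises. Both PXC logs in this thread report base_host = 172.17.0.2 in the "Passing config to GCS" line, and the donor's socat got "Connection refused" connecting to 172.17.0.2:4444. 172.17.0.0/16 is Docker's default bridge subnet, so this may be a container-internal address rather than the host address (10.21.1.34 / 10.20.1.35) that the MariaDB setup passes via --wsrep_node_address. A minimal sketch for pulling those addresses out of a log; the helper names are hypothetical and the patterns assume the log lines quoted here.

```shell
#!/usr/bin/env bash
# Sketch: pull the addresses a PXC node advertises out of its log.
# advertised_host, sst_target and on_default_bridge are hypothetical helpers,
# written against the log lines quoted in this thread.

# base_host from the "Passing config to GCS" line
advertised_host() {
  grep -o 'base_host = [^;]*' | head -n 1 | sed 's/^base_host = //'
}

# First IPv4 address paired with the SST port 4444 (the joiner's
# "Prepared SST request" line, or the donor's --address/socat lines)
sst_target() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}:4444' | head -n 1
}

# True if the address is in 172.17.0.0/16, Docker's default bridge subnet
on_default_bridge() {
  case "$1" in
    172.17.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   docker logs xtradb-cluster-joiner 2>&1 | advertised_host
#   docker logs xtradb-cluster-master 2>&1 | sst_target
```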

Node 1 (10.21.1.34) docker run command:

docker run \
--name mariadb-0 \
-d \
-v /root/mariadb:/etc/mysql/conf.d \
-v /data/mariadb:/var/lib/mysql \
-e MYSQL_INITDB_SKIP_TZINFO=yes \
-e MYSQL_ROOT_PASSWORD=secret_pw \
-p 3306:3306 \
-p 4567:4567/udp \
-p 4567-4568:4567-4568 \
-p 4444:4444 \
mariadb:10.1 \
--wsrep-new-cluster \
--wsrep_node_address=10.21.1.34

Node 2 (10.20.1.35) docker run command:

docker run \
--name mariadb-1 \
-d \
-v /root/mariadb:/etc/mysql/conf.d \
-v /data/mariadb:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=secret_pw \
-p 3306:3306 \
-p 4567:4567/udp \
-p 4567-4568:4567-4568 \
-p 4444:4444 \
mariadb:10.1 \
--wsrep_node_address=10.20.1.35

Config file that was brought in (/root/mariadb/mysql_server.cnf):

#
# Galera Cluster: mandatory settings
#

[server]
bind-address=0.0.0.0
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0

[galera]
wsrep_on=ON
wsrep_provider="/usr/lib/galera/libgalera_smm.so"
wsrep_cluster_address="gcomm://10.21.1.34,10.20.1.35"
wsrep-sst-method=rsync

#
# Optional setting
#

# Tune this value for your system, roughly 2x cores; see https://mariadb.com/kb/en/mariadb/galera-cluster-system-variables/#wsrep_slave_threads
# wsrep_slave_threads=1

# innodb_flush_log_at_trx_commit=0
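In the [galera] section above, wsrep_cluster_address is the same on every node (gcomm:// followed by all node IPs, comma-separated), while the per-node part is the address passed with --wsrep_node_address on the docker command line. A minimal sketch that generates that section for one node, to make the split explicit. galera_section is a hypothetical helper, and moving wsrep_node_address from the command line into the file is an illustrative assumption, not what was run in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print the [galera] section above for one node.
# galera_section is a hypothetical helper. Arguments: this node's IP,
# then every node IP in the cluster (used for wsrep_cluster_address).
galera_section() {
  local node_ip=$1; shift
  local peers
  peers=$(IFS=,; printf '%s' "$*")   # join all node IPs with commas
  cat <<EOF
[galera]
wsrep_on=ON
wsrep_provider="/usr/lib/galera/libgalera_smm.so"
wsrep_cluster_address="gcomm://${peers}"
wsrep_node_address=${node_ip}
wsrep-sst-method=rsync
EOF
}

# Usage on node 2:
#   galera_section 10.20.1.35 10.21.1.34 10.20.1.35
```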

For comparison and future reference, here are my run commands using the percona-xtradb-cluster Docker container (currently not working). The joiner log is in post #2 of this thread, and the donor log is in post #6.

Node 1 (10.21.1.34) docker run command:

docker run \
-d \
--name="xtradb-cluster-master" \
--restart=unless-stopped \
-v /data/xtradb:/var/lib/mysql \
-p 3306:3306 \
-p 4567:4567/udp \
-p 4567-4568:4567-4568 \
-p 4444:4444 \
-e MYSQL_ROOT_PASSWORD=secret_pw \
-e CLUSTER_NAME=XtraDBCluster \
percona/percona-xtradb-cluster 

Node 2 (10.20.1.35) docker run command:

docker run \
-d \
--name="xtradb-cluster-joiner" \
--restart=unless-stopped \
-v /data/xtradb:/var/lib/mysql \
-p 3306:3306 \
-p 4567:4567/udp \
-p 4567-4568:4567-4568 \
-p 4444:4444 \
-e MYSQL_ROOT_PASSWORD=secret_pw \
-e CLUSTER_NAME=XtraDBCluster \
-e CLUSTER_JOIN=10.21.1.34 \
percona/percona-xtradb-cluster

On the donor you can see:
2017-03-20T20:04:06.767992Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2017-03-20T20:04:06.767998Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2017-03-20T20:04:06.768001Z 0 [ERROR] Aborting

So please do what the error suggests: edit the grastate.dat file manually and set safe_to_bootstrap to 1. You will need to shut down the donor, edit the file, and then start the donor again.
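A minimal sketch of that edit, assuming /data/xtradb is the host directory mounted as /var/lib/mysql (as in the run commands above) and the usual grastate.dat layout with a "safe_to_bootstrap: 0" line. set_safe_to_bootstrap is a hypothetical helper; as the error message warns, only do this on the node you are sure has the most recent data.

```shell
#!/usr/bin/env bash
# Sketch: flip safe_to_bootstrap from 0 to 1 in grastate.dat, keeping a .bak copy.
# set_safe_to_bootstrap is a hypothetical helper; the default path assumes the
# /data/xtradb:/var/lib/mysql volume used in this thread.
set_safe_to_bootstrap() {
  local f=${1:-/data/xtradb/grastate.dat}
  sed -i.bak 's/^safe_to_bootstrap:[[:space:]]*0[[:space:]]*$/safe_to_bootstrap: 1/' "$f"
}

# Usage on the donor host (the container must be stopped while editing):
#   docker stop xtradb-cluster-master
#   set_safe_to_bootstrap /data/xtradb/grastate.dat
#   docker start xtradb-cluster-master
```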