First node fails to restart after bootstrap

Hello,

I’m following the instructions to set up a Percona XtraDB Cluster with 3 nodes. After bootstrapping the first node, I stop and start the mysql service and it fails to start again. Relevant information below:

/etc/mysql/my.cnf
https://paste.kde.org/p2v9sdfg4

/var/log/mysql/error.log
https://paste.kde.org/ptsjrg0xx

At this point the mysql service is running after the bootstrap process. Now I stop the service with “systemctl stop mysql” and the log continues like this:
https://paste.kde.org/psoixjhhg

At this point the mysql service is stopped. Now I try to start the service with “systemctl start mysql” and it fails. The log continues like this:
https://paste.kde.org/ptnihg5kr

Any idea what could be the problem?

Thanks in advance.

Vincent,

Are these the correct addresses?
wsrep_cluster_address=gcomm://pxc-node-1.zone-a.mydomain.com,pxc-node-2.zone-b.mydomain.com,pxc-node-3.zone-b.mydomain.com

You need to use valid IP addresses or hostnames of your nodes.

Yes, of course, I’m using valid hostnames there. I just changed them a little bit to preserve my privacy.

$ ping pxc-node-1
PING pxc-node-1.****.****.org (192.168.1.170) 56(84) bytes of data.
64 bytes from pxc-node-1.****.****.org (192.168.1.170): icmp_seq=1 ttl=64 time=0.487 ms
64 bytes from pxc-node-1.****.****.org (192.168.1.170): icmp_seq=2 ttl=64 time=0.263 ms
64 bytes from pxc-node-1.****.****.org (192.168.1.170): icmp_seq=3 ttl=64 time=0.296 ms
64 bytes from pxc-node-1.****.****.org (192.168.1.170): icmp_seq=4 ttl=64 time=0.285 ms
^C
--- pxc-node-1.****.****.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.263/0.332/0.487/0.092 ms

Any idea why this could be happening?

The lines

2017-03-07T15:52:01.532235Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-03-07T15:52:01.532259Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-07T15:52:01.532352Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1407: Failed to open channel 'pxc-cluster-1' at 'gcomm://pxc-node-1.zone-a.mydomain.com,pxc-node-2.zone-b.mydomain.com,pxc-node-3.zone-b.mydomain.com': -110 (Connection timed out)
2017-03-07T15:52:01.532370Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-07T15:52:01.532382Z 0 [ERROR] WSREP: wsrep::connect(gcomm://pxc-node-1.zone-a.mydomain.com,pxc-node-2.zone-b.mydomain.com,pxc-node-3.zone-b.mydomain.com) failed: 7

indicate that there is some problem with the network.
Please check that the nodes are really available, that the firewall does not block connections, and that the TCP ports are accessible.
https://www.percona.com/doc/percona-…xtradb-cluster
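
The port check suggested above can be scripted from another machine. This is only a minimal sketch, assuming bash and the coreutils `timeout` command are available; `port_open` and `check_node` are made-up helper names, and the port list is the usual PXC default set (3306 for clients, 4444 for SST, 4567 for group communication, 4568 for IST).

```shell
#!/usr/bin/env bash
# Default PXC ports (assumption: none of them were changed in my.cnf).
PXC_PORTS="3306 4444 4567 4568"

# port_open HOST PORT: succeed if a TCP connection opens within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_node HOST: print one "host:port open|closed" line per PXC port.
check_node() {
  local host="$1" port
  for port in $PXC_PORTS; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

Usage would be something like `check_node 192.168.1.170`. Note that 4444 and 4568 normally only listen while a state transfer is in progress, so "closed" on those two does not by itself point to a firewall.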

Vincent,
Pardon me if I’m missing something here, but this is what it appears you are describing doing:

  1. Start node 1 using bootstrap_pxc to create a cluster
  2. Stop node 1 before joining any additional nodes, terminating the cluster
  3. Start node 1 again using ‘start’ instead of ‘bootstrap_pxc’ and expect it to join a cluster that doesn’t have any running nodes

Have you tried joining additional nodes to the cluster BEFORE you stop the bootstrapped node? Do they successfully join?
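
One way to check this before stopping the bootstrapped node is to look at the wsrep_cluster_size status variable, which stays at 1 while the first node is alone. A minimal sketch, assuming the status is fetched in batch mode (`mysql -N -B`), which prints tab-separated name/value pairs; `cluster_size` and `others_running` are hypothetical helper names.

```shell
# cluster_size: read batch-mode output of
#   SHOW STATUS LIKE 'wsrep_cluster_size'
# on stdin and print just the number.
cluster_size() {
  awk '$1 == "wsrep_cluster_size" { print $2 }'
}

# others_running SIZE: succeed when at least one other node is in the cluster.
others_running() {
  [ "${1:-0}" -gt 1 ]
}
```

For example: `size=$(mysql -N -B -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | cluster_size)`, then `others_running "$size" || echo "this node is the whole cluster"`.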

That cannot be the problem, since there isn’t any firewall stopping connections to those ports. Anyway, to simplify things I just started from scratch and installed Percona XtraDB Cluster on a new Debian machine. The configuration is simple, just this node in the cluster. So, let’s bootstrap and then start this first (and only) node:

root@pxc-node-1:~# /etc/init.d/mysql bootstrap-pxc
[ ok ] Bootstrapping Percona XtraDB Cluster database server: mysqld ..
root@pxc-node-1:~# /etc/init.d/mysql start 
[ ok ] Starting mysql (via systemctl): mysql.service.

Let’s see the listening ports:

root@pxc-node-1:~# netstat -putan | grep mysqld
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 7972/mysqld 
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 7972/mysqld 

The IP address of that machine is 192.168.154.50:

root@pxc-node-1:~# ip addr | grep inet
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
inet 192.168.154.50/24 brd 192.168.154.255 scope global eth0
inet6 fe80::216:3eff:fe45:7abd/64 scope link

From a different machine I check if those ports are open:

$ nmap 192.168.154.50 | grep open
22/tcp open ssh
3306/tcp open mysql
4567/tcp open tram

Now, before restarting the mysql server, let’s have a look at the my.cnf file:

root@pxc-node-1:~# cat /etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 0.0.0.0
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name=pxc-cluster-1
wsrep_cluster_address=gcomm://192.168.154.50
wsrep_node_name=pxc-node-1
wsrep_node_address=192.168.154.50
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpass
pxc_strict_mode=ENFORCING
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]

[isamchk]

!includedir /etc/mysql/conf.d/

Ok, let’s restart it:

root@pxc-node-1:~# /etc/init.d/mysql restart
[....] Restarting mysql (via systemctl): mysql.serviceJob for mysql.service failed. See 'systemctl status mysql.service' and 'journalctl -xn' for details.
failed!

This is the log:

root@pxc-node-1:~# cat /var/log/mysql/error.log 
2017-03-24T16:37:23.302908Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-03-24T16:37:23.303756Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 8764 ...
2017-03-24T16:37:23.306318Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-03-24T16:37:23.306336Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-03-24T16:37:23.309949Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-03-24T16:37:23.310016Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-03-24T16:37:23.310370Z 0 [Note] WSREP: Found saved state: 82016f02-10aa-11e7-a0c8-f3e63a8d05f1:3, safe_to_bootsrap: 1
2017-03-24T16:37:23.327889Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.154.50; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-03-24T16:37:23.337329Z 0 [Note] WSREP: GCache history reset: old(82016f02-10aa-11e7-a0c8-f3e63a8d05f1:0) -> new(82016f02-10aa-11e7-a0c8-f3e63a8d05f1:3)
2017-03-24T16:37:23.343982Z 0 [Note] WSREP: Assign initial position for certification: 3, protocol version: -1
2017-03-24T16:37:23.344014Z 0 [Note] WSREP: wsrep_sst_grab()
2017-03-24T16:37:23.344024Z 0 [Note] WSREP: Start replication
2017-03-24T16:37:23.344045Z 0 [Note] WSREP: Setting initial position to 82016f02-10aa-11e7-a0c8-f3e63a8d05f1:3
2017-03-24T16:37:23.344153Z 0 [Note] WSREP: protonet asio version 0
2017-03-24T16:37:23.344324Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-03-24T16:37:23.344400Z 0 [Note] WSREP: backend: asio
2017-03-24T16:37:23.344498Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-03-24T16:37:23.344672Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-03-24T16:37:23.344690Z 0 [Note] WSREP: restore pc from disk failed
2017-03-24T16:37:23.345586Z 0 [Note] WSREP: GMCast version 0
2017-03-24T16:37:23.345944Z 0 [Note] WSREP: (283cbdd2, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-03-24T16:37:23.345967Z 0 [Note] WSREP: (283cbdd2, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-03-24T16:37:23.346540Z 0 [Note] WSREP: EVS version 0
2017-03-24T16:37:23.346654Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '192.168.154.50:'
2017-03-24T16:37:23.347540Z 0 [Note] WSREP: (283cbdd2, 'tcp://0.0.0.0:4567') connection established to 283cbdd2 tcp://192.168.154.50:4567
2017-03-24T16:37:23.347560Z 0 [Warning] WSREP: (283cbdd2, 'tcp://0.0.0.0:4567') address 'tcp://192.168.154.50:4567' points to own listening address, blacklisting
2017-03-24T16:37:26.347492Z 0 [Note] WSREP: (283cbdd2, 'tcp://0.0.0.0:4567') connection to peer 283cbdd2 with addr tcp://192.168.154.50:4567 timed out, no messages seen in PT3S
2017-03-24T16:37:26.347762Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-24T16:37:26.347797Z 0 [Note] WSREP: view(view_id(NON_PRIM,283cbdd2,1) memb {
283cbdd2,0
} joined {
} left {
} partitioned {
})
2017-03-24T16:37:26.847913Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50141S), skipping check
2017-03-24T16:37:56.352374Z 0 [Note] WSREP: view((empty))
2017-03-24T16:37:56.352577Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-03-24T16:37:56.352998Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-24T16:37:56.353076Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'pxc-cluster-1' at 'gcomm://192.168.154.50': -110 (Connection timed out)
2017-03-24T16:37:56.353092Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-24T16:37:56.353099Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.154.50) failed: 7
2017-03-24T16:37:56.353104Z 0 [ERROR] Aborting

2017-03-24T16:37:56.353111Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-03-24T16:37:56.353120Z 0 [Note] WSREP: Service disconnected.
2017-03-24T16:37:59.353323Z 0 [Note] WSREP: Some threads may fail to exit.
2017-03-24T16:37:59.353371Z 0 [Note] Binlog end
2017-03-24T16:37:59.353470Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

Any idea?
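
The log above already shows the pattern: the only peer the node reaches is itself ("points to own listening address, blacklisting"), then "no nodes coming from prim view", and the start is aborted once the wait for a primary view times out. A plain start tries to join an existing primary component, whereas bootstrapping creates one. As a rough sketch, the same pattern can be looked for in any error.log; `diagnose_log` and its result labels are invented for illustration.

```shell
# diagnose_log FILE: print a short label for known Galera start-up patterns.
#   no-peer    : the node only reached its own address and found no primary view
#   no-primary : peers were tried but no primary component was reached
#   unknown    : none of the patterns matched
diagnose_log() {
  local log="$1"
  if grep -q 'points to own listening address' "$log" &&
     grep -q 'no nodes coming from prim view' "$log"; then
    echo "no-peer"
  elif grep -q 'failed to reach primary view' "$log"; then
    echo "no-primary"
  else
    echo "unknown"
  fi
}
```

For instance, `diagnose_log /var/log/mysql/error.log`. A "no-peer" result suggests that either another member has to be running first, or this node has to be bootstrapped again.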

Let’s try it. First, check the IP addresses of nodes 1 and 2:

root@pxc-node-1:~# ip addr | grep inet
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
inet 192.168.154.50/24 brd 192.168.154.255 scope global eth0
inet6 fe80::216:3eff:fe45:7abd/64 scope link
root@pxc-node-2:~# ip addr | grep inet
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
inet 192.168.154.53/24 brd 192.168.154.255 scope global eth0
inet6 fe80::216:3eff:fe78:5d73/64 scope link

This is the my.cnf file:

root@pxc-node-1:~# cat /etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 0.0.0.0
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name=pxc-cluster-1
wsrep_cluster_address=gcomm://192.168.154.50,192.168.154.53
wsrep_node_name=pxc-node-1
wsrep_node_address=192.168.154.50
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpass
pxc_strict_mode=ENFORCING
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]

[isamchk]

!includedir /etc/mysql/conf.d/

Now let’s bootstrap node-1 and check the cluster is synced and ready:

root@pxc-node-1:~# /etc/init.d/mysql bootstrap-pxc
[ ok ] Bootstrapping Percona XtraDB Cluster database server: mysqld ..
root@pxc-node-1:~# systemctl start mysql
root@pxc-node-1:~# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.17-11-57 Percona XtraDB Cluster (GPL), Release rel11, Revision e2a7fdd, WSREP version 27.20, wsrep_27.20

Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | 82016f02-10aa-11e7-a0c8-f3e63a8d05f1 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 4 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 148 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 0 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 16, 16 ] |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_cert_bucket_count | 22 |
| wsrep_gcache_pool_size | 1320 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 192.168.154.50:3306 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 91c7bee0-10b2-11e7-aea7-4f13680ff88c |
| wsrep_cluster_conf_id | 1 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 82016f02-10aa-11e7-a0c8-f3e63a8d05f1 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.20(r7e383f7) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+
60 rows in set (0.00 sec)

mysql> quit
Bye

Now from node-2, this is the only difference in my.cnf:

wsrep_node_name=pxc-node-2
wsrep_node_address=192.168.154.53

Let’s start mysql and see if it joins the cluster:

root@pxc-node-2:~# /etc/init.d/mysql start
[....] Starting mysql (via systemctl): mysql.serviceJob for mysql.service failed. See 'systemctl status mysql.service' and 'journalctl -xn' for details.
failed!

This is the error.log: https://paste.kde.org/polvcg4ew

And this is the error.log in node-1: https://paste.kde.org/p9m3ledhc

Did you create the SST user on the first node before trying to let node2 join the cluster?

Yes, I didn’t put that information in the post, but I did create the sst user on both sides.
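
For reference, the SST account for xtrabackup-v2 is usually created on the donor side with a handful of global privileges. The sketch below only prints the SQL so it can be reviewed before being piped into the client; `sst_user_sql` is a made-up helper, and the privilege list is my recollection of what the PXC 5.7 manual lists, so please verify it against the current documentation.

```shell
# sst_user_sql USER PASSWORD: print the statements that create the SST account.
# The user and password must match wsrep_sst_auth in my.cnf.
sst_user_sql() {
  local user="$1" pass="$2"
  cat <<SQL
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO '${user}'@'localhost';
FLUSH PRIVILEGES;
SQL
}
```

With the values from the posted config this would be `sst_user_sql sstuser sstpass | mysql -u root -p`, run on the node that will act as donor.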

Part of the logs on the donor, the innobackup.backup.log, may contain more information about what failed.

WSREP_SST: [INFO] Evaluating xtrabackup --defaults-file=/etc/mysql/my.cnf --defaults-group=mysqld $tmpopts $INNOEXTRA $keyringbackupopt --backup --galera-info --binlog-info=ON --stream=$sfmt --target-dir=$itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.154.53:4444; RC=( ${PIPESTATUS[@]} ) (20170324 17:23:23.544)

2017-03-24T17:23:23.615309Z 8 [Note] Aborted connection 8 to db: 'unconnected' user: 'sstuser' host: 'localhost' (Got an error reading communication packets)

WSREP_SST: [ERROR] xtrabackup finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20170324 17:23:23.616)

We are missing the most important point, which is that the mysql service simply fails to restart after bootstrapping, even before continuing with the process of bringing up new nodes.

For instance, a my.cnf config file containing this (among the rest of the stuff, of course)…

bind-address = 127.0.0.1
wsrep_cluster_address = gcomm://127.0.0.1
wsrep_node_address = 127.0.0.1

…also fails to restart with the same problem:

2017-04-03T12:55:55.531777Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T12:55:55.532706Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 2412 ...
2017-04-03T12:55:55.535613Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T12:55:55.535631Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-04-03T12:55:55.539325Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-04-03T12:55:55.539391Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T12:55:55.539748Z 0 [Note] WSREP: Found saved state: 82016f02-10aa-11e7-a0c8-f3e63a8d05f1:4, safe_to_bootsrap: 1
2017-04-03T12:55:55.577431Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 127.0.0.1; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-04-03T12:55:55.587279Z 0 [Note] WSREP: GCache history reset: old(82016f02-10aa-11e7-a0c8-f3e63a8d05f1:0) -> new(82016f02-10aa-11e7-a0c8-f3e63a8d05f1:4)
2017-04-03T12:55:55.593374Z 0 [Note] WSREP: Assign initial position for certification: 4, protocol version: -1
2017-04-03T12:55:55.593407Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T12:55:55.593416Z 0 [Note] WSREP: Start replication
2017-04-03T12:55:55.593439Z 0 [Note] WSREP: Setting initial position to 82016f02-10aa-11e7-a0c8-f3e63a8d05f1:4
2017-04-03T12:55:55.593518Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T12:55:55.593635Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T12:55:55.593675Z 0 [Note] WSREP: backend: asio
2017-04-03T12:55:55.593739Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-04-03T12:55:55.593846Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T12:55:55.593878Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T12:55:55.594422Z 0 [Note] WSREP: GMCast version 0
2017-04-03T12:55:55.594602Z 0 [Note] WSREP: (e0405f22, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T12:55:55.594614Z 0 [Note] WSREP: (e0405f22, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T12:55:55.595036Z 0 [Note] WSREP: EVS version 0
2017-04-03T12:55:55.595156Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '127.0.0.1:'
2017-04-03T12:55:55.596099Z 0 [Note] WSREP: (e0405f22, 'tcp://0.0.0.0:4567') connection established to e0405f22 tcp://127.0.0.1:4567
2017-04-03T12:55:55.596119Z 0 [Warning] WSREP: (e0405f22, 'tcp://0.0.0.0:4567') address 'tcp://127.0.0.1:4567' points to own listening address, blacklisting
2017-04-03T12:55:58.596032Z 0 [Note] WSREP: (e0405f22, 'tcp://0.0.0.0:4567') connection to peer e0405f22 with addr tcp://127.0.0.1:4567 timed out, no messages seen in PT3S
2017-04-03T12:55:58.596148Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T12:55:58.596170Z 0 [Note] WSREP: view(view_id(NON_PRIM,e0405f22,1) memb {
e0405f22,0
} joined {
} left {
} partitioned {
})
2017-04-03T12:55:59.096286Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50126S), skipping check
2017-04-03T12:56:28.603257Z 0 [Note] WSREP: view((empty))
2017-04-03T12:56:28.603414Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-04-03T12:56:28.603430Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-04-03T12:56:28.603493Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'pxc-cluster-1' at 'gcomm://127.0.0.1': -110 (Connection timed out)
2017-04-03T12:56:28.603506Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-04-03T12:56:28.603514Z 0 [ERROR] WSREP: wsrep::connect(gcomm://127.0.0.1) failed: 7
2017-04-03T12:56:28.603519Z 0 [ERROR] Aborting

2017-04-03T12:56:28.603527Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-04-03T12:56:28.603537Z 0 [Note] WSREP: Service disconnected.
2017-04-03T12:56:31.603624Z 0 [Note] WSREP: Some threads may fail to exit.
2017-04-03T12:56:31.603682Z 0 [Note] Binlog end
2017-04-03T12:56:31.603828Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

How can it fail to connect to 127.0.0.1?
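
Reading the log, the TCP connection to 127.0.0.1:4567 is in fact established; the catch is that the address is the node's own listener, so it gets blacklisted and no other member is left to supply a primary view. A small sketch of that reasoning, assuming IPv4 addresses or hostnames in the list; `peers_of` is a hypothetical helper, not part of any Percona tool.

```shell
# peers_of CLUSTER_ADDRESS OWN_ADDRESS: print the gcomm:// members that are
# NOT this node, one per line. No output means there is nobody to join.
peers_of() {
  local list="${1#gcomm://}" own="$2" member
  list="${list%%\?*}"            # drop any ?option=value suffix
  local IFS=,
  for member in $list; do
    member="${member%%:*}"       # strip an optional :port (IPv4/hostnames only)
    [ -n "$member" ] && [ "$member" != "$own" ] && echo "$member"
  done
  return 0
}
```

`peers_of gcomm://127.0.0.1 127.0.0.1` prints nothing, which matches the log: a normal start has no running member to join, so only a bootstrap can bring this single node up.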

Ok, I’ve made some progress here.

After following the manual on the website to the letter, I’ve managed to have pxc-node-1 (192.168.154.40) and pxc-node-2 (192.168.154.119) synced and exchanging data. The manual I had was for the same version, but I had downloaded it as a PDF and it wasn’t up to date. The privileges.

One of the things I’ve noticed is that after bootstrapping, the mysqld processes are running but the service status reports inactive:

root@pxc-node-1:~# ps aux | grep mysql
root 4530 0.0 0.0 4328 756 pts/4 S 14:38 0:00 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
mysql 4999 0.5 3.7 2033804 222440 pts/4 Sl 14:38 0:11 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root 5844 0.0 0.0 11120 692 pts/4 S+ 15:12 0:00 grep mysql

root@pxc-node-1:~# /etc/init.d/mysql status
● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
Loaded: loaded (/etc/init.d/mysql)
Active: inactive (dead) since Mon 2017-04-03 14:31:08 UTC; 40min ago
Process: 4320 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS)
Process: 3905 ExecStart=/etc/init.d/mysql start (code=exited, status=0/SUCCESS)

After having the two nodes synced and checking that replication works correctly by creating databases and tables and inserting some data from either node, I can stop and start pxc-node-2 successfully:

root@pxc-node-2:~# /etc/init.d/mysql stop
[ ok ] Stopping mysql (via systemctl): mysql.service.
root@pxc-node-2:~# /etc/init.d/mysql start
[ ok ] Starting mysql (via systemctl): mysql.service.

So pxc-node-2 is fine. The problem now is that I have the pxc-node-1 running, but I cannot stop it in a normal way:

root@pxc-node-1:~# /etc/init.d/mysql stop
[ ok ] Stopping mysql (via systemctl): mysql.service.
root@pxc-node-1:~# ps aux | grep mysql
root 4530 0.0 0.0 4328 756 pts/4 S 14:38 0:00 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
mysql 4999 0.4 3.7 2033804 222440 pts/4 Sl 14:38 0:11 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root 5883 0.0 0.0 11120 692 pts/4 S+ 15:18 0:00 grep mysql

As you can see, the processes are still running. Is there a special way to stop a just-bootstrapped first node so it can be started in the normal way?
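
The ps output shows mysqld running under mysqld_safe with --wsrep-new-cluster while the service is reported as inactive, so the stop request apparently never reaches that process. A small sketch for spotting such an instance from `ps aux` output; `bootstrap_pids` is a made-up name and it assumes the usual 11-column `ps aux` layout.

```shell
# bootstrap_pids: read `ps aux` output on stdin and print the PID of every
# mysqld process that was started with --wsrep-new-cluster.
# Column 11 is the command, so the mysqld_safe wrapper (/bin/sh) and the
# grep process itself are skipped.
bootstrap_pids() {
  awk '$11 ~ /\/mysqld$/ && /--wsrep-new-cluster/ { print $2 }'
}
```

Usage: `ps aux | bootstrap_pids`. If it prints a PID, asking the server itself to shut down with `mysqladmin -u root -p shutdown` is one option that does not depend on the service manager; whether that is the recommended procedure for this package version is something I have not verified.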

Instead of killing the processes what I did was to stop the LXC containers where pxc-node-2 and pxc-node-1 were running, so that way everything was stopped gracefully. Now, when I start the container for pxc-node-1 I have the “famous” error I’ve been talking about since the beginning of this thread:

2017-04-03T15:33:57.147121Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:33:57.148445Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 585 ...
2017-04-03T15:33:57.151044Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T15:33:57.151063Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-04-03T15:33:57.154781Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-04-03T15:33:57.154846Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T15:33:57.155209Z 0 [Note] WSREP: Found saved state: 38103d13-187b-11e7-b05f-938dd425b3db:9, safe_to_bootsrap: 1
2017-04-03T15:33:57.172886Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.154.40; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-04-03T15:33:57.182531Z 0 [Note] WSREP: GCache history reset: old(38103d13-187b-11e7-b05f-938dd425b3db:0) -> new(38103d13-187b-11e7-b05f-938dd425b3db:9)
2017-04-03T15:33:57.188962Z 0 [Note] WSREP: Assign initial position for certification: 9, protocol version: -1
2017-04-03T15:33:57.188988Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T15:33:57.188998Z 0 [Note] WSREP: Start replication
2017-04-03T15:33:57.189012Z 0 [Note] WSREP: Setting initial position to 38103d13-187b-11e7-b05f-938dd425b3db:9
2017-04-03T15:33:57.189097Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T15:33:57.189209Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T15:33:57.189247Z 0 [Note] WSREP: backend: asio
2017-04-03T15:33:57.189302Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-04-03T15:33:57.189404Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T15:33:57.189415Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T15:33:57.189997Z 0 [Note] WSREP: GMCast version 0
2017-04-03T15:33:57.192786Z 0 [Warning] WSREP: Failed to resolve tcp://192.168.154.119:4567
2017-04-03T15:33:57.193056Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T15:33:57.193077Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T15:33:57.193745Z 0 [Note] WSREP: EVS version 0
2017-04-03T15:33:57.194026Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '192.168.154.40:,192.168.154.119:'
2017-04-03T15:33:57.195442Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') connection established to f3b908c9 tcp://192.168.154.40:4567
2017-04-03T15:33:57.195468Z 0 [Warning] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') address 'tcp://192.168.154.40:4567' points to own listening address, blacklisting
2017-04-03T15:34:00.195176Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T15:34:00.195215Z 0 [Note] WSREP: view(view_id(NON_PRIM,f3b908c9,1) memb {
f3b908c9,0
} joined {
} left {
} partitioned {
})
2017-04-03T15:34:00.695109Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') connection to peer f3b908c9 with addr tcp://192.168.154.40:4567 timed out, no messages seen in PT3S
2017-04-03T15:34:00.695479Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50179S), skipping check

Even though I can’t start pxc-node-1 anymore, I decided to start the container where pxc-node-2 lives, and again the service tries to start at boot time and also fails:

2017-04-03T15:40:17.475722Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:40:17.477019Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 585 ...
2017-04-03T15:40:17.479723Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T15:40:17.479743Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-04-03T15:40:17.483576Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-04-03T15:40:17.483645Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T15:40:17.484023Z 0 [Note] WSREP: Found saved state: 38103d13-187b-11e7-b05f-938dd425b3db:9, safe_to_bootsrap: 0
2017-04-03T15:40:17.502731Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.154.119; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-04-03T15:40:17.512730Z 0 [Note] WSREP: GCache history reset: old(38103d13-187b-11e7-b05f-938dd425b3db:0) -> new(38103d13-187b-11e7-b05f-938dd425b3db:9)
2017-04-03T15:40:17.518921Z 0 [Note] WSREP: Assign initial position for certification: 9, protocol version: -1
2017-04-03T15:40:17.518947Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T15:40:17.518955Z 0 [Note] WSREP: Start replication
2017-04-03T15:40:17.518970Z 0 [Note] WSREP: Setting initial position to 38103d13-187b-11e7-b05f-938dd425b3db:9
2017-04-03T15:40:17.519057Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T15:40:17.519174Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T15:40:17.519214Z 0 [Note] WSREP: backend: asio
2017-04-03T15:40:17.519276Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-04-03T15:40:17.519383Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T15:40:17.519395Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T15:40:17.519970Z 0 [Note] WSREP: GMCast version 0
2017-04-03T15:40:17.522754Z 0 [Warning] WSREP: Failed to resolve tcp://192.168.154.119:4567
2017-04-03T15:40:17.522976Z 0 [Note] WSREP: (d66ac8a9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T15:40:17.522999Z 0 [Note] WSREP: (d66ac8a9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T15:40:17.523707Z 0 [Note] WSREP: EVS version 0
2017-04-03T15:40:17.523919Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '192.168.154.40:,192.168.154.119:'
2017-04-03T15:40:20.525861Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T15:40:20.525906Z 0 [Note] WSREP: view(view_id(NON_PRIM,d66ac8a9,1) memb {
d66ac8a9,0
} joined {
} left {
} partitioned {
})
2017-04-03T15:40:21.026013Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50235S), skipping check
2017-04-03T15:40:50.533643Z 0 [Note] WSREP: view((empty))
2017-04-03T15:40:50.533780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-04-03T15:40:50.533802Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-04-03T15:40:50.533922Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'pxc-cluster-1' at 'gcomm://192.168.154.40,192.168.154.119': -110 (Connection timed out)
2017-04-03T15:40:50.533945Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-04-03T15:40:50.533958Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.154.40,192.168.154.119) failed: 7
2017-04-03T15:40:50.533967Z 0 [ERROR] Aborting

2017-04-03T15:40:50.533980Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-04-03T15:40:50.533996Z 0 [Note] WSREP: Service disconnected.
2017-04-03T15:40:53.534149Z 0 [Note] WSREP: Some threads may fail to exit.
2017-04-03T15:40:53.534200Z 0 [Note] Binlog end
2017-04-03T15:40:53.534297Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

Is that the expected behaviour? I mean, do I always need to bootstrap one node when all nodes of the cluster have been stopped? I’m asking because if I stop and start pxc-node-1 while pxc-node-2 is still running, it works perfectly.

So, basically, the problem here was that I was stopping all the nodes of the cluster, even if “all nodes” means “only one”. When you stop all the nodes, you need to bootstrap the cluster again, choosing one node to start with.
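
For anyone who lands here later, this is a rough sketch of how one might choose which node to bootstrap after a full shutdown. It assumes you have copied the contents of /var/lib/mysql/grastate.dat from each node, and that the file has the usual "key: value" layout with seqno and safe_to_bootstrap fields (the log above prints that flag as "safe_to_bootsrap: 0" for pxc-node-2). The function names and the sample file contents are made up for illustration; they are not part of PXC or Galera.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastate_by_host):
    """Return the host to bootstrap from.

    Prefer a node marked safe_to_bootstrap: 1; otherwise fall back to
    the node with the highest seqno (assumption: all nodes share one uuid).
    """
    parsed = {host: parse_grastate(text) for host, text in grastate_by_host.items()}
    for host, state in parsed.items():
        if state.get("safe_to_bootstrap") == "1":
            return host
    return max(parsed, key=lambda host: int(parsed[host].get("seqno", "-1")))


# Hypothetical file contents, modelled on the saved state shown in the log.
NODE_1 = """# GALERA saved state
version: 2.1
uuid:    38103d13-187b-11e7-b05f-938dd425b3db
seqno:   9
safe_to_bootstrap: 1
"""
NODE_2 = NODE_1.replace("safe_to_bootstrap: 1", "safe_to_bootstrap: 0")

chosen = pick_bootstrap_node({"pxc-node-1": NODE_1, "pxc-node-2": NODE_2})
print(chosen)
```

On the chosen node you then start the service in bootstrap mode instead of a plain “systemctl start mysql” (on my systemd setup I believe that is “systemctl start mysql@bootstrap.service”, but check the docs for your version), and start the remaining nodes normally afterwards.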

Could an admin/mod edit the title of this thread and add [SOLVED] to it? I’m unable to edit my own post.

Thanks.