Ok, I’ve made some progress here.
After following the manual on the website to the letter, I've managed to get pxc-node-1 (192.168.154.40) and pxc-node-2 (192.168.154.119) synced and exchanging data. The manual I had been using was for the same version, but I had downloaded it as a PDF and it wasn't up to date, in particular the part about privileges.
One of the things I've noticed is that mysqld is running on pxc-node-1, yet the init script reports the service as inactive (dead):
root@pxc-node-1:~# ps aux | grep mysql
root 4530 0.0 0.0 4328 756 pts/4 S 14:38 0:00 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
mysql 4999 0.5 3.7 2033804 222440 pts/4 Sl 14:38 0:11 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root 5844 0.0 0.0 11120 692 pts/4 S+ 15:12 0:00 grep mysql
root@pxc-node-1:~# /etc/init.d/mysql status
● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
Loaded: loaded (/etc/init.d/mysql)
Active: inactive (dead) since Mon 2017-04-03 14:31:08 UTC; 40min ago
Process: 4320 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS)
Process: 3905 ExecStart=/etc/init.d/mysql start (code=exited, status=0/SUCCESS)
After syncing the two nodes and verifying that replication works correctly (creating databases and tables and inserting some data from either node), I was able to stop and start pxc-node-2 successfully:
root@pxc-node-2:~# /etc/init.d/mysql stop
[ ok ] Stopping mysql (via systemctl): mysql.service.
root@pxc-node-2:~# /etc/init.d/mysql start
[ ok ] Starting mysql (via systemctl): mysql.service.
So pxc-node-2 is fine. The problem is that pxc-node-1 is running, but I cannot stop it the normal way:
root@pxc-node-1:~# /etc/init.d/mysql stop
[ ok ] Stopping mysql (via systemctl): mysql.service.
root@pxc-node-1:~# ps aux | grep mysql
root 4530 0.0 0.0 4328 756 pts/4 S 14:38 0:00 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
mysql 4999 0.4 3.7 2033804 222440 pts/4 Sl 14:38 0:11 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root 5883 0.0 0.0 11120 692 pts/4 S+ 15:18 0:00 grep mysql
As you can see, the processes are still running. Is there a special way to stop a freshly bootstrapped first node so that it can later be started the normal way?
Instead of killing the processes, I stopped the LXC containers in which pxc-node-2 and pxc-node-1 were running, so that everything was shut down gracefully. Now, when I start the container for pxc-node-1, I get the “famous” error I've been talking about since the beginning of this thread:
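For reference, a node started this way (mysqld_safe with --wsrep-new-cluster) does not appear to be tracked by systemd, which would explain why "systemctl"/"init.d stop" does nothing. One possible way to stop it is to ask mysqld for a clean shutdown directly over its socket. A minimal sketch, assuming the socket and pid-file paths from the ps output above and that root can authenticate with mysqladmin; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: stop a mysqld that was started outside systemd.
# Assumption: paths taken from the ps output (--socket / --pid-file).
SOCKET=/var/run/mysqld/mysqld.sock
PIDFILE=/var/run/mysqld/mysqld.pid

# Hypothetical helper: wait up to $2 seconds for the process whose PID is
# stored in file $1 to exit. Returns 0 once it is gone (or the pid file has
# been removed), 1 on timeout. Run as root so kill -0 is not denied.
wait_for_exit() {
    pidfile=$1
    timeout=$2
    while [ "$timeout" -gt 0 ]; do
        [ -f "$pidfile" ] || return 0
        kill -0 "$(cat "$pidfile" 2>/dev/null)" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Hypothetical helper: request a clean shutdown (prompts for the root
# password). mysqld_safe only restarts mysqld after an abnormal exit,
# so it should exit by itself after this.
stop_bootstrapped_node() {
    mysqladmin --socket="$SOCKET" -u root -p shutdown &&
        wait_for_exit "$PIDFILE" 60
}
```

Usage: source the file as root on pxc-node-1 and call stop_bootstrapped_node, then check with ps aux | grep mysql that both mysqld and mysqld_safe are gone.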
2017-04-03T15:33:57.147121Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:33:57.148445Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 585 ...
2017-04-03T15:33:57.151044Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T15:33:57.151063Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-04-03T15:33:57.154781Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-04-03T15:33:57.154846Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T15:33:57.155209Z 0 [Note] WSREP: Found saved state: 38103d13-187b-11e7-b05f-938dd425b3db:9, safe_to_bootsrap: 1
2017-04-03T15:33:57.172886Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.154.40; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-04-03T15:33:57.182531Z 0 [Note] WSREP: GCache history reset: old(38103d13-187b-11e7-b05f-938dd425b3db:0) -> new(38103d13-187b-11e7-b05f-938dd425b3db:9)
2017-04-03T15:33:57.188962Z 0 [Note] WSREP: Assign initial position for certification: 9, protocol version: -1
2017-04-03T15:33:57.188988Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T15:33:57.188998Z 0 [Note] WSREP: Start replication
2017-04-03T15:33:57.189012Z 0 [Note] WSREP: Setting initial position to 38103d13-187b-11e7-b05f-938dd425b3db:9
2017-04-03T15:33:57.189097Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T15:33:57.189209Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T15:33:57.189247Z 0 [Note] WSREP: backend: asio
2017-04-03T15:33:57.189302Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-04-03T15:33:57.189404Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T15:33:57.189415Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T15:33:57.189997Z 0 [Note] WSREP: GMCast version 0
2017-04-03T15:33:57.192786Z 0 [Warning] WSREP: Failed to resolve tcp://192.168.154.119:4567
2017-04-03T15:33:57.193056Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T15:33:57.193077Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T15:33:57.193745Z 0 [Note] WSREP: EVS version 0
2017-04-03T15:33:57.194026Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '192.168.154.40:,192.168.154.119:'
2017-04-03T15:33:57.195442Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') connection established to f3b908c9 tcp://192.168.154.40:4567
2017-04-03T15:33:57.195468Z 0 [Warning] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') address 'tcp://192.168.154.40:4567' points to own listening address, blacklisting
2017-04-03T15:34:00.195176Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T15:34:00.195215Z 0 [Note] WSREP: view(view_id(NON_PRIM,f3b908c9,1) memb {
f3b908c9,0
} joined {
} left {
} partitioned {
})
2017-04-03T15:34:00.695109Z 0 [Note] WSREP: (f3b908c9, 'tcp://0.0.0.0:4567') connection to peer f3b908c9 with addr tcp://192.168.154.40:4567 timed out, no messages seen in PT3S
2017-04-03T15:34:00.695479Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50179S), skipping check
Even though I can't start pxc-node-1 anymore, I decided to start the container where pxc-node-2 lives; again the service tries to start at boot time, and it also fails:
2017-04-03T15:40:17.475722Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:40:17.477019Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17-11-57) starting as process 585 ...
2017-04-03T15:40:17.479723Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T15:40:17.479743Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-04-03T15:40:17.483576Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
2017-04-03T15:40:17.483645Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T15:40:17.484023Z 0 [Note] WSREP: Found saved state: 38103d13-187b-11e7-b05f-938dd425b3db:9, safe_to_bootsrap: 0
2017-04-03T15:40:17.502731Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.154.119; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-04-03T15:40:17.512730Z 0 [Note] WSREP: GCache history reset: old(38103d13-187b-11e7-b05f-938dd425b3db:0) -> new(38103d13-187b-11e7-b05f-938dd425b3db:9)
2017-04-03T15:40:17.518921Z 0 [Note] WSREP: Assign initial position for certification: 9, protocol version: -1
2017-04-03T15:40:17.518947Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T15:40:17.518955Z 0 [Note] WSREP: Start replication
2017-04-03T15:40:17.518970Z 0 [Note] WSREP: Setting initial position to 38103d13-187b-11e7-b05f-938dd425b3db:9
2017-04-03T15:40:17.519057Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T15:40:17.519174Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T15:40:17.519214Z 0 [Note] WSREP: backend: asio
2017-04-03T15:40:17.519276Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-04-03T15:40:17.519383Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T15:40:17.519395Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T15:40:17.519970Z 0 [Note] WSREP: GMCast version 0
2017-04-03T15:40:17.522754Z 0 [Warning] WSREP: Failed to resolve tcp://192.168.154.119:4567
2017-04-03T15:40:17.522976Z 0 [Note] WSREP: (d66ac8a9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T15:40:17.522999Z 0 [Note] WSREP: (d66ac8a9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T15:40:17.523707Z 0 [Note] WSREP: EVS version 0
2017-04-03T15:40:17.523919Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster-1', peer '192.168.154.40:,192.168.154.119:'
2017-04-03T15:40:20.525861Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T15:40:20.525906Z 0 [Note] WSREP: view(view_id(NON_PRIM,d66ac8a9,1) memb {
d66ac8a9,0
} joined {
} left {
} partitioned {
})
2017-04-03T15:40:21.026013Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50235S), skipping check
2017-04-03T15:40:50.533643Z 0 [Note] WSREP: view((empty))
2017-04-03T15:40:50.533780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-04-03T15:40:50.533802Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-04-03T15:40:50.533922Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'pxc-cluster-1' at 'gcomm://192.168.154.40,192.168.154.119': -110 (Connection timed out)
2017-04-03T15:40:50.533945Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-04-03T15:40:50.533958Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.154.40,192.168.154.119) failed: 7
2017-04-03T15:40:50.533967Z 0 [ERROR] Aborting
2017-04-03T15:40:50.533980Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-04-03T15:40:50.533996Z 0 [Note] WSREP: Service disconnected.
2017-04-03T15:40:53.534149Z 0 [Note] WSREP: Some threads may fail to exit.
2017-04-03T15:40:53.534200Z 0 [Note] Binlog end
2017-04-03T15:40:53.534297Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
Is that the expected behaviour? In other words, do I always need to bootstrap one node when all nodes of the cluster have been stopped? I'm asking because if I stop and start pxc-node-1 while pxc-node-2 is still running, it works perfectly.
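For reference, the "Found saved state: <uuid>:<seqno>, safe_to_bootsrap: N" lines in both logs come from grastate.dat in the data directory, and safe_to_bootstrap: 1 marks the node Galera considers safe to bootstrap from after a full shutdown. A minimal sketch for inspecting it on each node, assuming the Galera 3.19+ grastate.dat format and the /var/lib/mysql datadir from the logs; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: inspect this node's saved Galera state.
# Assumption: datadir is /var/lib/mysql (override with GRASTATE=...).
GRASTATE=${GRASTATE:-/var/lib/mysql/grastate.dat}

# Hypothetical helper: print "<uuid> <seqno> <safe_to_bootstrap>"
# from a grastate.dat file (lines look like "seqno:   9").
read_grastate() {
    awk -F': *' '
        $1 == "uuid"              { uuid = $2 }
        $1 == "seqno"             { seqno = $2 }
        $1 == "safe_to_bootstrap" { safe = $2 }
        END { print uuid, seqno, safe }
    ' "$1"
}

# Hypothetical helper: exit status 0 only if safe_to_bootstrap is 1.
is_safe_to_bootstrap() {
    [ "$(read_grastate "$1" | awk '{ print $3 }')" = "1" ]
}
```

Usage: read_grastate "$GRASTATE" on each node. Going by the log lines above, pxc-node-1 should report safe_to_bootstrap 1 and pxc-node-2 should report 0, which matches pxc-node-2 refusing to form a primary component on its own.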