Vagrant+Vbox, first steps to XtraDB. Bootstrap first node shows an error

Hi

I’ve deployed a small environment of test with Vagrant and Vbox with 3 nodes (ubuntu TLS)

IPs are private network, 1 per node

192.168.33.50
192.168.33.51
192.168.33.52

1 cpu and 1 GB RAM per node also
No firewall inside Ubuntu
each node can ping another and all knows the hostname of the others

I’ve read some guides about Xtradb Cluster, finally I choose:

http://redcrackle.com/blog/how-set-percona-xtradb-cluster-ubuntu

but when I wan to restart the first node with bootsrap it finish with an error:

2015-03-16 12:13:59 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2015-03-16 12:13:59 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-16 12:13:59 10647 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib/libgalera_smm.so’
2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_load(): Galera 3.9(r93aca2d) by Codership Oy <info@codership.com> loaded successfully.
2015-03-16 12:13:59 10647 [Note] WSREP: CRC-32C: using “slicing-by-8” algorithm.
2015-03-16 12:13:59 10647 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2015-03-16 12:13:59 10647 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.33.50; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false
2015-03-16 12:13:59 10647 [Note] WSREP: Service thread queue flushed.
2015-03-16 12:13:59 10647 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_sst_grab()
2015-03-16 12:13:59 10647 [Note] WSREP: Start replication
2015-03-16 12:13:59 10647 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2015-03-16 12:13:59 10647 [Note] WSREP: protonet asio version 0
2015-03-16 12:13:59 10647 [Note] WSREP: Using CRC-32C for message checksums.
2015-03-16 12:13:59 10647 [Note] WSREP: backend: asio
2015-03-16 12:13:59 10647 [Note] WSREP: restore pc from disk successfully
2015-03-16 12:13:59 10647 [Note] WSREP: GMCast version 0
2015-03-16 12:13:59 10647 [Note] WSREP: (673c2adf, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
2015-03-16 12:13:59 10647 [Note] WSREP: (673c2adf, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
2015-03-16 12:13:59 10647 [Note] WSREP: EVS version 0
2015-03-16 12:13:59 10647 [Note] WSREP: gcomm: bootstrapping new group ‘PXC’
2015-03-16 12:13:59 10647 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2015-03-16 12:13:59 10647 [Note] WSREP: Node 673c2adf state prim
2015-03-16 12:13:59 10647 [Note] WSREP: view(view_id(PRIM,673c2adf,3) memb {
673c2adf,0
} joined {
} left {
} partitioned {
})
2015-03-16 12:13:59 10647 [Note] WSREP: save pc into disk
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.50:4567
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20e1d60
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.51:4567
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20ea4f0
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.52:4567
2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20f2ce0
2015-03-16 12:13:59 10647 [Note] WSREP: clear restored view
2015-03-16 12:13:59 10647 [ERROR] WSREP: failed to open gcomm backend connection: 11: : 11 (Resource temporarily unavailable)
at gcs/src/gcs_gcomm.cpp:connect():253
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -11 (Resource temporarily unavailable)
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel ‘PXC’ at ‘gcomm://192.168.33.50,192.168.33.51,192.168.33.52’: -11 (Resource temporarily unavailable)
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs connect failed: Resource temporarily unavailable
2015-03-16 12:13:59 10647 [ERROR] WSREP: wsrep::connect() failed: 7
2015-03-16 12:13:59 10647 [ERROR] Aborting

2015-03-16 12:13:59 10647 [Note] WSREP: Service disconnected.
2015-03-16 12:14:00 10647 [Note] WSREP: Some threads may fail to exit.
2015-03-16 12:14:00 10647 [Note] Binlog end
2015-03-16 12:14:00 10647 [Note] /usr/sbin/mysqld: Shutdown complete

150316 12:14:00 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

It’s very hard for me to debug what is happening… any ideas?

Thanks!

Hi again

finally was able to fix these issue. Was related to available ram, I have setup next parameter into my.cnf and then restarted

performance_schema = off

  • Bootstrapping Percona XtraDB Cluster database server mysqld [ OK ]

Thanks

This is an interesting case. I’ve got a galera cluster with 3 nodes running on CentOS 6.6. I’m using Vagrant to run all the VMs as well. So, I didn’t see the same behaviour here…let’s take a look:

#: enabling the performance_schema on the very first node and bootstrapping the cluster

[vagrant&#64;sql01 ~]$ my_print_defaults mysqld
--user=mysql
--server_id=1
--datadir=/var/lib/mysql
--performance_schema=1 # enabling performance_schema
--log_bin=node01-bin
--log_slave_updates=true
--enforce_gtid_consistency=true
--binlog_format=ROW
--default_storage_engine=innodb
--innodb_locks_unsafe_for_binlog=1
--innodb_autoinc_lock_mode=2
--innodb_log_group_home_dir=/var/lib/mysql
--innodb_log_files_in_group=2
--innodb_log_file_size=48M
--wsrep_provider=/usr/lib64/libgalera_smm.so
--wsrep_cluster_name=sqlmag
--wsrep_node_address=192.168.0.101
--wsrep_node_name=node01
--wsrep_cluster_address=gcomm://192.168.0.101:4567,192.168.0.102:4567,192.168.0.103:4567
--wsrep_slave_threads=2
--wsrep_sst_method=xtrabackup-v2
--wsrep_sst_auth=sst:123
--wsrep_provider_options=pc.ignore_quorum=true

And bootstrapping the cluster…

[vagrant&#64;sql01 ~]$ sudo service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster)....... SUCCESS!

mysql> select &#64;&#64;performance_schema;
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+
1 row in set (0.00 sec)

BTW, going ahead to start the second node…

[vagrant&#64;sql02 ~]$ sudo service mysql start
Starting MySQL (Percona XtraDB Cluster)....... SUCCESS!
[vagrant&#64;sql02 ~]$ mysql -u root -e "select &#64;&#64;performance_schema" # performance_schema enabled
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+

And finally the third node…

[vagrant&#64;sql03 ~]$ sudo service mysql start
Starting MySQL (Percona XtraDB Cluster).............. SUCCESS!
[vagrant&#64;sql03 ~]$ mysql -u root -e "select &#64;&#64;performance_schema" # performance_schema enabled
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+

I don’t have an environment where I’m running Galera on Ubuntu, but, not sure if it’s different due to the packets.

BTW, the critical on the logs you posted is this part below:

2015-03-16 12:13:59 10647 [ERROR] WSREP: failed to open gcomm backend connection: 11: : 11 (Resource temporarily unavailable)
at gcs/src/gcs_gcomm.cpp:connect():253
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -11 (Resource temporarily unavailable)
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'PXC' at 'gcomm://192.168.33.50,192.168.33.51,192.168.33.52': -11 (Resource temporarily unavailable)
2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs connect failed: Resource temporarily unavailable
2015-03-16 12:13:59 10647 [ERROR] WSREP: wsrep::connect() failed: 7
2015-03-16 12:13:59 10647 [ERROR] Aborting

It seems something strange with the gcomm values, I meant, the configured IPs and ports or even…have you bootstrapped the cluster as I stated here on node01?

Cheers,

Hi

it depends also on how much memory you give to your vms. In my case, as I only have a Mac Book Pro with 8 GB of RAM, I allocated just 1 GB RAM per node and with 0 GB of Swap.

Maybe in your environtment your VMS have more RAM than my VMS?

Cheers

R

Hmmm, interesting point. Checking the configuration written to the my VM’s Vagrantfile, I found this:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
# config.vm.provision "shell", path: "setup_env.sh"
config.vm.define "node01" do |node01|
node01.vm.box = "centos65-x86_64"
node01.vm.hostname="sql01"
node01.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.101"
node01.vm.provider "virtualbox" do |v|
v.memory = 256
v.cpus = 1
end
end
config.vm.define "node02" do |node02|
node02.vm.box = "centos65-x86_64"
node02.vm.hostname="sql02"
node02.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.102" 
node02.vm.provider "virtualbox" do |v|
v.memory = 256
v.cpus = 1
end
end
config.vm.define "node03" do |node03|
node03.vm.box="centos65-x86_64"
node03.vm.hostname="sql03"
node03.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.103" 
node03.vm.provider "virtualbox" do |v|
v.memory = 256
v.cpus = 1
end
end
end

Checking the MemTotal form /proc/meminfo:

[vagrant&#64;sql03 ~]$ cat /proc/meminfo | grep MemTotal
MemTotal: 244176 kB

[vagrant&#64;sql02 ~]$ cat /proc/meminfo | grep MemTotal
MemTotal: 244176 kB

[vagrant&#64;sql03 ~]$ cat /proc/meminfo | grep MemTotal
MemTotal: 244176 kB

All these things are showing us that we’re a good direction to confirm that servers on VMs are using just 256MB RAM of my host which is a MacBookPro 8GB RAM as well. So, next action is to start up the Galera Cluster…

[vagrant&#64;sql01 ~]$ sudo service mysql bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster).................................................. SUCCESS!
[vagrant&#64;sql01 ~]$ mysql -u root -e "select &#64;&#64;performance_schema"
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+

It’s OK, now, I’m going to start up the second (SQL02) e third (SQL03) node:

[vagrant&#64;sql02 ~]$ sudo service mysql start
Starting MySQL (Percona XtraDB Cluster)................ SUCCESS!
[vagrant&#64;sql02 ~]$ mysql -u root -e "select &#64;&#64;performance_schema"
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+

[vagrant&#64;sql03 ~]$ sudo service mysql start
Starting MySQL (Percona XtraDB Cluster)................. SUCCESS!
[vagrant&#64;sql03 ~]$ mysql -u root -e "select &#64;&#64;performance_schema"
+----------------------+
| &#64;&#64;performance_schema |
+----------------------+
| 1 |
+----------------------+

And finally…

[vagrant&#64;sql01 ~]$ mysql -u root -e "show global status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+

I’m still thinking what is the resource that was temporarily unavailable in terms of gcomm which is the group communication system used by Galera API as per the documentation here http://galeracluster.com/documentation-webpages/architecture.html . I tried to simulate something so as to get the stated error - even configuring my gcomm with a inexistent IP address, it was able to bootstrap with clean start up logs, having the second node joined the cluster view.

2015-03-18 18:51:01 3024 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 2/2 (joined/total),
act_id = 10166,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = f9cdb70c-8483-11e4-86af-4b70fd8e7a88
2015-03-18 18:51:01 3024 [Note] WSREP: Flow-control interval: [23, 23]