Node connected but not replicating

Hey Guys

So I’ve just started up 3 nodes and joined them together successfully (all SST via rsync from the primary).

Everything looks like it’s all good UNTIL I try to run a command on any node. It runs fine on the node where it was issued, BUT it is not replicated to the other nodes.
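
For example, a quick test along these lines fails to show up anywhere else (repl_test is just a throwaway schema name for illustration, not my real data):

-- on node 1: all of this succeeds locally
CREATE DATABASE IF NOT EXISTS repl_test;
CREATE TABLE repl_test.t1 (id INT PRIMARY KEY, note VARCHAR(50)) ENGINE=InnoDB;
INSERT INTO repl_test.t1 VALUES (1, 'written on node 1');

-- on node 2 or node 3: the schema and table never appear
SHOW DATABASES LIKE 'repl_test';
SELECT * FROM repl_test.t1;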

I’ve tried:
different group communication ports (a config sketch of what I changed is below this list)
reloading the data via SST
both Percona XtraDB Cluster AND MariaDB Galera Cluster (same problem)
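
For the port change, the relevant my.cnf section looked roughly like this (the addresses are the same placeholders as in the status output below, and the exact values are from memory, so treat it as a sketch rather than my literal config):

[mysqld]
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name = CirrusPortalMaster
# all three nodes, using the non-default group communication port 6666
wsrep_cluster_address = gcomm://xx.xx.xx.x1:6666,xx.xx.xx.x2:6666,xx.xx.xx.x3:6666
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:6666"
wsrep_sst_method = rsync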

A tcpdump on each node shows that one node has the other two talking to it but isn’t responding at all; another is talking to both; and the third is receiving from one of the others and only talking to the first (confusing, I know).

One thing I can note that differs from the example setups is that the nodes are NOT in the same subnet, though they are all on the same VLAN.

The 3 nodes are all virtual servers and should be specced high enough to handle the load. Two VMs are hosted on one machine, while the third is on a different host in another datacenter with a gigabit backbone between them.

A SHOW STATUS LIKE 'wsrep%'; gives similar results on all 3 servers, showing everything synced and ready:

+----------------------------+-------------------------------------------------------+
| Variable_name | Value |
+----------------------------+-------------------------------------------------------+
| wsrep_local_state_uuid | 0d2511e7-4081-11e3-850d-33c4e4dd3c47 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 758758 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_received | 29 |
| wsrep_received_bytes | 10337 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 13.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 28 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | xx.xx.xx.x1:3306,xx.xx.xx.x2:3306,xx.xx.xx.x3:3306 |
| wsrep_cluster_conf_id | 3 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 0d2511e7-4081-11e3-850d-33c4e4dd3c47 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 2 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 23.2.7(r157) |
| wsrep_ready | ON |
+----------------------------+-------------------------------------------------------+
40 rows in set (0.00 sec)

I’m currently on the MariaDB version of Galera, as it was the last thing I tested.

Any ideas?

Log file from startup to processing a request (identifying parts removed):

mysqld_safe mysqld from pid file /var/lib/mysql/portal2.SomeCompany.com.au.pid ended
mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.LVfVMqjBvT --pid-file=/var/lib/mysql/portal2.SomeCompany.com.au-recover.pid
mysqld_safe WSREP: Recovered position 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733
[Note] WSREP: wsrep_start_position var submitted: '0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733'
[Note] WSREP: Read nil XID from storage engines, skipping position init
[Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
[Note] WSREP: wsrep_load(): Galera 23.2.7(r157) by Codership Oy <info@codership.com> loaded succesfully.
[Note] WSREP: Found saved state: 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733
[Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
[Note] WSREP: Passing config to GCS: base_host = xx.xx.xx.46; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 12G; gcache.size = 12G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:6666; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
[Note] WSREP: Assign initial position for certification: 758733, protocol version: -1
[Note] WSREP: wsrep_sst_grab()
[Note] WSREP: Start replication
[Note] WSREP: Setting initial position to 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733
[Note] WSREP: protonet asio version 0
[Note] WSREP: backend: asio
[Note] WSREP: GMCast version 0
[Note] WSREP: (c1c14d66-516a-11e3-ad8e-2afb83aac7b9, 'tcp://0.0.0.0:6666') listening at tcp://0.0.0.0:6666
[Note] WSREP: (c1c14d66-516a-11e3-ad8e-2afb83aac7b9, 'tcp://0.0.0.0:6666') multicast: , ttl: 1
[Note] WSREP: EVS version 0
[Note] WSREP: PC version 0
[Note] WSREP: gcomm: connecting to group 'CirrusPortalMaster', peer ''
[Note] WSREP: Node c1c14d66-516a-11e3-ad8e-2afb83aac7b9 state prim
[Note] WSREP: view(view_id(PRIM,c1c14d66-516a-11e3-ad8e-2afb83aac7b9,1) memb {
c1c14d66-516a-11e3-ad8e-2afb83aac7b9,
} joined {
} left {
} partitioned {
})
[Note] WSREP: gcomm: connected
[Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
[Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
[Note] WSREP: Opened channel 'CirrusPortalMaster'
[Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
[Note] WSREP: Waiting for SST to complete.
[Note] WSREP: STATE_EXCHANGE: sent state UUID: c1c2469b-516a-11e3-83fc-4a131df461d1
[Note] WSREP: STATE EXCHANGE: sent state msg: c1c2469b-516a-11e3-83fc-4a131df461d1
[Note] WSREP: STATE EXCHANGE: got state msg: c1c2469b-516a-11e3-83fc-4a131df461d1 from 0 (portal2.SomeCompany.com.au)
[Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 0,
members = 1/1 (joined/total),
act_id = 758733,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 0d2511e7-4081-11e3-850d-33c4e4dd3c47
[Note] WSREP: Flow-control interval: [16, 16]
[Note] WSREP: Restored state OPEN -> JOINED (758733)
[Note] WSREP: Member 0 (portal2.SomeCompany.com.au) synced with group.
[Note] WSREP: Shifting JOINED -> SYNCED (TO: 758733)
[Note] WSREP: New cluster view: global state: 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 2
[Note] WSREP: SST complete, seqno: 758733
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
InnoDB: Using Linux native AIO
InnoDB: Initializing buffer pool, size = 12.0G
InnoDB: Completed initialization of buffer pool
InnoDB: highest supported file format is Barracuda.
InnoDB: Waiting for the background threads to start
Percona XtraDB (http://www.percona.com) 5.5.33a-MariaDB-31.1 started; log sequence number 747344373401
[Note] Plugin 'FEEDBACK' is disabled.
[Note] Server socket created on IP: '0.0.0.0'.
[Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--log-basename=#' or '--relay-log=portal2-relay-bin' to avoid this problem.
[ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','IGNORE_BAD_TABLE_OPTIONS','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_ALL_TABLES','NO_ZERO_IN_DATE','NO_ZERO_DATE','INVALID_DATES','ERROR_FOR_DIVISION_BY_ZERO','TRADITIONAL','NO_AUTO_CREATE_USER','HIGH_NOT_PRECEDENCE','NO_ENGINE_SUBSTITUTION','PAD_CHAR_TO_FULL_LENGTH'), found type set('REAL_AS_FLOAT','PIPES_AS_CONCAT','ANSI_QUOTES','IGNORE_SPACE','NOT_USED','ONLY_FULL_GROUP_BY','NO_UNSIGNED_SUBTRACTION','NO_DIR_IN_CREATE','POSTGRESQL','ORACLE','MSSQL','DB2','MAXDB','NO_KEY_OPTIONS','NO_TABLE_OPTIONS','NO_FIELD_OPTIONS','MYSQL323','MYSQL40','ANSI','NO_AUTO_VALUE_ON_ZERO','NO_BACKSLASH_ESCAPES','STRICT_TRANS_TABLES','STRICT_A
[ERROR] Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Assign initial position for certification: 758733, protocol version: 2
[Note] WSREP: Synchronized with group, ready for connections
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.33a-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server, wsrep_23.7.6.rXXXX
[Note] WSREP: declaring 0295ce0a-516b-11e3-b109-e6848601c76c stable
[Note] WSREP: Node c1c14d66-516a-11e3-ad8e-2afb83aac7b9 state prim
[Note] WSREP: view(view_id(PRIM,0295ce0a-516b-11e3-b109-e6848601c76c,2) memb {
0295ce0a-516b-11e3-b109-e6848601c76c,
c1c14d66-516a-11e3-ad8e-2afb83aac7b9,
} joined {
} left {
} partitioned {
})
[Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
[Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
[Note] WSREP: STATE EXCHANGE: sent state msg: 02e2dd4f-516b-11e3-a711-533c5011890b
[Note] WSREP: STATE EXCHANGE: got state msg: 02e2dd4f-516b-11e3-a711-533c5011890b from 0 (dev-portal.SomeCompany.com.au)
[Note] WSREP: STATE EXCHANGE: got state msg: 02e2dd4f-516b-11e3-a711-533c5011890b from 1 (portal2.SomeCompany.com.au)
[Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 1,
members = 2/2 (joined/total),
act_id = 758733,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 0d2511e7-4081-11e3-850d-33c4e4dd3c47
[Note] WSREP: Flow-control interval: [23, 23]
[Note] WSREP: Member 0 (dev-portal.SomeCompany.com.au) synced with group.
[Note] WSREP: New cluster view: global state: 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Assign initial position for certification: 758733, protocol version: 2
[Note] WSREP: (c1c14d66-516a-11e3-ad8e-2afb83aac7b9, 'tcp://0.0.0.0:6666') turning message relay requesting on, nonlive peers: tcp://xx.xx.xx.42:6666 
[Note] WSREP: (c1c14d66-516a-11e3-ad8e-2afb83aac7b9, 'tcp://0.0.0.0:6666') turning message relay requesting off
[Note] WSREP: declaring 0295ce0a-516b-11e3-b109-e6848601c76c stable
[Note] WSREP: declaring 2232fb5d-516b-11e3-b26e-177b05b1b8dd stable
[Note] WSREP: Node 0295ce0a-516b-11e3-b109-e6848601c76c state prim
[Note] WSREP: view(view_id(PRIM,0295ce0a-516b-11e3-b109-e6848601c76c,3) memb {
0295ce0a-516b-11e3-b109-e6848601c76c,
2232fb5d-516b-11e3-b26e-177b05b1b8dd,
c1c14d66-516a-11e3-ad8e-2afb83aac7b9,
} joined {
} left {
} partitioned {
})
[Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
[Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
[Note] WSREP: STATE EXCHANGE: sent state msg: 103e7cb1-516b-11e3-a64b-8f076152cebd
[Note] WSREP: STATE EXCHANGE: got state msg: 103e7cb1-516b-11e3-a64b-8f076152cebd from 0 (dev-portal.SomeCompany.com.au)
[Note] WSREP: STATE EXCHANGE: got state msg: 103e7cb1-516b-11e3-a64b-8f076152cebd from 2 (portal2.SomeCompany.com.au)
[Note] WSREP: STATE EXCHANGE: got state msg: 103e7cb1-516b-11e3-a64b-8f076152cebd from 1 (portal3.SomeCompany.com.au)
[Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 2,
members = 3/3 (joined/total),
act_id = 758733,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 0d2511e7-4081-11e3-850d-33c4e4dd3c47
[Note] WSREP: Flow-control interval: [28, 28]
[Note] WSREP: New cluster view: global state: 0d2511e7-4081-11e3-850d-33c4e4dd3c47:758733, view# 3: Primary, number of nodes: 3, my index: 2, protocol version 2
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Assign initial position for certification: 758733, protocol version: 2
[Note] WSREP: Member 1 (portal3.SomeCompany.com.au) synced with group.

Don’t know if it’s related, but are the tables you are testing using InnoDB and not MyISAM?

All the tables are InnoDB

So after doing a lot of investigation and finding a very well-hidden forum post on a Google Code page, I found that the reason I was having issues was the “replicate-do-db=” settings I had on my slave node, which was replicating from a production system.

This was only allowing those particular databases to be replicated by Galera (isn’t Galera supposed to be independent of MySQL???).
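
Concretely, the slave node’s my.cnf had a filter along these lines left over from the async setup (the schema name here is a placeholder for my actual one):

[mysqld]
# async replication filter carried over from the old master/slave setup;
# the Galera writesets going out to the other cluster nodes get filtered by this too
replicate-do-db = portal_db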

Anyway, having it here at least means that if someone else runs into this problem they will be able to fix it themselves.

If you replicate from an async master into a PXC cluster, then replication filters on the slave/PXC member node apply before the transactions are replicated to the other cluster members (log_slave_updates must still be ON on this node).

Yes, I did have it on, and the replication itself worked fine. The issue was that I wasn’t listing ALL the databases I wanted replicated to my other nodes in the replicate-do-db variable (Galera uses it to filter what it replicates, the same way MySQL uses it for async replication).
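
So the fix on my side was simply to make sure every schema that has to exist on all three nodes is listed, and to leave log_slave_updates on, roughly like this (schema names are placeholders):

[mysqld]
log_bin = mysql-bin
log_slave_updates = ON
# list EVERY schema that must replicate to the other Galera nodes,
# one replicate-do-db line per schema
replicate-do-db = portal_db
replicate-do-db = reporting_db
replicate-do-db = sessions_db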

Well, what about the error [ERROR] Incorrect definition of table mysql.event!? It looks like something is corrupted.

Try running mysqlcheck on the system tables:

mysqlcheck -u root -p --auto-repair --optimize mysql

Hmmm, I totally missed that.

It could have been that I hadn’t upgraded my mysql system tables to match my running version (which I have since done :) ).
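
If the system tables are just out of date after a version change, I believe the usual fix is to run mysql_upgrade (rather than only mysqlcheck), something like:

mysql_upgrade -u root -p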

Good catch though, but the problem was most definitely the global replicate_do_db setting that Galera inherits.

Hi,

I have a MySQL server working as the master and PXC working as the slave. When my first slave goes down, does the second slave start automatically? If yes, please tell me how that works.