MariaDB cluster: one node goes down after a while

Hello. I have a problem with a MariaDB cluster. I am trying to import 5 million records into a 4-node cluster; the import takes about 16 hours. Some time after I start the import, anywhere from one hour to 12 hours later, an error occurs and the mariadb service on one node goes down.

This is my .57 node. It is still running. This is its error log:

151103 10:13:33 [Note] WSREP: (33886e63, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.110.56:4567
151103 10:13:35 [Note] WSREP: (33886e63, 'tcp://0.0.0.0:4567') reconnecting to 4bd1a7bc (tcp://192.168.110.56:4567), attempt 0
151103 10:13:36 [Note] WSREP: evs::proto(33886e63, GATHER, view_id(REG,33886e63,712)) suspecting node: 4bd1a7bc
151103 10:13:36 [Note] WSREP: evs::proto(33886e63, GATHER, view_id(REG,33886e63,712)) suspected node without join message, declaring inactive
151103 10:13:36 [Note] WSREP: declaring 9aaefb29 at tcp://192.168.110.59:4567 stable
151103 10:13:36 [Note] WSREP: declaring dbd306d5 at tcp://192.168.110.58:4567 stable
151103 10:13:36 [Note] WSREP: Node 33886e63 state prim
151103 10:13:36 [Note] WSREP: view(view_id(PRIM,33886e63,713) memb {
33886e63,0
9aaefb29,0
dbd306d5,0
} joined {
} left {
} partitioned {
4bd1a7bc,0
})
151103 10:13:36 [Note] WSREP: save pc into disk
151103 10:13:36 [Note] WSREP: forgetting 4bd1a7bc (tcp://192.168.110.56:4567)
151103 10:13:36 [Note] WSREP: deleting entry tcp://192.168.110.56:4567
151103 10:13:36 [Note] WSREP: (33886e63, 'tcp://0.0.0.0:4567') turning message relay requesting off
151103 10:13:36 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
151103 10:13:36 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6
151103 10:13:36 [Note] WSREP: STATE EXCHANGE: sent state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6
151103 10:13:36 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 0 (koha-db57)
151103 10:13:36 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 1 (koha-db59)
151103 10:13:36 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 2 (koha-db58)
151103 10:13:36 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 327,
members = 3/3 (joined/total),
act_id = 83405,
last_appl. = 83400,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 2648fba7-7d7f-11e5-a48d-8b52655f8f85
151103 10:13:36 [Note] WSREP: Flow-control interval: [28, 28]
151103 10:13:36 [Note] WSREP: New cluster view: global state: 2648fba7-7d7f-11e5-a48d-8b52655f8f85:83405, view# 328: Primary, number of nodes: 3, my index: 0, protocol version 3
151103 10:13:36 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
151103 10:13:36 [Note] WSREP: REPL Protocols: 7 (3, 2)
151103 10:13:36 [Note] WSREP: Service thread queue flushed.
151103 10:13:36 [Note] WSREP: Assign initial position for certification: 83405, protocol version: 3
151103 10:13:36 [Note] WSREP: Service thread queue flushed.
151103 10:13:39 [Note] WSREP: cleaning up 4bd1a7bc (tcp://192.168.110.56:4567)

This is my .58 node. It is also still running. This is its MySQL error log:

151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.110.56:4567
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:34 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:35 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') reconnecting to 4bd1a7bc (tcp://192.168.110.56:4567), attempt 0
151103 10:13:36 [Note] WSREP: evs::proto(dbd306d5, GATHER, view_id(REG,33886e63,712)) suspecting node: 4bd1a7bc
151103 10:13:36 [Note] WSREP: evs::proto(dbd306d5, GATHER, view_id(REG,33886e63,712)) suspected node without join message, declaring inactive
151103 10:13:37 [Note] WSREP: declaring 33886e63 at tcp://192.168.110.57:4567 stable
151103 10:13:37 [Note] WSREP: declaring 9aaefb29 at tcp://192.168.110.59:4567 stable
151103 10:13:37 [Note] WSREP: Node 33886e63 state prim
151103 10:13:37 [Note] WSREP: view(view_id(PRIM,33886e63,713) memb {
33886e63,0
9aaefb29,0
dbd306d5,0
} joined {
} left {
} partitioned {
4bd1a7bc,0
})
151103 10:13:37 [Note] WSREP: save pc into disk
151103 10:13:37 [Note] WSREP: forgetting 4bd1a7bc (tcp://192.168.110.56:4567)
151103 10:13:37 [Note] WSREP: deleting entry tcp://192.168.110.56:4567
151103 10:13:37 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:37 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.110.58:4567' pointing to uuid dbd306d5 is blacklisted, skipping
151103 10:13:37 [Note] WSREP: (dbd306d5, 'tcp://0.0.0.0:4567') turning message relay requesting off
151103 10:13:37 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
151103 10:13:37 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
151103 10:13:37 [Note] WSREP: STATE EXCHANGE: sent state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6
151103 10:13:37 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 0 (koha-db57)
151103 10:13:37 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 1 (koha-db59)
151103 10:13:37 [Note] WSREP: STATE EXCHANGE: got state msg: 66a13a4f-81fa-11e5-b637-db9fd0dd9ef6 from 2 (koha-db58)
151103 10:13:37 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 327,
members = 3/3 (joined/total),
act_id = 83405,
last_appl. = 83400,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 2648fba7-7d7f-11e5-a48d-8b52655f8f85
151103 10:13:37 [Note] WSREP: Flow-control interval: [28, 28]
151103 10:13:37 [Note] WSREP: New cluster view: global state: 2648fba7-7d7f-11e5-a48d-8b52655f8f85:83405, view# 328: Primary, number of nodes: 3, my index: 2, protocol version 3
151103 10:13:37 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
151103 10:13:37 [Note] WSREP: REPL Protocols: 7 (3, 2)
151103 10:13:37 [Note] WSREP: Service thread queue flushed.
151103 10:13:37 [Note] WSREP: Assign initial position for certification: 83405, protocol version: 3
151103 10:13:37 [Note] WSREP: Service thread queue flushed.
151103 10:13:39 [Note] WSREP: cleaning up 4bd1a7bc (tcp://192.168.110.56:4567)

The .59 node logs the same messages as the .58 node. The .56 node is the one that goes down. It logs:

151103 10:13:53 mysqld_safe Number of processes running now: 0
151103 10:13:53 mysqld_safe WSREP: not restarting wsrep node automatically
151103 10:13:53 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

I have 5-6 million records in total, but after about 4 million were imported, one node went down. Sometimes it goes down sooner (after about 1 million records). What is causing this, and what should I do? Thanks.

As far as possible, load the data in chunks of 10K-50K records. Do not import millions of records at one time: Galera is not capable of that yet, and will not be until Galera 4 (as advertised).
Alternatively, bootstrap a single-node cluster, load the data there, and then start up nodes 2 and 3 one at a time so that each one joins via SST from node 1.
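
To illustrate the chunked loading suggested above, here is a minimal Python sketch. It only shows the batching logic: `chunked`, `load_in_chunks` and the `insert_batch` callback are hypothetical names made up for this example, and the callback is assumed to be whatever you use to insert one batch and commit it (for example an `executemany` call followed by a commit in your database driver). The default of 10,000 rows is taken from the lower end of the 10K-50K range mentioned above.

```python
from itertools import islice


def chunked(rows, size=10_000):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_chunks(rows, insert_batch, size=10_000):
    """Call insert_batch(batch) once per chunk and return the chunk count.

    insert_batch is a hypothetical callback supplied by the caller; it is
    assumed to insert the batch and commit, so that each chunk becomes its
    own small transaction instead of one huge one.
    """
    count = 0
    for batch in chunked(rows, size):
        insert_batch(batch)
        count += 1
    return count
```

Because each chunk is committed separately, every replicated transaction stays small, which is the point of the advice. If the import fails partway through, you would also need to record how many chunks were committed so you can resume from there; that is not shown here.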