One of the nodes in our 3-node PXC cluster went down with the error trace below. From the error, we suspect the node was not able to apply (replay) a replicated transaction. Is this a bug?
PXC: 5.6.35-80.0-56-log running on CentOS 7 (3.10.0-514.10.2.el7.x86_64)
2018-06-20 08:43:28 26088 [Warning] WSREP: failed to replay trx: source: 7ee807fc-73dc-11e8-8169-fa8168816846 version: 3 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 2214966172 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1529477008413707629)
2018-06-20 08:43:28 26088 [Warning] WSREP: Invalid state in replay for trx source: 7ee807fc-73dc-11e8-8169-fa8168816846 version: 3 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 2214966172 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1529477008413707629) (FATAL)
at galera/src/replicator_smm.cpp:replay_trx():936
2018-06-20 08:43:28 26088 [ERROR] WSREP: trx_replay failed for: 6, schema: abcabc, query: CALL spc_DA_AggregateData(6, 1529407320, 1529407380)
2018-06-20 08:43:28 26088 [ERROR] Aborting
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 21074
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 16693
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 36
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 16692
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 16796
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 13900
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 20635
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 16627
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 17
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 21085
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 23586
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 286454
2018-06-20 08:43:30 26088 [Note] WSREP: killing local connection: 286406
2018-06-20 08:43:30 26088 [Note] WSREP: Closing send monitor...
2018-06-20 08:43:30 26088 [Note] WSREP: Closed send monitor.
2018-06-20 08:43:30 26088 [Note] WSREP: gcomm: terminating thread
2018-06-20 08:43:30 26088 [Note] WSREP: gcomm: joining thread
2018-06-20 08:43:30 26088 [Note] WSREP: gcomm: closing backend
2018-06-20 08:43:30 26088 [Warning] WSREP: failed to replay trx: source: 7ee807fc-73dc-11e8-8169-fa8168816846 version: 3 local: 1 state: ABORTING flags: 1 conn_id: 286250 trx_id: 2214966262 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1529477010472257755)
2018-06-20 08:43:30 26088 [Warning] WSREP: Invalid state in replay for trx source: 7ee807fc-73dc-11e8-8169-fa8168816846 version: 3 local: 1 state: ABORTING flags: 1 conn_id: 286250 trx_id: 2214966262 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1529477010472257755) (FATAL)
at galera/src/replicator_smm.cpp:replay_trx():936
2018-06-20 08:43:30 26088 [ERROR] WSREP: trx_replay failed for: 6, schema: abcabc, query: CALL spc_DA_UpdateAggregationJobStatus(71,18484,1)
2018-06-20 08:43:30 26088 [ERROR] Aborting
2018-06-20 08:43:31 26088 [Warning] Aborted connection 286454 to db: 'abcabc' user: 'abcabc' host: '10.1.3.101' (Deadlock found when trying to get lock; try restarting transacti)
2018-06-20 08:43:32 26088 [Note] WSREP: killing local connection: 286507
2018-06-20 08:43:32 26088 [Note] WSREP: killing local connection: 286564
2018-06-20 08:43:32 26088 [Note] WSREP: killing local connection: 286568
2018-06-20 08:43:32 26088 [Note] WSREP: killing local connection: 286580
2018-06-20 08:43:32 26088 [Note] WSREP: Service disconnected.
2018-06-20 08:43:32 26088 [Note] WSREP: rollbacker thread exiting
2018-06-20 08:43:33 26088 [Note] WSREP: (7ee807fc, 'tcp://0.0.0.0:4567') connection to peer 69ddcedb with addr tcp://10.1.3.107:4567 timed out, no messages seen in PT3S
2018-06-20 08:43:33 26088 [Note] WSREP: (7ee807fc, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.1.3.107:4567
2018-06-20 08:43:33 26088 [Note] WSREP: Some threads may fail to exit.
2018-06-20 08:43:33 26088 [Note] Binlog end
06:43:33 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=212
max_threads=302
thread_count=221
connection_count=212
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 136709 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x2ce0c20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fcfad2f6cf0 thread_stack 0x40000
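
For anyone hitting something similar: the "failed to replay trx" / "Invalid state in replay" lines above mean a local transaction was brute-force aborted by a replicated one and the node died while trying to replay it. The standard Galera status counters show how often that is happening; a minimal diagnostic sketch, run on any node that is still up (these are stock wsrep status variables, nothing specific to our setup):

-- Replays attempted by this node after a brute-force abort
SHOW GLOBAL STATUS LIKE 'wsrep_local_replays';

-- Local transactions aborted by replicated (higher-priority) ones
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';

-- Certification failures seen by this node
SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';

If wsrep_local_bf_aborts and wsrep_local_replays are climbing, the stored-procedure calls in the trace (spc_DA_AggregateData / spc_DA_UpdateAggregationJobStatus) are likely conflicting with writes from another node, which matches the deadlock warning above; the crash itself, though, happened inside the replay path.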