One node in the cluster gets stopped

I faced the issue that one of my node in the cluster keeps getting stopped after these errors.

[ERROR] Slave SQL: Could not execute Write_rows event on table portal-serve.wp_options; Duplicate entry ‘rewrite_rules’ for key ‘option_name’, Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event’s master log FIRST, end_log_pos 8652, Error_code: 1062

What’s wrong with it ?

Thanks

Paste here result of:
SHOW CREATE TABLE portal-serve.wp_options\G

Also, what is your exact cluster/replication topology now?

Here is the result.

*************************** 1. row ***************************
Table: wp_options
Create Table: CREATE TABLE wp_options (
option_id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
option_name varchar(64) NOT NULL DEFAULT ‘’,
option_value longtext NOT NULL,
autoload varchar(20) NOT NULL DEFAULT ‘yes’,
PRIMARY KEY (option_id),
UNIQUE KEY option_name (option_name)
) ENGINE=InnoDB AUTO_INCREMENT=2816511 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

The replication topology is master-master.

on the nodes what do your recent /var/lib/mysql/GRA_ files show?

use mysqlbinlog to look at the files: http://www.mysqlperformanceblog.com/2012/12/19/percona-xtradb-cluster-pxc-what-about-gra_-log-files/

download GRA-header
cat GRA-header > outputfile.log
cat /var/lib/mysql/GRA_.log >> outputfile.log
mysqlbinlog outputfile.log

This is what I get from GRA log.


/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#120715 2:45:56 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.25-debug-log created 120715 2:45:56 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
NHUCUA8BAAAAZwAAAGsAAAABAAQANS41LjI1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA0dQJQEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#140610 11:14:23 server id 2 end_log_pos 87 Query thread_id=28338 exec_time=0 error_code=0
SET TIMESTAMP=1402416863/*!*/;
SET @@session.pseudo_thread_id=28338/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 194
#140610 11:14:23 server id 2 end_log_pos 160 Table_map: `portal-serve`.`wp_options` mapped to number 855
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 8494, event_type: 30
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Any idea what could be the reason ?

To get the GRA (binary) log decoded, you need to use “–base64-output=DECODE-ROWS --verbose” options with mysqlbinlog.
Also, the header file attached in that blog post works only wih logs from PXC 5.5.x, for 5.6.x you will need appropriate header.

Btw. by master-master do you mean galera replication between 2 (more) nodes? Do you also use standard MySQL replication anywhere in the topology?
What is the PXC version here?

Hi

I think I have a similar problem. We have a three node Xtradb Cluster (5.6.15-25.5-759.wheezy). Clients Connect to only two nodes via a ipvs load balancer.

Every week we import two times a bunch of data (1500-3000 rows) in a single transaction to only one of the two active nodes. Sometimes we got this bug while we import the data. Sometimes not.
Befor we installed the PXC we used the version directy from galera (5.1.62-23.4 with galera-23.2.1) and got the bug only one time in a year. After we switched to PXC on the 16 june we got the bug three times (23 june, 7 july and 10 july).

If the bug occur we can start the failed node and do the import once again an it runs without a error.

Here is the Log from today:


2014-07-10 10:29:16 29911 [ERROR] Slave SQL: Could not execute Write_rows event on table lottohelden.kkb; Duplicate entry '946741' for key 'letzte_nummer', Error_code: 1062; handler error HA
_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 278, Error_code: 1062
2014-07-10 10:29:16 29911 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 11963863
2014-07-10 10:29:16 29911 [Warning] WSREP: Failed to apply app buffer: seqno: 11963863, status: 1
at galera/src/trx_handle.cpp:apply():340
Retrying 2th time
2014-07-10 10:29:16 29911 [ERROR] Slave SQL: Could not execute Write_rows event on table lottohelden.kkb; Duplicate entry '946741' for key 'letzte_nummer', Error_code: 1062; handler error HA
_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 278, Error_code: 1062
2014-07-10 10:29:16 29911 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 11963863
2014-07-10 10:29:16 29911 [Warning] WSREP: Failed to apply app buffer: seqno: 11963863, status: 1
at galera/src/trx_handle.cpp:apply():340
Retrying 3th time
2014-07-10 10:29:16 29911 [ERROR] Slave SQL: Could not execute Write_rows event on table lottohelden.kkb; Duplicate entry '946741' for key 'letzte_nummer', Error_code: 1062; handler error HA
_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 278, Error_code: 1062
2014-07-10 10:29:16 29911 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 11963863
2014-07-10 10:29:16 29911 [Warning] WSREP: Failed to apply app buffer: seqno: 11963863, status: 1
at galera/src/trx_handle.cpp:apply():340
Retrying 4th time
2014-07-10 10:29:16 29911 [ERROR] Slave SQL: Could not execute Write_rows event on table lottohelden.kkb; Duplicate entry '946741' for key 'letzte_nummer', Error_code: 1062; handler error HA
_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 278, Error_code: 1062
2014-07-10 10:29:16 29911 [Warning] WSREP: RBR event 3 Write_rows apply warning: 121, 11963863
2014-07-10 10:29:16 29911 [ERROR] WSREP: Failed to apply trx: source: 6afdc17d-0674-11e4-8528-13d0bc22aafe version: 3 local: 0 state: APPLYING flags: 1 conn_id: 44231 trx_id: 415884858 seqno
s (l: 664181, g: 11963863, s: 11963862, d: 11963858, ts: 1978184313228671)
2014-07-10 10:29:16 29911 [ERROR] WSREP: Failed to apply trx 11963863 4 times
2014-07-10 10:29:16 29911 [ERROR] WSREP: Node consistency compromized, aborting...
2014-07-10 10:29:16 29911 [Note] WSREP: Closing send monitor...
2014-07-10 10:29:16 29911 [Note] WSREP: Closed send monitor.
2014-07-10 10:29:16 29911 [Note] WSREP: gcomm: terminating thread
2014-07-10 10:29:16 29911 [Note] WSREP: gcomm: joining thread
2014-07-10 10:29:16 29911 [Note] WSREP: gcomm: closing backend

2014-07-10 10:29:16 29911 [Note] WSREP: view(view_id(NON_PRIM,6afdc17d-0674-11e4-8528-13d0bc22aafe,22) memb {
f4d2e14b-0673-11e4-99b4-d28700569b85,0
} joined {
} left {
} partitioned {
6afdc17d-0674-11e4-8528-13d0bc22aafe,0
a89cedd7-0674-11e4-8e61-ab5da27ec321,0
})
2014-07-10 10:29:16 29911 [Note] WSREP: view((empty))
2014-07-10 10:29:16 29911 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-07-10 10:29:16 29911 [Note] WSREP: gcomm: closed
2014-07-10 10:29:16 29911 [Note] WSREP: Flow-control interval: [16, 16]
2014-07-10 10:29:16 29911 [Note] WSREP: Received NON-PRIMARY.
2014-07-10 10:29:16 29911 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 11963863)
2014-07-10 10:29:16 29911 [Note] WSREP: Received self-leave message.
2014-07-10 10:29:16 29911 [Note] WSREP: Flow-control interval: [0, 0]
2014-07-10 10:29:16 29911 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-07-10 10:29:16 29911 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 11963863)
2014-07-10 10:29:16 29911 [Note] WSREP: RECV thread exiting 0: Success
2014-07-10 10:29:16 29911 [Note] WSREP: recv_thread() joined.
2014-07-10 10:29:16 29911 [Note] WSREP: Closing replication queue.
2014-07-10 10:29:16 29911 [Note] WSREP: Closing slave action queue.
2014-07-10 10:29:16 29911 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
140710 10:29:16 mysqld_safe Number of processes running now: 0
140710 10:29:16 mysqld_safe WSREP: not restarting wsrep node automatically
140710 10:29:16 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

Here the output from the GRA_*.log file:


mysqlbinlog --base64-output=DECODE-ROWS -vvv /root/GRA_2_10902541.log
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#120715 9:45:56 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.25-debug-log created 120715 9:45:56 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
# at 107
#140707 10:23:29 server id 0 end_log_pos 80 Query thread_id=335318 exec_time=0 error_code=0
SET TIMESTAMP=1404721409/*!*/;
SET @@session.pseudo_thread_id=335318/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=3/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 187
#140707 10:23:29 server id 0 end_log_pos 149 Table_map: `lottohelden`.`kkb` mapped to number 74
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 129, event_type: 30
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

The create statement:

Create Table: CREATE TABLE `kkb` (
`nummer` bigint(20) unsigned NOT NULL DEFAULT '0',
`letzte_nummer` bigint(20) unsigned DEFAULT NULL,
`user` varchar(12) NOT NULL DEFAULT '',
`betrag` bigint(20) NOT NULL DEFAULT '0',
`gk` tinyint(6) unsigned NOT NULL DEFAULT '0',
`gk_ref` bigint(20) unsigned NOT NULL DEFAULT '0',
`gv_typ` tinyint(6) unsigned NOT NULL DEFAULT '0',
`gv_ref` bigint(20) unsigned NOT NULL DEFAULT '0',
`saldo` bigint(20) NOT NULL DEFAULT '0',
`alter_saldo` bigint(20) NOT NULL DEFAULT '0',
`gebucht_um` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`gebucht_von` varchar(12) NOT NULL DEFAULT '',
`is_storno` tinyint(1) unsigned NOT NULL DEFAULT '0',
`storniert_nummer` bigint(20) unsigned DEFAULT NULL,
`kommentar` bigint(20) unsigned DEFAULT NULL,
`auszahlbar` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`nummer`),
UNIQUE KEY `storniert_nummer` (`storniert_nummer`),
UNIQUE KEY `letzte_nummer` (`letzte_nummer`),
KEY `gebucht_um` (`gebucht_um`),
KEY `gk` (`gk`),
KEY `gv_typ` (`gv_typ`),
KEY `user` (`user`),
KEY `gv_ref` (`gv_ref`),
KEY `gk_ref` (`gk_ref`),
KEY `is_storno` (`is_storno`),
KEY `gk_ref_2` (`gk_ref`,`gk`),
KEY `user_2` (`user`,`gk`,`nummer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I found this Post with a similar error message.

http://www.mysqlperformanceblog.com/…ra_-log-files/

But Frederic wrote: “In this case it’s obvious why it failed but it’s not always the case” and this is not very helpfull for me ;-(

Do you need any additional information?

regards
ralph

Ralph, you may be hitting this bug: https://bugs.launchpad.net/codership-mysql/+bug/1280896 or similar.
Are you using parallel slaves >1?
In any case, upgrading to 5.6.19 may help.