Crash-Resistant Replication working wrong?

After read about this feature in http://www.percona.com/doc/percona-server/5.5/reliability/in nodb_recovery_update_relay_log.html#crash-resistant-replicat ion ,

innodb_overwrite_relay_log_info

i wanted test this.

What i do?

checking with percona server 5.5.20 and 5.1.59. both case running replication capacity about 100 qps.

set safe variables

sync_binlog = 1innodb_flush_log_at_trx_commit = 1innodb_overwrite_relay_log_info = 1 or innodb_recovery_update_relay_log = 1skip-slave-start

need anything else for this test?

i do emulation power failure over SysRq-B.

echo 1 > /proc/sys/kernel/sysrqecho b > /proc/sysrq-trigger

after server rebooted, see info files.

[root@db40 airo]# cat /local/mysql/data/master.info 15log-bin.01388886886857210.x.xx.xxreplicationreplication33066000[root@db40 airo]# cat /local/mysql/data/relay-log.info /local/mysql/relaylog/relay-bin.000687868868715log-bin.0138888688685722

then start mysql server, seen log after recovery

InnoDB: relay-log.info is detected.InnoDB: relay log: position 868868715, file name /local/mysql/relaylog/relay-bin.000687InnoDB: master log: position 868868572, file name log-bin.013888InnoDB: The InnoDB memory heap is disabled…InnoDB: Apply batch completedInnoDB: In a MySQL replication slave the last master binlog fileInnoDB: position 0 869698571, file name log-bin.013888InnoDB: and relay log fileInnoDB: position 0 869698714, file name /local/mysql/relaylog/relay-bin.000687InnoDB: Last MySQL binlog file position 0 1071255772, file name /local/mysql/binlog/log-bin.000404120221 13:15:20 InnoDB: Restoring buffer pool pages from ib_lru_dump120221 13:15:20 Percona XtraDB (http://www.percona.com) 1.0.17-12.5 started; log sequence number 483548337856InnoDB: relay-log.info was overwritten.120221 13:15:21 [Note] Recovering after a crash using /local/mysql/binlog/log-bin120221 13:17:19 [Note] Starting crash recovery…120221 13:17:19 [Note] Crash recovery finished.

recovery done is fine, replication not started because set option skip this action.

10.8.60.40-(none)-13:19:13> show slave status\G*************************** 1. row *************************** Slave_IO_State: Master_Host: 10.x.xx.xx Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: log-bin.013888 Read_Master_Log_Pos: 868868572 Relay_Log_File: relay-bin.000687 Relay_Log_Pos: 869698714 Relay_Master_Log_File: log-bin.013888 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 869698571 Relay_Log_Space: 864055592 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: 1 row in set (0.00 sec)

Can be seen that Exec_Master_Log_Pos get from binlog, and differs from master.info. from binlog is bigger pos, than master.info and this position is right.

Okey, everything looks great.

Start replication. And get error - Duplicate entry.

10.8.60.40-(none)-13:21:04> show slave status\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.x.xx.xx Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: log-bin.013888 Read_Master_Log_Pos: 988421748 Relay_Log_File: relay-bin.000689 Relay_Log_Pos: 249 Relay_Master_Log_File: log-bin.013888 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062 Last_Error: Error ‘Duplicate entry ‘15043868’ for key ‘PRIMARY’’ on query. Default database: … [skip this part] Skip_Counter: 0 Exec_Master_Log_Pos: 868868572 Relay_Log_Space: 119553574 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1062 Last_SQL_Error: Error ‘Duplicate entry ‘15043868’ for key ‘PRIMARY’’ on query. Default database: … [skip this part]

We see that Exec position jump back from 869698571 to 868868572. This position replication get from mysql.info.

I’m doing something wrong? or believed that after crash, replication must be fixed by hand? (i mean “change master to” position got from log.err)

This looks like a bug. Please file it on Launchpad.

okey, [URL]https://bugs.launchpad.net/percona-server/+bug/937852[/URL]