SQL statement hangs while replicating on slave (5.5.21)

Hi,

We’re running the Percona version of MySQL, version 5.5.21, and we’re hitting an odd replication issue.

We run the following long-running update (with some filled in values) and find that the CREATE TEMPORARY TABLE consistently hangs when running on one of our slaves.

CREATE TEMPORARY TABLE (SELECT csid, count(csid) AS ${COUNT_COLUMN} FROM ${dataset}.locations AS loc, ${dataset}.entityid_to_csid AS e2c WHERE e2c.entityid=loc.entityid AND docid>=${MINDOC} AND docid<=${MAXDOC} GROUP BY csid)
UNION ALL
(SELECT csid, count(csid) AS ${COUNT_COLUMN} FROM ${dataset}.images AS img, ${dataset}.entityid_to_csid AS e2c WHERE e2c.entityid=img.entityid AND docid>=${MINDOC} AND docid<=${MAXDOC} GROUP BY csid)

(table schema defs below)

This is running on an Ubuntu 11.0.4 server.

I’ve checked the MySQL and Percona release histories for more recent builds, but haven’t found anything that exactly describes my issue. There’s a replication bug fixed in 5.5.22 that sounds like it could be related but it’s tenuous.

Also worth noting that we get a 100% disk I/O on the hanging slave, and that a different slave hangs each time. The replication status appears OK except that the rows read keeps resetting to zero, and the process status is reported as “Sending data”. When the update runs correctly, we see a message indicating that the data is being sent to the temporary table instead.

Any ideas on how we can resolve this would be greatly appreciated. Also apologies for the cross posting to different forums, I’m not quite sure of the best place to ask this question.

Thanks,
Jim from Digital Science

CREATE TABLE locations (
id INTEGER NOT NULL AUTO_INCREMENT,
docid INTEGER NOT NULL,
entityid INTEGER NOT NULL,
fieldid INTEGER NOT NULL,
rank INTEGER NOT NULL,
startoffset INTEGER NOT NULL,
endoffset INTEGER NOT NULL,
isExemplified TINYINT(1) DEFAULT 0,
PRIMARY KEY(id),
INDEX(docid,fieldid,rank),
INDEX(fieldid),
INDEX(entityid),
UNIQUE(docid,entityid,fieldid,rank,startoffset),
FOREIGN KEY (docid) REFERENCES documents(docid)
) DEFAULT CHARSET=ascii MAX_ROWS=1000000000 ENGINE=INNODB;

CREATE TABLE entityid_to_csid (
entityid INTEGER NOT NULL,
csid INTEGER default NULL,
fragment BOOLEAN default NULL,
connected BOOLEAN default NULL,
radical BOOLEAN default NULL,
element BOOLEAN default 0,
molweight FLOAT default NULL,
INDEX(entityid),
INDEX(csid),
UNIQUE(entityid,csid)
) DEFAULT CHARSET=ascii MAX_ROWS=100000000 ENGINE=INNODB;

Thinking it’s 100% IO, is somehow your MySQL data on a shared storage?

The storage on all three machines (master and two slaves) is provided by Amazon Web Services. We run all instances on EC2 instances, and use ephemeral volumes for storage.

We’re experiencing the same issue on our replication slave, using statement-based replication. The replication thread will hang on a query and be stuck in the UPDATE or INIT states. The mysqld process uses minimal CPU.

We’re unable to stop using SLAVE STOP (hangs indefinitely), and can’t shutdown the server normally (must use kill -9). In addition, SHOW STATUS commands also hang while this issue is occurring.

This started happening to us with an upgrade to the package Percona-Server-server-55-5.5.25a-rel27.1.277.rhel5.x86_64 and is continuing with Percona-Server-server-55-5.5.27-rel28.0.291.rhel5.x86_64.

This message showed up in our mysql error logs the first time the error happened:

InnoDB: Warning: a long semaphore wait:–Thread 1097607488 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:Mutex at 0x2aaab10a38c0 ‘&buf_pool->mutex’, lock var 1waiters flag 1InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:InnoDB: Pending preads 0, pwrites 0InnoDB: ###### Diagnostic info printed to the standard error stream

When shutting down, the process begins normally, but hangs on slave I/O exit:

120827 18:30:53 [Note] /usr/sbin/mysqld: Normal shutdown120827 18:30:53 [Note] Event Scheduler: Purging the queue. 0 events120827 18:30:53 [Note] Slave I/O thread killed while reading event120827 18:30:53 [Note] Slave I/O thread exiting, read up to log ‘mysql-bin.000739’, position 515926767— HANGS HERE —

At this point we need to kill -9 the mysqld/mysqld_safe processes and restart the server.

Running Centos 5.6 with kernel package kernel-2.6.18-308.13.1.el5.centos.plus.x86_64 (as of our percona 5.5.27 update), with plenty of available disk space. Let me know if you need any more information, and thanks for a great product!

Upgrading to yum version Percona-Server-server-55-5.5.27-rel28.1.296.rhel5.x86_64 seems to have fixed the issue thanks!