Hello all,
I’ve been running a 4-node MariaDB Galera cluster for about a month now (Ubuntu 12.04). Over the weekend I was changing the structure of quite a few tables and re-importing a bunch of data. While that was going on, mysqld on one of the nodes stopped working. Once I reached a stopping point, I shut everything down and restarted. Now none of the non-primary nodes can sync back with the primary node. For SST I had been using
wsrep_sst_method=xtrabackup
and in innobackup.backup.log I see the following:
InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona Ireland Ltd 2009-2012. All Rights Reserved.
This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.
131021 09:56:04 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/mysql/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'root' (using password: YES).
131021 09:56:04 innobackupex: Connected to MySQL server
IMPORTANT: Please check that the backup run completes successfully.
At the end of a successful backup run innobackupex
prints "completed OK!".
innobackupex: Using mysql server version 5.5.32-MariaDB-1~precise-log
innobackupex: Created backup directory /tmp
131021 09:56:04 innobackupex: Starting ibbackup with command: xtrabackup_55 --defaults-file="/etc/mysql/my.cnf" --defaults-group="mysqld" --backup --suspend-at-end --target-dir=/tmp --tmpdir=/tmp --stream=tar
innobackupex: Waiting for ibbackup (pid=7498) to suspend
innobackupex: Suspend file '/tmp/xtrabackup_suspended_2'
xtrabackup_55 version 2.1.4 for Percona Server 5.5.31 Linux (x86_64) (revision id: 657)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 104857600
xtrabackup: using O_DIRECT
log scanned up to (124686760868)
131021 9:56:04 InnoDB: Warning: allocated tablespace 2125, old maximum was 9
131021 9:56:04 InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
InnoDB: Error: could not open single-table tablespace file
InnoDB: ./xxx/yyy.ibd!
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
innobackupex: Error: ibbackup child process has died at /usr//bin/innobackupex line 416.
I tried updating /etc/security/limits.conf to raise the OS open-file limit, to no avail. I also tried removing the .ibd file as suggested in 2) above, but SST then just reports a problem with another table.
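For context, here is a rough sketch of how the open-file limit actually in effect for a running process could be checked, since a change in /etc/security/limits.conf may not apply to a daemon started by the init system. It is written in Python and assumes a Linux /proc filesystem. The function names are made up for illustration, and the pid of mysqld (or of the xtrabackup process) would have to be looked up separately.

```python
def parse_max_open_files(limits_text):
    """Extract (soft, hard) open-file limits from the text of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(label):].split()
            soft, hard = fields[0], fields[1]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def open_files_limit_for_pid(pid):
    """Read the limits of a live process (Linux only; needs permission to read /proc/<pid>)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())
```

If the soft limit reported for the process is still the old value, the limits.conf change did not reach it.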
Question: is there a way for me to get a list of all tables that would cause SST to fail, without having to try to join the cluster from a non-primary node over and over again and go through one table at a time? There are a lot of tables in some of these databases, so I’d like to know what I’m getting myself into. What are the odds it will work in the end anyway? Is there anything else I should be trying? Or any other logs I can check to provide more info?
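To make the question concrete, here is a minimal Python sketch of the kind of check I have in mind. Since the error is "Too many open files", it works on the assumption that the failure depends on how many .ibd files there are relative to the open-file limit, rather than on any particular table. The datadir path /var/lib/mysql comes from the log above; the function names and the `reserved` headroom value are hypothetical.

```python
import os


def list_ibd_files(datadir):
    """Return the sorted relative paths of all .ibd files under datadir."""
    found = []
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if name.endswith(".ibd"):
                full = os.path.join(root, name)
                found.append(os.path.relpath(full, datadir))
    return sorted(found)


def exceeds_limit(datadir, soft_limit, reserved=0):
    """True if the number of .ibd files is larger than the limit minus some headroom.

    `reserved` is a guess at the handles needed for logs, sockets and so on.
    """
    return len(list_ibd_files(datadir)) > soft_limit - reserved


if __name__ == "__main__":
    tables = list_ibd_files("/var/lib/mysql")
    print(len(tables), "single-table tablespace files found")
```

Comparing the printed count with the soft limit of the backup process would at least show whether the limit is in the right range.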
Any help is greatly appreciated, thank you!