Not the answer you need?
Register and ask your own question!

pt-online-schema-change causes MariaDB/MySQL database to crash for very large tables

colincolin Current User Role Contributor

I am currently trying to run pt-online-schema-change on a MariaDB (v 10.2.36) database, in an attempt to "shrink" InnoDB tables. The table itself is very large -- weighing in at 450Gb.

I am running pt-online-schema-change (version 3.2.1) to attempt this, as follows:

nohup pt-online-schema-change D=my_database,t=very_large_table,h=127.0.0.1,u=root --alter="ENGINE=InnoDB" --alter-foreign-keys-method="auto" --execute &

So basically it spends a long parsing and creating a new version of the table, but then it eventually CRASHES the mySql database, with the following error.log message:

[ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

I have tried the above command on a smaller table (40Gb) and there are no problems at all, but in the case of the large table, I have not had a lot of luck..

Does anyone have any suggestions, perhaps some 'magic' command-line options to use?

Thanks a lot for your help,

Colin

Answers

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    @colin Unfortunately there are no "magic" options. FOREIGN KEYS create much more work for the database because additional locks on all rows in parent tables must be acquired. These metadata locks can cause long timeouts to occur. If possible, attempt to stop (or reduce as much as possible) all other DML while running this alter table.

    I suggest reading the manual https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html and pay careful attention to the 'alter-foreign-keys-method' parameter. Perhaps using drop-swap or none may help, but do be aware of the consequences.

    Depending on how many FKs you have, you might get much faster results by dropping the FKs, running the ALTER, then adding the FKs back in manually.

  • colincolin Current User Role Contributor

    Thanks for the helpful comment @matthewb , I took a careful look at the table and the corresponding documentation.

    One interesting point, with regards to Foreign Keys, there are none within this table, let me show you what the format is:

    CREATE TABLE `TableA` (

     `column1` int(11) NOT NULL,

     `column2` int(11) NOT NULL,

     `column3` int(11) NOT NULL,

     `column4` int(11) NOT NULL,

     `column5` varchar(25) NOT NULL DEFAULT '0',

     `column6` varchar(2048) DEFAULT NULL,

     `lastUpdated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

     PRIMARY KEY (`column1`,`column2`,`column3`,`column4`,`column5`),

     KEY `rd_time` (`lastUpdated`)

    ) ENGINE=InnoDB DEFAULT CHARSET=latin1


    So, I might try using --alter-foreign-keys-method="none" to see if this will speed things up. Also, this composite Primary Key, is that something that I need to take into consideration?

    Given the knowledge of the table structure, are there any other options that you might suggest?

    Thanks,

    Colin

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    If there are no foreign keys, just remove '--alter-foreign-keys-method' that parameter completely; you don't need it.

    Remove the 'nohup' and the '&'. Why do you have those?

    You might want to try a specific chunk size, --chunk-size=5000 to see if that helps with the locking issues.

  • colincolin Current User Role Contributor

    Thanks very much for that @matthewb , I will try what you have suggested.

    Btw, 'nohup' and '&' are UNIX related and will not affect the operation of the command.

    Cheers,

    Colin

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    I know what nohup and & are for, I just understand why you have them there. Why are you wanting to background pt-osc? A better option would be to use screen or tmux

  • colincolin Current User Role Contributor

    Hi @matthewb , I am running this on an AWS connected EC2 server and I cannot rely on the connection remaining open, hence I am using nohup , which is probably a little 'old fashioned'.

    Thanks for clueing me in, I will ask the Unix guru's around here what I should use -- screen, tmux or ..?

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    I use screen. It absolutely helps with disconnections from SSH.

  • colincolin Current User Role Contributor

    Yes, I just checked with our resident Unix guru and he totally agrees -- screen!

    Thanks for the tip!

  • colincolin Current User Role Contributor
    edited November 24

    Hi @matthewb ,

    Unfortunately, the same problem as before -- I'm out of ideas..:

    Command output:

    ...

    Copying `MyDatabase`.`TableName`: 94% 13:11 remain

    Copying `MyDatabase`.`TableName`: 95% 12:05 remain

    2020-11-24T13:07:27 Error dropping trigger: DBI connect('MyDatabase;host=127.0.0.1;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on '127.0.0.1' (111 "Connection refused") at /usr/bin/pt-online-schema-change line 2345.

    Error log output:

    ----------------------------

    END OF INNODB MONITOR OUTPUT

    ============================

    InnoDB: ###### Diagnostic info printed to the standard error stream

    2020-11-24 13:07:26 139635055142656 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

    201124 13:07:26 [ERROR] mysqld got signal 6 ;

    This could be because you hit a bug. It is also possible that this binary

    or one of the libraries it was linked against is corrupt, improperly built,

    or misconfigured. This error can also be caused by malfunctioning hardware.


    To report this bug, see https://mariadb.com/kb/en/reporting-bugs


    We will try our best to scrape up some info that will hopefully help

    diagnose the problem, but since we have already crashed,

    something is definitely wrong and this may fail.


    Server version: 10.2.36-MariaDB-log

    ...

  • matthewbmatthewb Senior [email protected] Percona Staff Role

    Hmm. It says error dropping trigger on the crash. Try --no-drop-triggers This has the side effect of keeping the old table around after the rename. If this works, you'll have to manually drop the old table.

  • colincolin Current User Role Contributor
    edited November 24

    Good idea, I will set it up and attempt this, now.


    ps: And I will use screen as well ;-)

  • colincolin Current User Role Contributor

    Hi @matthewb ,

    I ran the job last night using the --no-drop-triggers switch and it again 'hung' the database, (the output follows).

    1. I am wondering if this has ever happened before, with an extra large table?
    2. Is there a switch that we could use to combat this timeout from occurring?
    3. Is there, perhaps at the mySql end, a configuration option which could remedy this problem?

    Thanks,

    Colin


    Command output:

    Copying `MyDatabase`.`TableA`: 96% 09:08 remain

    Copying `MyDatabase`.`TableA`: 96% 08:03 remain

    Not dropping triggers because --no-drop-triggers was specified. To drop the triggers, execute:

    DROP TRIGGER IF EXISTS `TableA`.`pt_osc_mydatabase_tablea_del`

    DROP TRIGGER IF EXISTS `TableA`.`pt_osc_mydatabase_tablea_upd`

    DROP TRIGGER IF EXISTS `TableA`.`pt_osc_mydatabase_tablea_ins`

    `MyDatabasae`.`TableA` was not altered.

        (in cleanup) 2020-11-24T20:59:25 Error copying rows from `MyDatabase`.`TableA` to `MyDatabase`.`_TableA_new`: DBI connect('MyDatabase;host=127.0.0.1;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on '127.0.0.1' (111 "Connection refused") at /usr/bin/pt-online-schema-change line 2345.

    From the error log:

    InnoDB: ###### Diagnostic info printed to the standard error stream

    2020-11-24 20:59:24 140504169277184 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

    201124 20:59:24 [ERROR] mysqld got signal 6 ;

    This could be because you hit a bug. It is also possible that this binary

Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.