
pt-osc makes the database unusable. It seems to not honor the --max-load parameter

Ostico (Entrant, Current User Role: Beginner)
Some related information to better explain the strange behaviour I observed:

Running: 5.6.29-log MySQL Community Server (GPL)

Some configuration params:
[code]
innodb_file_per_table = 1
innodb-buffer-pool-size = 60G
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb-flush-log-at-trx-commit = 2

gtid_mode = on
enforce_gtid_consistency = true
log_slave_updates = true
[/code]
Server params:
[LIST]
[*]70 GB RAM
[*]16 cores
[*]RAID10 SSD (approx. 20k IOPS)
[/LIST]
I also have a replica for data recovery and failover, on which queries are never executed (only the replication thread runs); for this test I stopped replication on it.

I have a table with:

[code]
PRIMARY KEY (`id_segment`,`id_job`),
....
PARTITION BY RANGE (id_job)
SUBPARTITION BY HASH (id_job)
(PARTITION PART0 VALUES LESS THAN (160000)
(SUBPARTITION PART_01 ENGINE = InnoDB,
SUBPARTITION PART_02 ENGINE = InnoDB,
SUBPARTITION PART_03 ENGINE = InnoDB,
SUBPARTITION PART_04 ENGINE = InnoDB),
...
[/code]

Occasionally this table receives spikes of high load: bulk inserts of 100 tuples each.

Example:
[code]
INSERT INTO tbl_name
    (a,b,c)
VALUES
    (1,1,1),
    (2,2,2),
    ...
    (100,100,100);
[/code]

After that phase it also receives a high number of connected threads that update single rows by primary key, with no row contention (max 150 updates/s), as sketched below.
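A minimal sketch of what such an update looks like (the SET column and values here are hypothetical; only the primary key columns come from the real schema):

[code]
-- Each statement targets exactly one row via the composite
-- primary key (id_segment, id_job), so updates do not contend
-- with each other on rows.
UPDATE seg_t
SET some_column = 'some_value'   -- hypothetical column/value
WHERE id_segment = 42
  AND id_job = 13;
[/code]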

In addition, there are approximately 100-200 users doing SELECTs and UPDATEs.

Under normal behaviour and load, this is the Threads_running (sampled as sketched after the list):
[LIST]
[*]2016-03-15 17:08:59 - Threads_running   3
[*]2016-03-15 17:09:01 - Threads_running   3
[*]2016-03-15 17:09:03 - Threads_running   2
[*]2016-03-15 17:09:05 - Threads_running   7
[*]2016-03-15 17:09:07 - Threads_running   3
[*]2016-03-15 17:09:09 - Threads_running   2
[*]2016-03-15 17:09:11 - Threads_running   3
[*]2016-03-15 17:09:13 - Threads_running   5
[/LIST]
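These samples were taken every two seconds; a shell loop like the following sketch (assuming credentials are available, e.g. via ~/.my.cnf) produces output in that format:

[code]
# Print a timestamp plus the current Threads_running every 2 seconds.
while true; do
  echo "$(date '+%Y-%m-%d %H:%M:%S') - $(mysql -N -B -e \
    "SHOW GLOBAL STATUS LIKE 'Threads_running'")"
  sleep 2
done
[/code]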




pt-osc stops executing at around 20% of its work because it exceeds the Threads_running default for --critical-load (50). I tried raising that value to 100; the results were pretty much the same, but the abort happened later.

So I tried launching it with a --max-load lower than the default (15), and I set --critical-load to 400 (insane) because after some tries I found that this value is never reached.
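The invocation looked roughly like this (a sketch: the actual ALTER statement is elided, and the DSN values are taken from the tool output below):

[code]
pt-online-schema-change \
  --alter "..." \
  --max-load Threads_running=15 \
  --critical-load Threads_running=400 \
  --execute \
  D=db_name,t=seg_t
[/code]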



and the results under normal load are:
[LIST]
[*]2016-03-15 17:08:05 - Threads_running   3
[*]2016-03-15 17:08:07 - Threads_running   73
[*]2016-03-15 17:08:09 - Threads_running   126
[*]2016-03-15 17:08:11 - Threads_running   152
[*]2016-03-15 17:08:13 - Threads_running   158
[*]2016-03-15 17:08:15 - Threads_running   164
[*]2016-03-15 17:08:17 - Threads_running   170
[*]2016-03-15 17:08:19 - Threads_running   174
[*]2016-03-15 17:08:21 - Threads_running   178
[*]2016-03-15 17:08:23 - Threads_running   183
[*]2016-03-15 17:08:25 - Threads_running   141
[*]2016-03-15 17:08:27 - Threads_running   120
[*]2016-03-15 17:08:29 - Threads_running   141
[*]2016-03-15 17:08:31 - Threads_running   148
[/LIST]

The application obviously becomes unresponsive and MySQL comes under heavy load.

So I tried an experiment, sending a SIGINT to pt-osc; here are the results:
[code]
^C# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `db_name`.`pt_osc_db_name_seg_t_del`;
DROP TRIGGER IF EXISTS `db_name`.`pt_osc_db_name_seg_t_upd`;
DROP TRIGGER IF EXISTS `db_name`.`pt_osc_db_name_seg_t_ins`;
Not dropping the new table `db_name`.`_seg_t_new` because the tool was interrupted.  To drop the new table, execute:
DROP TABLE IF EXISTS `db_name`.`_seg_t_new`;
`db_name`.`seg_t` was not altered.
=== Command terminated with exit status 1 (Tue Mar 15 17:08:58 2016) ===
[/code]
[LIST]
[*]2016-03-15 17:08:59 - Threads_running   3
[*]2016-03-15 17:09:01 - Threads_running   3
[*]2016-03-15 17:09:03 - Threads_running   2
[/LIST]

This leads me to think that the triggers are not the problem. It seems the issue lies in the queries the tool executes to copy data from one table to the other.
So I thought that the value of --max-load is not being honored in some way (as far as I understand, the tool examines SHOW GLOBAL STATUS only after copying each chunk, so load can spike between checks).

Any idea/suggestion on how I can perform this ALTER TABLE?
Can I improve the command parameters to lower the load?
Are some parameters in the server configuration wrong?

Tell me if you need more info.

Thank you in advance.

Comments

  • Ostico (Entrant, Current User Role: Beginner)
    I found the issue. It worked (with difficulty) after I lowered the mean load on the database.

    I suppose it was memory swapping and/or MySQL loading and unloading buffer pool pages for the indexes of these large partitions while the application continued to write to the newest partitions. Once the tool began writing data into the smaller, newer partitions, the load was much lower.
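    A quick way to check this hypothesis (a sketch; these are standard InnoDB status counters, not values captured during this incident):

[code]
-- If Innodb_buffer_pool_reads (requests that missed the buffer pool
-- and went to disk) grows fast relative to
-- Innodb_buffer_pool_read_requests, the working set no longer fits
-- in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
[/code]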

    84 hours for the ALTER TABLE in total.

    Closed?? :)
