Hi!
Apologies for the wall of text ahead; I’ve been pulling my hair out a little trying to figure out what the problem is.
I ran into a bit of system downtime last week while running a few DDL statements on a Percona MySQL DB (Ver 5.7.20-19 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release ‘19’, Revision ‘3c5d3e5d53c’)). I was wondering if anyone knows of any reason why adding a few columns & indexes would cause a tablespace (for a single table) to become corrupted.
At the time, the affected schema was physically pulled out of the datadir and the database was started with InnoDB forced recovery mode 1. This seemed to repair any issues with references to the missing schema. The database was then booted in normal mode and has been running without issue since (just without the problem schema).
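For reference, that recovery boot used the standard innodb_force_recovery option - roughly this in the my.cnf (removed again before the normal restart):
[mysqld]
# level 1 is the least intrusive forced-recovery level; it lets the
# server start despite the corrupt/missing tablespace so the stale
# references to it could be cleaned up
innodb_force_recovery = 1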
There were 12 ALTER TABLE operations in total affecting one table: 6 adding virtual columns to pull JSON data out of a particular column, and 6 adding indexes on those new columns. When executed individually, these queries worked absolutely fine. When the queries to create a virtual column and to index that column were combined into the same statement, the service crashes maybe 30 seconds into the query.
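For example, run individually (this is just the first column and its index, lifted from the full statement further down), the statements look like this and complete fine:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`;
ALTER TABLE `pfx_calendaritem`
	ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`);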
The error log shows nothing significant other than this:
Here SCHEMA and TABLE are the targets of the ALTER TABLE statements. The service can be started up again and runs fine until anything tries to read from the table, at which point it crashes with the same error.
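Once the table is in this state, even a trivial read is enough to bring it down - for example (just an illustration, any query that touches the table will do):
SELECT `id` FROM `pfx_calendaritem` LIMIT 1;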
Although this occurred on a dedicated server, the crash is also reproducible with the data set I have running on a local Docker instance. I can destroy the container, create a new image, import the data set, run the queries, and it will fail with the same problem.
Things I’ve concretely tried:
[LIST]
[]Running as one query - this breaks.
[]Running the queries independently - this works.
[]Combining into 2 queries - one to add all columns, one to add all indexes - this works (sketched below, after the full crashing query).
[]Pairing into 6 queries - adding a column + adding that column’s index - this breaks on pairs 2 and 5 out of 6 (pair #2 is sketched after this list).
[]Pairing into 6 queries BUT replacing the AFTER part with AFTER `data` on all new columns (so the order would be `data`, new col #6, new col #5, … , new col #1) - this works somehow.
[]Removing Percona and running as one query - this works.
[]Upgrading to Percona 5.7.21 and running as one query - this breaks.
[]Truncating the table and running as one query - this works.
[/LIST]
I’ve also tried changing multi-key index column orders; adding, removing, then re-adding columns; and changing the order of queries, all with mixed results.
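To make the pairing variant concrete, here’s pair #2 (the property_view_id one, which is one of the two that break) - just those two clauses lifted out of the full statement below and run as their own statement, after pair #1 has gone through:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`,
	ADD INDEX `property_view_id` (`property_view_id`);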
I’ve not been able to narrow down exactly what causes these particular statements to crash the server, and I’d really appreciate some help in finding out why it happens.
Here’s the config used on docker:
[mysqld]
datadir = /data/mysql
sql-mode="NO_ENGINE_SUBSTITUTION"
query_cache_type=1
query_cache_limit=1M
query_cache_size=250M
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
table_open_cache=5000
skip_name_resolve
log_error = /var/log/mysql/mysql_error.log
slow_query_log_file = /var/log/mysql/mysql-slow.log
slow_query_log = 1
ignore-db-dir=mysql-binlog # binlogging is enabled on our remote environments, but disabled on docker.
innodb_use_native_aio = 0
innodb_flush_method = O_DSYNC
Here’s the config used on the remote server:
[mysqld]
datadir = /data/mysql
sql-mode="NO_ENGINE_SUBSTITUTION"
query_cache_type=1
query_cache_limit=1M
query_cache_size=250M
innodb_buffer_pool_size = 45G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
table_open_cache=5000
skip_name_resolve
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
log_error = /var/log/mysql/mysql_error.log
slow_query_log_file = /var/log/mysql/mysql-slow.log
slow_query_log = 1
#log_queries_not_using_indexes
#log-bin=mysql-binlog
server_id = 0
log-bin = /data/mysql/mysql-binlog/mysql-binlog
binlog_format = MIXED
ignore-db-dir=mysql-binlog
Here’s the grouped queries which crash the instance:
ALTER TABLE `pfx_calendaritem`
ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`,
ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`),
ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`,
ADD INDEX `property_view_id` (`property_view_id`),
ADD COLUMN `bym_property_id` CHAR(32) AS (`data`->>'$.bym_property_id') VIRTUAL NULL AFTER `property_view_id`,
ADD INDEX `idx_bym_property_id` (`type`, `followup`, `bym_property_id`),
ADD COLUMN `branch_id` CHAR(32) AS (`data`->>'$.branch') VIRTUAL NULL AFTER `bym_property_id`,
ADD INDEX `idx_type_followup_branch_id` (`type`, `followup`, `branch_id`),
ADD COLUMN `watchlist_id` CHAR(32) AS (`data`->>'$.watchlist_id') VIRTUAL NULL AFTER `branch_id`,
ADD INDEX `idx_watchlist_id` (`watchlist_id`),
ADD COLUMN `user_customer_id` CHAR(32) AS (`data`->>'$.user_customer_id') VIRTUAL NULL AFTER `watchlist_id`,
ADD INDEX `idx_user_customer_id_type_followup` (`user_customer_id`, `type`, `followup`);
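For comparison, the 2-query split from the list above that does work is the same set of operations, just separated into one statement for all the columns and one for all the indexes. Sketched here with only the first two columns to keep it short; the remaining four follow the same pattern:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`,
	ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`;
ALTER TABLE `pfx_calendaritem`
	ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`),
	ADD INDEX `property_view_id` (`property_view_id`);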
Here’s the create table statement:
CREATE TABLE `ias_calendaritem` (
`id` CHAR(32) NOT NULL,
`datecreated` INT(11) NOT NULL DEFAULT '0',
`datetime` INT(11) NOT NULL DEFAULT '0',
`type` VARCHAR(27) NOT NULL DEFAULT '',
`data` JSON NULL DEFAULT NULL,
`notes` TEXT NULL,
`level` INT(11) NULL DEFAULT '0' COMMENT '\'0\' means this note is only for users associated within this note stream',
`followup` INT(2) NULL DEFAULT NULL COMMENT '0 => not required,\\n1 => required,\\n2 => required+completed',
`agency_id` CHAR(32) NULL DEFAULT NULL,
`agency_branch_id` CHAR(32) NULL DEFAULT NULL,
`valuer_user_agent_staff_id` CHAR(32) NULL DEFAULT NULL,
`property_id` CHAR(32) NULL DEFAULT NULL,
`user_agent_staff_id` CHAR(32) NULL DEFAULT NULL,
`reassignstatus` INT(1) NULL DEFAULT '0',
`team` VARCHAR(5) NULL DEFAULT NULL,
`lockedTo` CHAR(32) NULL DEFAULT NULL,
`completedBy` CHAR(32) NULL DEFAULT NULL,
`datecompleted` INT(10) NULL DEFAULT NULL,
`onHold` INT(1) NULL DEFAULT '0',
`onHoldAS` CHAR(32) NULL DEFAULT NULL,
`request_message` TEXT NULL,
`response_message` TEXT NULL,
`referring_task` CHAR(32) NULL DEFAULT NULL,
`recipients` TEXT NULL,
`processed` INT(1) NULL DEFAULT '0',
`datedue` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `idx_ias_calendaritem__agency_id` (`agency_id`),
INDEX `idx_ias_calendaritem__agency_branch_id` (`agency_branch_id`),
INDEX `idx_ias_calendaritem__property_id` (`property_id`),
INDEX `idx_ias_calendaritem__user_agent_staff_id` (`user_agent_staff_id`),
INDEX `idx_ias_calendaritem__type` (`type`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This table has about 180K rows on our local Docker instances, and about 1.3 million rows on the remote server.
Does anyone have any ideas on how to continue narrowing this down until the problem is found?