Hi!
Apologies for the wall of text ahead; I’ve been pulling my hair out a little trying to figure out what the problem is.
I ran into a bit of system downtime last week while running a few DDL statements on a Percona MySQL DB (Ver 5.7.20-19 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release ‘19’, Revision ‘3c5d3e5d53c’)). I was wondering if anyone knows of any reason why adding a few columns & indexes would cause a tablespace (for a single table) to become corrupted.
At the time, the affected schema was physically pulled out of the datadir and the database was started with InnoDB forced recovery mode 1. This seemed to repair any issues with references to the missing schema. The database was then booted in normal mode and has been running without issue since (just without the problem schema).
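For reference, that recovery boot used the standard innodb_force_recovery option - roughly this in the my.cnf (removed again before the normal restart):
[mysqld]
# level 1 is the least intrusive forced-recovery level; it lets the
# server start despite the corrupt/missing tablespace so the stale
# references to it could be cleaned up
innodb_force_recovery = 1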
There were 12 ALTER TABLE operations in total affecting one table: 6 adding virtual columns to pull JSON data out of a particular column, and 6 adding indexes on those new columns. When executed individually, these queries worked absolutely fine. When the queries to create a virtual column and to index that column were combined into the same statement, the service crashes maybe 30 seconds into the query.
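For example, run individually (this is just the first column and its index, lifted from the full statement further down), the statements look like this and complete fine:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`;
ALTER TABLE `pfx_calendaritem`
	ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`);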
The error log shows nothing significant other than this:
Here SCHEMA and TABLE are the targets of the ALTER TABLE statements. The service can be started up again and runs fine until anything tries to read from the table, at which point it crashes with the same error.
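Once the table is in this state, even a trivial read is enough to bring it down - for example (just an illustration, any query that touches the table will do):
SELECT `id` FROM `pfx_calendaritem` LIMIT 1;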
Although this occurred on a dedicated server, the crash is also reproducible with the data set I have running on a local Docker instance. I can destroy the container, create a new image, import the data set, run the queries, and it will fail with the same problem.
Things I’ve concretely tried:
[LIST]
[]Running as one query - this breaks.
[]Running the queries independently - this works.
[]Combining into 2 queries - one to add all columns, one to add all indexes - this works (sketched below, after the full crashing query).
[]Pairing into 6 queries - adding a column + adding that column’s index - this breaks on pairs 2 and 5 out of 6 (pair #2 is sketched after this list).
[]Pairing into 6 queries BUT replacing the AFTER part with AFTER `data` on all new columns (so the order would be `data`, new col #6, new col #5, … , new col #1) - this works somehow.
[]Removing Percona and running as one query - this works.
[]Upgrading to Percona 5.7.21 and running as one query - this breaks.
[]Truncating the table and running as one query - this works.
[/LIST]
I’ve also tried changing multi-key index column orders; adding, removing, then re-adding columns; and changing the order of queries, all with mixed results.
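To make the pairing variant concrete, here’s pair #2 (the property_view_id one, which is one of the two that break) - just those two clauses lifted out of the full statement below and run as their own statement, after pair #1 has gone through:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`,
	ADD INDEX `property_view_id` (`property_view_id`);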
I’ve not been able to narrow down exactly what causes these particular statements to crash the server, and I’d really appreciate some help in finding out why it happens.
Here’s the config used on docker:
[mysqld]
datadir = /data/mysql
sql-mode="NO_ENGINE_SUBSTITUTION"
query_cache_type=1
query_cache_limit=1M
query_cache_size=250M
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
table_open_cache=5000
skip_name_resolve
log_error = /var/log/mysql/mysql_error.log
slow_query_log_file = /var/log/mysql/mysql-slow.log
slow_query_log = 1
ignore-db-dir=mysql-binlog # binlogging is enabled on our remote environments, but disabled on docker.
innodb_use_native_aio = 0
innodb_flush_method = O_DSYNC
Here’s the config used on the remote server:
[mysqld]
datadir = /data/mysql
sql-mode="NO_ENGINE_SUBSTITUTION"
query_cache_type=1
query_cache_limit=1M
query_cache_size=250M
innodb_buffer_pool_size = 45G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
table_open_cache=5000
skip_name_resolve
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
log_error = /var/log/mysql/mysql_error.log
slow_query_log_file = /var/log/mysql/mysql-slow.log
slow_query_log = 1
#log_queries_not_using_indexes
#log-bin=mysql-binlog
server_id = 0
log-bin = /data/mysql/mysql-binlog/mysql-binlog
binlog_format = MIXED
ignore-db-dir=mysql-binlog
Here’s the grouped queries which crash the instance:
ALTER TABLE `pfx_calendaritem`
ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`,
ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`),
ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`,
ADD INDEX `property_view_id` (`property_view_id`),
ADD COLUMN `bym_property_id` CHAR(32) AS (`data`->>'$.bym_property_id') VIRTUAL NULL AFTER `property_view_id`,
ADD INDEX `idx_bym_property_id` (`type`, `followup`, `bym_property_id`),
ADD COLUMN `branch_id` CHAR(32) AS (`data`->>'$.branch') VIRTUAL NULL AFTER `bym_property_id`,
ADD INDEX `idx_type_followup_branch_id` (`type`, `followup`, `branch_id`),
ADD COLUMN `watchlist_id` CHAR(32) AS (`data`->>'$.watchlist_id') VIRTUAL NULL AFTER `branch_id`,
ADD INDEX `idx_watchlist_id` (`watchlist_id`),
ADD COLUMN `user_customer_id` CHAR(32) AS (`data`->>'$.user_customer_id') VIRTUAL NULL AFTER `watchlist_id`,
ADD INDEX `idx_user_customer_id_type_followup` (`user_customer_id`, `type`, `followup`);
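For comparison, the 2-query split from the list above that does work is the same set of operations, just separated into one statement for all the columns and one for all the indexes. Sketched here with only the first two columns to keep it short; the remaining four follow the same pattern:
ALTER TABLE `pfx_calendaritem`
	ADD COLUMN `new_business_enq_id` CHAR(32) AS (`data`->>'$.new_business_enq_id') VIRTUAL NULL AFTER `data`,
	ADD COLUMN `property_view_id` CHAR(32) AS (`data`->>'$.property_view_id') VIRTUAL NULL AFTER `new_business_enq_id`;
ALTER TABLE `pfx_calendaritem`
	ADD INDEX `idx_new_business_enq_id_type` (`new_business_enq_id`, `type`),
	ADD INDEX `property_view_id` (`property_view_id`);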
Here’s the create table statement:
CREATE TABLE `ias_calendaritem` (
`id` CHAR(32) NOT NULL,
`datecreated` INT(11) NOT NULL DEFAULT '0',
`datetime` INT(11) NOT NULL DEFAULT '0',
`type` VARCHAR(27) NOT NULL DEFAULT '',
`data` JSON NULL DEFAULT NULL,
`notes` TEXT NULL,
`level` INT(11) NULL DEFAULT '0' COMMENT '\'0\' means this note is only for users associated within this note stream',
`followup` INT(2) NULL DEFAULT NULL COMMENT '0 => not required,\\n1 => required,\\n2 => required+completed',
`agency_id` CHAR(32) NULL DEFAULT NULL,
`agency_branch_id` CHAR(32) NULL DEFAULT NULL,
`valuer_user_agent_staff_id` CHAR(32) NULL DEFAULT NULL,
`property_id` CHAR(32) NULL DEFAULT NULL,
`user_agent_staff_id` CHAR(32) NULL DEFAULT NULL,
`reassignstatus` INT(1) NULL DEFAULT '0',
`team` VARCHAR(5) NULL DEFAULT NULL,
`lockedTo` CHAR(32) NULL DEFAULT NULL,
`completedBy` CHAR(32) NULL DEFAULT NULL,
`datecompleted` INT(10) NULL DEFAULT NULL,
`onHold` INT(1) NULL DEFAULT '0',
`onHoldAS` CHAR(32) NULL DEFAULT NULL,
`request_message` TEXT NULL,
`response_message` TEXT NULL,
`referring_task` CHAR(32) NULL DEFAULT NULL,
`recipients` TEXT NULL,
`processed` INT(1) NULL DEFAULT '0',
`datedue` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `idx_ias_calendaritem__agency_id` (`agency_id`),
INDEX `idx_ias_calendaritem__agency_branch_id` (`agency_branch_id`),
INDEX `idx_ias_calendaritem__property_id` (`property_id`),
INDEX `idx_ias_calendaritem__user_agent_staff_id` (`user_agent_staff_id`),
INDEX `idx_ias_calendaritem__type` (`type`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This table has about 180K rows on our local Docker instances, and about 1.3 million rows on the remote server.
Does anyone have any ideas on how to continue narrowing this down until the problem is found?