Number of rows drastically increased of upgrade to Percona MySQL 8.0.34

Hi

Recently I have upgraded one of the replica to new version Percona MySQL 8.0.34 , after that I have modified tables CHARSET and collation as well.

For many tables the number of rows have increased after altering table to change charset and collation.

While I understand pt-table-sync and pt-table-checksum are helpful , I am trying to understand potential cause

and what would better approach to check what are the differences.

Hello @virin_t. That’s impossible. Rows are not magically created when you change character sets, or collations. Did you perhaps mean row size? or table size on disk?

Please show some evidence using COUNT(*) of the table before changing character set and after changing.

Hi @matthewb

I understand, I was kind of surprised as well. I understand that there might difference of table size (as we are changing charset) and other possible reasons, but I am telling about number of rows only

below is one of the examples

before
orders_quotes_data | utf8mb3_general_ci | 211.61 | 40751651 |

after
±-------------------±-----------±-----------+
| Table | Size in GB | Total Rows |
±-------------------±-----------±-----------+
| orders_quotes_data | 292.54 | 62290700 |
±-------------------±-----------±-----------+

I remember doing pt-online schema change for the above table to alter charset

Could you please use COUNT(*)? There is no way that changing the character set added 22 million rows.

Hmm. Ok. There’s a possibility here that some rows could have been duplicated due to the triggers that pt-osc uses to keep the shadow table and original table in sync. Can you provide the full command used? Can you provide the CREATE TABLE for this table?

Hi @matthewb

here is the command that was used

pt-online-schema-change \
  --alter="CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci" \
  --max-load threads_running=500 \
  --max-lag=30 \
  --set-vars lock_wait_timeout=1,foreign_key_checks=0,sql_log_bin=0 \
  --no-check-replication-filters \
  --no-drop-old-table \
  --execute \
  h=XXXXX,D=XXXXXXXX,t=orders_quotes_data 2>&1 | tee -a /tmp/orders_quotes_data_pt-osc_dryrun.log

show create table after charset update

  CREATE TABLE `orders_quotes_data` (
  `id` int unsigned NOT NULL DEFAULT '0',
  `one_id` bigint unsigned DEFAULT NULL,
  `is_persisted` tinyint unsigned NOT NULL DEFAULT '0',
  `xxx` mediumblob,
  `info` blob,
  PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=COMPRESSED

Hi virin_t!

The correct way to check the amount of rows is to do a “select count(*)” (do note that this will trigger a full table scan and cause very high disk read activity. Execute outside business hours).

The above outputs you showed look like from “information_schema” (IS). IS will only show estimate information and might have outdated information. Rebuilding the table might have refreshed the metadata and thus showing a different result/size.

If both count(*) and pt-table-checksum shows the correct amount of rows in both primary and replica then the most likely explanation is that the metadata from the table was outdated. Else let us know

Regards

1 Like