Pt-online-schema-change --execute Hangs forever

  • OS: Ubuntu 18.4 (On WSL – Windows 11)
  • MySQL Version: 5.7.42
  • pt-online-schema-change: pt-online-schema-change 3.5.5
  • Tool: pt-online-schema-change
  • Action: --alter "ADD ALTER entries__expire (expire)"

Hi,

I’m testing pt-online-schema-change tool and I’m running it locally before going into a beta environment and then production.

I’m not sure if I am doing something wrong but when I try to run the query with --execute the command does nothing. Nothing is showed in the terminal after I put the password

Other commands works, --dry-run for example and even adding --print will run pretty quickly and showing all steps that is performing.

This is the command

pt-online-schema-change --host localhost --user root --ask-pass --alter "ADD ALTER entries__expire (expire)" D=mydatabase,t=entries --execute --print

This works with no errors

pt-online-schema-change --host localhost --user root --ask-pass --alter "ADD INDEX entries__expire (expire)" D=mydatabase,t=entries --dry-run  --print

Here the final output (ommitted the first part)

Not copying rows because this is a dry run.
INSERT LOW_PRIORITY IGNORE INTO `mydatabase`.`_entries_new` (`entry_id`, `score`, `data`, `leaderboard_id`, `expire`, `creation`) SELECT `entry_id`, `score`, `data`, `leaderboard_id`, `expire`, `creation` FROM `mydatabase`.`entries` LOCK IN SHARE MODE /*pt-online-schema-change 27903 copy table*/
Not swapping tables because this is a dry run.
Not dropping old table because this is a dry run.
Not dropping triggers because this is a dry run.
DROP TRIGGER IF EXISTS `mydatabase`.`pt_osc_mydatabase_entries_del`
DROP TRIGGER IF EXISTS `mydatabase`.`pt_osc_mydatabase_entries_upd`
DROP TRIGGER IF EXISTS `mydatabase`.`pt_osc_mydatabase_entries_ins`
2024-05-24T12:29:13 Dropping new table...
DROP TABLE IF EXISTS `mydatabase`.`_entries_new`;
2024-05-24T12:29:13 Dropped new table OK.
Dry run complete.  `mydatabase`.`entries` was not altered.

What am I doing wrong?

Hi Federico, probably some issue with locking is preventing pt-osc from starting. After it hangs, check with show processlist and show engine innodb status\G
Also you can try to query innodb_trx table to help pinpoint the issue

TLDR;

Adding --recursion-method=NONE to the command made the trick. And I think is 99% due to my local setup (which is wsl in Windows, with lot of other fun stuff…)
So the full command that works for me is

pt-online-schema-change --host localhost --user root --ask-pass --alter "ADD ALTER entries__expire (expire)" D=mydatabase,t=entries --execute --print --recursion-method=NONE

I won’t close yet the answer because maybe you want to reply already but is solved and tool works fantasticly, I can add this index and drop it (rollback) seamingly

Long Explanation

Hi @Ivan_Groenewold , first of all thank you for your answer. I was asking if i was doing something obviously wrong. I was already checking with show processlist.
Checking it better though I found out few interesting thing when I went down the rabbit hole.

  1. Running pt-online-schema-change command which hangs
  2. In a separate terminal watch -n1 "mysql -uroot -padmin -e 'SHOW PROCESSLIST;'"

I couldn’t see much but sometimes I could, for just few seconds SELECT @@SERVER_ID
I Goolge it with the tool name and I came across this other answer Pt-online-schema-change not working as expected - #6 by matthewb
Where @matthewb talks about adding PTDEBUG=1 to the command,

PTDEBUG=1 pt-online-schema-change --host localhost --user root --ask-pass --alter "ADD ALTER entries__expire (expire)" D=mydatabase,t=entries --execute --print -

Then I could see that, It stopped at # MasterSlave:4350 12721 SELECT @@SERVER_ID
(cropped) output

# Cxn:3988 12721 DBI::db=HASH(0x556404217a70) Setting dbh
# Cxn:3993 12721 DBI::db=HASH(0x556404217a70) SELECT @@server_id /*!50038 , @@hostname*/, CONNECTION_ID() as connection_id
# Cxn:3995 12721 DBI::db=HASH(0x556404217a70) hostname: localhost 0
# Cxn:3981 12721 DBI::db=HASH(0x556404217a70) Connected dbh to localhost h=localhost
# Percona::XtraDB::Cluster:8279 12721 localhost SHOW VARIABLES LIKE 'wsrep_on'
# Percona::XtraDB::Cluster:8281 12721 $VAR1 = undef;
#
# VersionParser:2033 12721 VersionParser got a dbh, trying to get the version
# VersionParser:2081 12721 InnoDB support: DEFAULT
# VersionParser:2093 12721 InnoDB version: 5.7.42
# pt_online_schema_change:8841 12721 innodb_stats_persistent is ON, enabling --analyze-before-swap
# MasterSlave:4304 12721 Recursion methods: processlist hosts
# MasterSlave:4350 12721 SELECT @@SERVER_ID

So I Google (something again, not sure how I found it) and I found this answer here where it should use --recursion-method=NONE

Nothing I add all of this infomation since it could be useful to someone in the future :wink:

thanks for the detailed explanation Federico. Glad to hear it is working now. The tool will auto discover replicas, and if the hostname/ip it discovers is not reachable then it could hang as you experienced. If you want to monitor replicas and throttle based on lag, you can also try using DSN to specify the replicas: pt-online-schema-change — Percona Toolkit Documentation

1 Like