pt-online-schema-change Deadlock found when trying to get lock (lock mode AUTO-INC wait)

While running the following pt-online-schema-change to replace a table with a newer version of itself we encounter a number of deadlocks that severely impacted query latency, in this case I believe inserts into the source table slowed drastically setting off all our alarms.  I am trying to piece together an explanation on why this happened when pt-online-schema-change was designed as a non blocking tool.  Management is hard on me to figure this out.  So any help would be very much appreciated.  Here’s the pt-online-schema-change statement:

sudo pt-online-schema-change --alter='engine=innodb' \
--max-load Threads_running:50,Threads_connected:300 \
--critical-load Threads_running:250,Threads_connected:400 \
--set-vars lock_wait_timeout=3 --tries create_triggers:5:5,drop_triggers:5:5 \
--no-check-replication-filters --recursion-method dsn=t=percona.dsns \
--execute --statistics --print --ask-pass D=logs,t=bll_arg,u=<use>

our underlying application detected locks

Deadlock found when trying to get lock; try restarting transaction

Looking at Innodb Engine Status we see the lock type (lock mode AUTO-INC waiting):

LATEST DETECTED DEADLOCK
------------------------
2020-08-11 19:19:51 7fa2a5926700TOO DEEP OR LONG SEARCH IN THE LOCK TABLE WAITS-FOR GRAPH, 
WE WILL ROLL BACK FOLLOWING TRANSACTION 

*** TRANSACTION:
TRANSACTION 164388054388, ACTIVE 0 sec setting auto-inc lock, thread declared inside InnoDB 5000
mysql tables in use 2, locked 2
3 lock struct(s), heap size 360, 0 row lock(s), undo log entries 2
MySQL thread id 16530486, OS thread handle 0x7fa2a5926700, query id 45930441097 100.##.###.### 
logs update
REPLACE INTO `logs`.`_bll_arg_new` (`bll_arg_id`, `bll_trans_id`, `name`, `value`, `modified`, 
`created`) VALUES (NEW.`bll_arg_id`, NEW.`bll_trans_id`, NEW.`name`, NEW.`value`, NEW.`modified`, 
NEW.`created`)
*** WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `logs`.`_bll_arg_new` trx id 164388054388 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)

In short we’re using pt-online-schema-change to copy an old table to a newer version of itself to regain disk space after we deleted over 5B rows.
My environment is as follows:
MySQL:

mysql> show global variables like '%version%';
+-------------------------+------------------------------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------------------------------+
| innodb_version | 5.6.23-72.1 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.23-72.1-log |
| version_comment | Percona Server (GPL), Release 72.1, Revision 0503478 |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+------------------------------------------------------+

+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+

+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+

percona-toolkit:

/var/log# pt-online-schema-change --version
pt-online-schema-change 2.2.13

Table we were attempting to copy over

CREATE TABLE `bll_arg` (
`bll_arg_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`bll_trans_id` bigint(20) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`value` text,
`modified` int(10) unsigned NOT NULL,
`created` int(10) unsigned NOT NULL,
PRIMARY KEY (`bll_arg_id`),
KEY `bll_trans_id` (`bll_trans_id`)
) ENGINE=InnoDB AUTO_INCREMENT=27560394433 DEFAULT CHARSET=utf8

Again any help in understanding this would be greatly appreciated.

Hello DBA_Donny,
That particular error message of deadlock is not technically a deadlock, but something that InnoDB treats as one so it can kill it internally. You can see the message at the top telling you the search depth in the waits-for graph is too deep. Basically, you’ve got too many locks trying to happen on this table at the same time and it’s causing some internal constructs to fill up.
There are several things to try: Switch InnoDB to innodb_autoinc_lock_mode=2 which should loosen up the restrictions on the AUTO_INC locking structure, and/or set pt-osc to use a small chunk size, --chunk 1000, to “throttle” the speed of inserts.
No foreign keys referencing this table right?

Thank you for the explanation MatthewB.  This clarifies things.  Correct no foreign keys referencing at the database level.