We have been fighting an issue where, when our database server is under what I would consider a small amount of stress, simple queries get "hung up" in the "waiting for handler commit" state for several minutes or more.
An example of a time when we are seeing this:
One connection is loading a table with around 500k rows using a LOAD DATA INFILE statement:
LOAD DATA INFILE 'xxxx' INTO TABLE yyy CHARACTER SET latin1 FIELDS TERMINATED BY ',' (column1,column2,column3,etc);
While that is running, we also have ALTER statements on 3 small tables, a CREATE TABLE ... LIKE another table, and a TRUNCATE statement executing.
With those 5 statements running, all other "simple" DELETEs or UPDATEs against a processing table (using an indexed field for row identification) just sit in the "waiting for handler commit" state. Once the larger ALTER/LOAD statements complete, the system normally goes back to its happy self like nothing ever happened.
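For reference, this is roughly how we spot the stuck sessions during an incident (a sketch; it uses the standard performance_schema.threads table, and the exact casing of the state string is my assumption):

```sql
-- List sessions currently stuck on the commit step, longest-waiting first
SELECT PROCESSLIST_ID, PROCESSLIST_TIME, PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_STATE = 'waiting for handler commit'
ORDER BY PROCESSLIST_TIME DESC;
```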
Initial investigation has found:
(Sorry, unable to post links due to being a new user here.)
An article on Stack Overflow, "very-slow-writes-on-mysql-8-waiting-for-handler-commit", suggested a few changes for IO capacity and a few other adjustments. These had no noticeable impact:
innodb_lru_scan_depth=100 # from 1024 to conserve 90% CPU cycles used for function
innodb_io_capacity=1900 # from 200 to allow more IOPS to your storage device
innodb_flush_neighbors=2 # from 0 to expedite writing to current extent
innodb_max_dirty_pages_pct_lwm=1 # from 10 percent to expedite writes
innodb_max_dirty_pages_pct=1 # from 90 percent to reduce innodb_buffer_pool_pages_dirty count
innodb_change_buffer_max_size=50 # from 25 percent to expedite your high volume activity
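For what it's worth, we applied these at runtime rather than waiting for a restart (all six variables are dynamic in MySQL 8, as far as I can tell):

```sql
-- Apply the suggested values without a restart; they still need to be
-- persisted in my.cnf (or via SET PERSIST) to survive a reboot
SET GLOBAL innodb_lru_scan_depth = 100;
SET GLOBAL innodb_io_capacity = 1900;
SET GLOBAL innodb_flush_neighbors = 2;
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 1;
SET GLOBAL innodb_max_dirty_pages_pct = 1;
SET GLOBAL innodb_change_buffer_max_size = 50;
```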
The second was from ITecTec, "mysql-v8-replication-lagging-waiting-for-handler-commit", which suggested this is related to the adaptive hash index. Disabling innodb_adaptive_hash_index did seem to have a minimal impact, but the problem still persists.
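In case it helps anyone reproduce, this is how we toggled it (the variable is dynamic, so no restart was needed):

```sql
-- Check the current setting, then disable the adaptive hash index
SELECT @@innodb_adaptive_hash_index;
SET GLOBAL innodb_adaptive_hash_index = OFF;
```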
I am not able to readily reproduce this issue, but when it happens the longest-running items on that server seem to be LOAD DATA statements or ALTERs/UPDATEs to larger tables.
I have attempted to double/triple the RAM allocated to this server and adjust the buffer pool, with minimal success. I have verified with the data center that we are not hitting any sort of IO limit (we are typically in the 1500 IOPS range).
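While watching an incident, we also checked the buffer pool state with something like the below (standard status/variable queries; the interpretation that a climbing dirty-page count indicates flushing pressure is my assumption):

```sql
-- Current buffer pool size in bytes
SELECT @@innodb_buffer_pool_size;

-- Dirty vs. total pages in the buffer pool during a stall
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
```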
The particular server we are working with sits in a private cloud on enterprise-grade SSD and has 16 vCPUs and 32 GB of RAM. It is set up as a replication primary with a single replica. We are currently using Percona Server 8.0.26.
The only other thing that comes to mind worth mentioning is that this all started when we upgraded from Percona 5.7 to Percona 8, and it seems to happen in all of our shared environments.
Any suggestions on where to look next?
As always, appreciate any and all help on this!