We have been fighting an issue where, when our database server is under what I would consider a small amount of stress, the database gets “hung up” on simple queries, leaving them in the “waiting for handler commit” state for several minutes or more.
An example of a time when we are seeing this:
One connection is loading a table with around 500k entries using a LOAD DATA INFILE statement:
LOAD DATA INFILE 'xxxx' INTO TABLE yyy CHARACTER SET latin1 FIELDS TERMINATED BY ',' (column1,column2,column3,etc);
While that is running, we also have ALTER statements on 3 small tables, a CREATE TABLE ... LIKE another table, and a TRUNCATE statement running.
With those 5 statements running, all other “simple” DELETEs or UPDATEs against a processing table (using an indexed field for row identification) just sit in the “waiting for handler commit” state. Once the larger ALTER/LOAD statements complete, the system normally goes back to its happy self like nothing ever happened.
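For a sense of what “simple” means here, the stuck statements are single-row operations along these lines (table and column names are made up for illustration), and SHOW PROCESSLIST shows them sitting in “waiting for handler commit” the entire time:
UPDATE processing_queue SET status = 'done' WHERE id = 12345;
DELETE FROM processing_queue WHERE id = 12345;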
Initial investigation has found:
(Sorry, I am unable to post links due to the new-user limit here.)
The first was a Stack Overflow post, “very-slow-writes-on-mysql-8-waiting-for-handler-commit”. It suggested a few changes to IO capacity and a few other adjustments, but these had no noticeable impact. The suggested settings were:
innodb_lru_scan_depth=100 # from 1024 to conserve 90% CPU cycles used for function
innodb_io_capacity=1900 # from 200 to allow more IOPS to your storage device
innodb_flush_neighbors=2 # from 0 to expedite writing to current extent
innodb_max_dirty_pages_pct_lwm=1 # from 10 percent to expedite writes
innodb_max_dirty_pages_pct=1 # from 90 percent to reduce innodb_buffer_pool_pages_dirty count
innodb_change_buffer_max_size=50 # from 25 percent to expedite your high volume activity
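In case the way they were applied matters: as far as I know all six of these are dynamic in 8.0, so they can be set at runtime and then mirrored in my.cnf, e.g.:
SET GLOBAL innodb_io_capacity = 1900;
SET GLOBAL innodb_lru_scan_depth = 100;
(same pattern for the remaining four)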
The second was from itectec, “mysql-v8-replication-lagging-waiting-for-handler-commit”.
It suggested that this is related to the adaptive hash index; disabling innodb_adaptive_hash_index did seem to have a minimal impact, but the problem still persists.
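For clarity, since the variable is dynamic, the adaptive hash index was turned off with something along these lines (it would also need to go into my.cnf to survive a restart):
SET GLOBAL innodb_adaptive_hash_index = OFF;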
I am not able to readily reproduce the issue, but when it happens the longest-running items on that server do seem to be LOAD DATA statements or ALTERs/UPDATEs against larger tables.
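When it is happening, a check along these lines (using only the standard information_schema.processlist columns) shows the LOAD DATA/ALTER statements at the top and the simple statements stuck in “waiting for handler commit” behind them:
SELECT id, user, time, state, LEFT(info, 100) AS query
FROM information_schema.processlist
WHERE command <> 'Sleep'
ORDER BY time DESC;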
I have attempted to double/triple the RAM allocated to this server and adjust the buffer pool accordingly, with minimal success. I have also verified with the data center that we are not hitting any sort of IO limit (we are typically in the 1500 range).
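If it helps narrow down the IO side, these are the kinds of counters I can sample a few seconds apart the next time it stalls, to see whether dirty-page flushing and fsyncs are keeping up:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_fsyncs';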
Server information:
The particular server we are working with sits in a private cloud on enterprise-grade SSD and has 16 vCPUs and 32 GB of RAM. It is set up as a replication primary with a single replica. We are currently using Percona 8.0.26.
SHOW GLOBAL VARIABLES output (Pastebin link)
SHOW GLOBAL STATUS output (Pastebin link)
The only other thing that comes to mind worth mentioning is that this all started happening when we upgraded from Percona 5.7 to Percona 8, and it does seem to happen across all of our shared environments.
Any suggestions on where to look next?
As always, I appreciate any and all help on this!