We just upgraded from Percona Server 5.6 to 5.7.10-3, and every 60-90 minutes queries stall for 8-12 seconds. When this happens there is a large amount of read activity on the disks. Our charts from Percona 5.6 show nothing like this. It seems to be some kind of background MySQL process causing it. Does anyone have thoughts on how to debug this?
I'm fairly sure I have narrowed this down to an issue when the binary logs are flushed. My maximum binary log file size is set to 1GB, and queries stall whenever that file rotates, every 60-90 minutes.
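For anyone checking the same thing, the rotation threshold and the current log files can be inspected from the client with standard statements (nothing installation-specific here):

```sql
-- Confirm the configured rotation threshold (1GB in this case)
SHOW VARIABLES LIKE 'max_binlog_size';

-- List the current binary log files and their sizes
SHOW BINARY LOGS;
```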
I am currently tracking down the same issue.
Initially I noticed many entries in the slow query log with exactly the same timestamp, which correlated with the binary log rotating based on size (1G).
I could reproduce the stalls by manually issuing FLUSH BINARY LOGS.
High read I/O is also associated with the flush (several seconds, even on SSD hardware).
Interestingly, removing old binary log files with PURGE BINARY LOGS TO also causes the stall, along with high disk I/O. This is unexpected, since you would think that removing unused files should have no effect.
All of the above happens when there are roughly twenty 1G binary log files (about 3 days' worth). As I purged the older files and reduced the number of files in the binary log directory and index, subsequent purges no longer caused the stalls. Now, with only a few binary log files, manual binary log flushes are also instant and do not cause stalls.
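Since the file count seems to be the aggravating factor, keeping fewer logs around may help. A sketch of the usual housekeeping on 5.7, assuming a shorter retention window is acceptable for your replication/backup needs:

```sql
-- Purge by age rather than by file name
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;

-- Or have the server expire old logs automatically at rotation time
-- (also set expire_logs_days in my.cnf so it survives a restart)
SET GLOBAL expire_logs_days = 1;
```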
Is MySQL doing something unexpected during flush/purge when there are many large binary log files on disk and in the binary log index?
Do you by any chance have your binary log directory on the same disks as your data directory? If so, that would explain it. Try running iostat -x 5 5 while you purge the binary logs, and look at the I/O utilization.
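A quick way to verify whether the two directories share a device (a minimal sketch; the example paths are placeholders for your actual datadir and log-bin locations from my.cnf):

```shell
# Print "same" if both directories sit on the same block device,
# "different" otherwise (compares the device IDs reported by stat).
same_device() {
  [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ] && echo same || echo different
}

# Example: substitute your own paths,
# e.g. same_device /var/lib/mysql /var/log/mysql-bin
same_device / /tmp
```

If they come back on the same device, the binlog purge/flush I/O competes directly with InnoDB's data file I/O, which matches the stall pattern described above.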