An existing table, approximately 18 million rows. We added a new column via CLI, ENUM type NULL to this table.
After completion, which took some time, the server is now running at a higher CPU and load. Queries are taking an increased time to complete. The maximum average latency was previously 12 seconds and the same queries are now 3 minutes. Left this for a few days, just in case…
We have since dropped the column, run a REPAIR, OPTIMIZE and ANALYSE. And even restarted the service, but performance is still spiking and higher than previous.
DB Server: Percona Server 5.7 Storage Engine: Innodb
This has been the only change, no corresponding code change.
What am I missing? How can I return back to happier performance?
Also do a query audit and use pt-query-digest (pt-query-digest — Percona Toolkit Documentation) to check if there are some other non optimal queries that can be impacting this other query negatively
Thanks Carlos for your response.
The strange issue is that no queries have changed.
As I mentioned, I added a new column to an existing table and since then, the server latency has been significantly higher. We’ve dropped that column and the issue still persists.
We have used an optimize command, which I understood as rebuilding the table, but still no difference afterwards.
Could there be anything else at play?
And settings to check?
If the overall server performance has changed then likely the cause won’t be a single table.
If performance changed then something must have changed. Whether it be a configuration change, data set size, a minor patch, queries per second, certain threshold in which performance started to degrade, etc…
Do you have metrics from when you had good performance? if not I suggest you install PMM: Installing Percona Monitoring and Management (PMM) Software - Percona and start collecting metrics now.
Start by correlating what resource is saturated (memory, swap, disk performance) with what queries are executed while the server performance is degraded.
The amount of possibilities is too high to describe and so my best advice is to have data to quantify the problem and do a query audit to identify the top consuming queries.
It’s one of the larger tables within the database and the only database table where the column was added / dropped from. As soon as the column was added to this table, the latency increased.
But nothing was yet reading from that column. Since dropping that column, the latency is still increased.
Prior to the change, the table wasn’t featured in the top requests of the slow log, but now it’s full of queries from this table.
The output from EXPLAIN seems to be fine.
Could there be a corrupt index?
Just to update, after working through the slow-log and running EXPLAIN on those long running queries.
It seems that the database engine is now using a different index that what it was previously using.
Nothing was changed on the index, only the addition and removal of a column.
Would ANALYZE help resolve to use the existing index?
There are many other reasons for which a table is not using the expected index. If the amount of expected results is high then a full table scan might be cheaper than using an index.
Maybe the index you expect the database to use is not as optimal as you think it is
Growths in data set size, sampling, differences in query parameters, server configuration or thresholds reached Low Value in range_optimizer_max_mem_size Might Cause Full Table Scan , changes in version after un upgrade etc… can also affect the query behaviour