We’re seeing some weird behaviour in our clusters: we have slow query logs enabled, and the config files are set up correctly. What we see is that after an SST or an IST (of which we’re having too many lately), the runtime configuration of the server that received the SST or IST has the slow query log disabled.
Could this be caused by the state transfer process? Is this documented somewhere?
Hello @thomas65nl, neither the SST nor the IST process copies configuration files (e.g. /etc/my.cnf). I would grep through /etc looking for slow_query_log, just to make sure you don’t have multiple my.cnf files in !include directories.
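If a plain grep is awkward to run across many nodes, the same check can be sketched in Python. This is only an illustration: the function name find_option is made up for this example, and it simply reports every line under a directory that mentions the option. It treats dashes and underscores as equivalent, since MySQL option files accept both spellings.

```python
import os


def find_option(root, option="slow_query_log"):
    """Return (path, line_number, line) for each line under root that mentions option.

    find_option is a hypothetical helper for this example, not part of any tool.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        # Option files accept slow-query-log as well as slow_query_log.
                        if option in line.replace("-", "_"):
                            hits.append((path, number, line.strip()))
            except OSError:
                # Skip files we are not allowed to read.
                continue
    return hits
```

Calling find_option("/etc") and printing the result should show every file that sets the option; more than one hit would suggest a second config file is in play.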
I see my initial question was not clear, my apologies.
I know that the config files are not copied by the SST/IST process, and I have confirmed that mysqld.conf (the only file with a reference to the slow query log) has the slow query log set to on, on all the nodes. After the SST, the config files are unchanged, but the running mysql server has the slow query log disabled even though the config file has it enabled.
After a network issue, mysql starts up and reads the config file, since that is where it learns which other members of the cluster to contact in order to (re)join. In that same config file the slow query log is on, but after the SST it is off, and only on this node. This leads me to think that somewhere in the SST process this is switched off, but I’m trying to understand this phenomenon.
There is nothing in the SST/IST process that affects slow log parameters. I would go into the my.cnf of a downed node, add general_log=ON, and then let that node SST. After the node comes online, look through the general_log for any queries that might have affected the slow log status. There might be something external at play here. Don’t forget to turn general_log back off; it’s a perf killer.
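Once the node is back online, the general log can get large, so a small filter may help. Below is a minimal Python sketch that keeps only the lines containing a SET statement on slow_query_log. The function name slow_log_changes and the regular expression are assumptions made for this example; the match is deliberately loose and does not depend on the exact general log layout.

```python
import re

# Matches statements such as "SET GLOBAL slow_query_log = 0" or
# "set @@global.slow_query_log=OFF", but not slow_query_log_file.
SLOW_LOG_CHANGE = re.compile(r"\bset\b.*\bslow_query_log\b", re.IGNORECASE)


def slow_log_changes(lines):
    """Return the log lines that appear to change slow_query_log.

    slow_log_changes is a hypothetical helper for this example.
    """
    return [line.rstrip("\n") for line in lines if SLOW_LOG_CHANGE.search(line)]
```

To use it, open the general log file (its location is whatever general_log_file points to on your node) and pass the file object to slow_log_changes. The thread id and timestamp on each matching line should then point to the client that issued the change.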
Thanks again. Great suggestion. We’ll try that on a smaller cluster, but the behaviour I describe is observed on a number of (5.7) clusters. I did try it on a Percona 8.0-based cluster and did not observe it there…
I’ll post the findings here.
We also have ClusterControl configured for these clusters, and for some reason we see a lot of activity by ClusterControl after an IST or SST, during which the slow log is toggled a few times and finally set to off, despite all settings having it on. We’ve raised a case with them for this.
The suggestion to switch on the general log was the one that got us onto the right track. Thanks!