PMM QAN stopped presenting information for about one hour and then resumed properly

Greetings,

We use PMM 1.17.0 to monitor MySQL DB hosts. Two MySQL hosts have been monitored for a few months now.
Today, the QAN data for one of the hosts became unavailable. Specifically, the QAN information for the period of about 09:50-10:50 AM is not displayed.
When I try to display it, this message appears: “There is no query data because the MySQL Server is not configured for monitoring. For details about the required configuration, see Configuring MySQL for Percona Monitoring and Management in PMM documentation.” If I view the periods before or after 09:50-10:50 AM, everything is displayed fine.
(This happens for only one of the two monitored MySQL hosts. The second host’s QAN information is displayed correctly, including the 09:50-10:50 AM period.)

The monitored MySQL DB was up and available all the time.
Information other than QAN, for example “MySQL Overview” or “MySQL InnoDB Metrics”, is displayed properly for the 09:50-10:50 AM period.

What could be the reason for this behavior, and is there any solution or workaround for it?

Regards,
Avi

Please find a few screenshots attached.

I’ve noticed that the /var/log/pmm-mysql-queries-0.log file on the problematic MySQL host includes many error messages like this:

2019/05/13 10:43:03.319750 analyzer.go:411: qan-analyzer-mysql-31762327-worker crashed: ‘4304 2019-05-13 07:42:03 UTC to 2019-05-13 07:43:03 UTC (0-0)’: runtime error: invalid memory address or nil pointer dereference
goroutine 3174513 [running]:
runtime/debug.Stack(0x4e5c3c, 0xc42009c0f0, 0x2)
/usr/lib/golang/src/runtime/debug/stack.go:24 +0xa7
runtime/debug.PrintStack()
/usr/lib/golang/src/runtime/debug/stack.go:16 +0x22
github.com/percona/qan-agent/qan/analyzer/mysql.(*RealAnalyzer).runWorker.func1(0xc42006ac00, 0xc4207d2a80)
/tmp/go/src/github.com/percona/qan-agent/qan/analyzer/mysql/analyzer.go:412 +0x2ea
panic(0x7ddd60, 0xa883b0)
/usr/lib/golang/src/runtime/panic.go:502 +0x229
github.com/percona/qan-agent/qan/analyzer/mysql/worker/perfschema.(*Worker).Cleanup(0xc420196a00, 0x0, 0x0)
/tmp/go/src/github.com/percona/qan-agent/qan/analyzer/mysql/worker/perfschema/worker.go:308 +0x77
github.com/percona/qan-agent/qan/analyzer/mysql.(*RealAnalyzer).runWorker.func2(0xc42006ac00)
/tmp/go/src/github.com/percona/qan-agent/qan/analyzer/mysql/analyzer.go:429 +0x3e
github.com/percona/qan-agent/qan/analyzer/mysql.(*RealAnalyzer).runWorker(0xc42006ac00, 0xc4207d2a80)
/tmp/go/src/github.com/percona/qan-agent/qan/analyzer/mysql/analyzer.go:448 +0x386
created by github.com/percona/qan-agent/qan/analyzer/mysql.(*RealAnalyzer).run
/tmp/go/src/github.com/percona/qan-agent/qan/analyzer/mysql/analyzer.go:350 +0x7c9
2019/05/13 10:43:03.319841 ERROR qan-analyzer-mysql-31762327 qan-analyzer-mysql-31762327-worker crashed: ‘4304 2019-05-13 07:42:03 UTC to 2019-05-13 07:43:03 UTC (0-0)’: runtime error: invalid memory address or nil pointer dereference

However, similar messages appear not only during the 09:50-10:50 AM period, but also before and after it.

(The corresponding log file on the second, non-problematic MySQL host does not include such messages at all.)

Could those messages be related to the described behavior? How can I get rid of them?
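
To check whether the crashes line up with the missing period, it may help to count them per hour. Below is a minimal Python sketch, not anything shipped with PMM: the function names `crashes_per_hour` and `report` are made up for illustration, and the pattern assumes every crash ends with an `ERROR ... -worker crashed:` summary line like the one above. Note that the timestamp at the start of each log line appears to be local time (10:43 in the log corresponds to 07:43 UTC in the message), so the hours printed would be local.

```python
import re
from collections import Counter

# Matches the one-line ERROR summary written after each stack trace, e.g.
# "2019/05/13 10:43:03.319841 ERROR qan-analyzer-mysql-... ...-worker crashed: ..."
# Assumption: one such line per crash; the "analyzer.go:411:" line is skipped
# on purpose so that each crash is counted only once.
CRASH_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2}\.\d+ ERROR \S+ \S+-worker crashed:"
)


def crashes_per_hour(lines):
    """Return a Counter mapping 'YYYY/MM/DD HH:00' to the number of crashes."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            day, hour = match.group(1), match.group(2)
            counts[f"{day} {hour}:00"] += 1
    return counts


def report(path):
    """Print the per-hour crash counts for the log file at `path`."""
    with open(path, errors="replace") as handle:
        for hour, count in sorted(crashes_per_hour(handle).items()):
            print(hour, count)
```

Calling `report("/var/log/pmm-mysql-queries-0.log")` on the affected host would print one line per hour. If the counts during 09:00-10:00 look no different from the hours around them, the crashes alone probably do not explain the gap.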

Regards,
Avi

hi avi vainshtein
Thanks for reporting this - I don’t have any initial conclusions for you. But can you supply the following:
1. Attach the logs.zip from PMM Server (Percona Monitoring and Management)
2. Attach the output of pmm-admin summary from the Client (Percona Monitoring and Management)
Thanks!

avi vainshtein
Could you also share with us a view of the System Overview for the host where the outage occurred? We want to see how loaded it was during this period.

Hello Michael Coburn

Please find attached screenshots of System Overview for the mentioned time period.

Please also find attached the logs.zip from PMM Server

pmm-server_2019-05-14-06-53.zip (68.1 KB)

Please also find attached the output of pmm-admin summary from the Client host:

[root@vm-tcmmydbr ~]# pmm-admin summary

Collecting information for system diagnostic
Collect pmm-admin check-network output … Done
Collect pmm-admin list output … Done
Collect ps output … Done
Collect pt-summary output … Done
Collect list of open ports … Done
Collect service output … Done
Collect pt-mysql-summary output … ERROR 1045 (28000): Access denied for user ‘root’@‘localhost’ (using password: NO)
2019_05_14_10_21_03 Cannot connect to MySQL. Check that MySQL is running and that the options after -- are correct.

Data collection complete. Please attach file summary_vm-tcmmydbr_2019-05-14T10_20_42.tar.gz to the issue as requested by Percona Support.

summary_vm-tcmmydbr_2019-05-14T10_20_42.tar.zip (155 KB)

Hello, in this case I think it would be best to raise this in our JIRA system so that our bug analyst can take a proper look at it in detail. Are you OK to do this?

I would help with it but if you own the ticket you will be kept up to date with progress. https://jira.percona.com

Let me know if you’d like help with setting that up though!

Hello @lorraine.pocklington I’ve opened an issue in the JIRA system, as you recommended.

Many thanks,
Avi

Thank you!