Aborted connection to db: 'unconnected' user: 'pmm' host:

Hi,

I have a MariaDB 10.2 server monitored remotely by PMM. The monitoring itself looks fine, except that on the MariaDB server I get a lot of error messages like:
"Aborted connection 18917096 to db: 'unconnected' user: 'pmm' host: 'x.x.x.x' (Got an error reading communication packets)"
But only for the pmm user; all other users are fine.

In the pmm-admin summary I see the following messages:
[mysql] 2018/12/17 23:13:32 connection.go:372: invalid connection
[mysql] 2018/12/17 23:13:32 connection.go:372: invalid connection
time="2018-12-17T23:13:36+01:00" level=error msg="Error scraping for collect.global_status: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:36+01:00" level=error msg="Error scraping for collect.info_schema.innodb_metrics: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:41+01:00" level=error msg="Error scraping for collect.global_status: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:43+01:00" level=error msg="Error scraping for collect.info_schema.innodb_metrics: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:43+01:00" level=error msg="Error scraping for collect.global_status: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:44+01:00" level=error msg="Error scraping for collect.global_status: context deadline exceeded" source="exporter.go:110"
[mysql] 2018/12/17 23:13:44 connection.go:372: invalid connection
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.info_schema.innodb_cmpmem: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.info_schema.innodb_cmp: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.info_schema.processlist: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.perf_schema.eventswaits: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.perf_schema.tablelocks: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.perf_schema.file_events: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:45+01:00" level=error msg="Error scraping for collect.slave_status: context deadline exceeded" source="exporter.go:110"
time="2018-12-17T23:13:46+01:00" level=error msg="Error pinging mysqld: context deadline exceeded" source="exporter.go:87"
[mysql] 2018/12/17 23:13:46 connection.go:372: invalid connection

Any idea what’s wrong?

Kind Regards,
Zoltan

The graphs look like they have missing data:

[image: graph screenshot (broken attachment)]

Hi beckzg,
Generally, "context deadline exceeded" means the query took too long. Is this server heavily loaded? I couldn't tell because the images from your second post don't load; they seem broken.

If I can jump in here, I'm seeing the same thing, though with Percona Server 5.7 as the server being monitored. Entries such as:

time="2019-02-06T11:00:15-05:00" level=error msg="Error scraping for collect.perf_schema.file_events: context canceled" source="exporter.go:110"
time="2019-02-06T11:00:15-05:00" level=error msg="Error scraping for collect.perf_schema.eventswaits: context canceled" source="exporter.go:110"
time="2019-02-06T11:11:34-05:00" level=error msg="Error scraping for collect.perf_schema.eventswaits: context deadline exceeded" source="exporter.go:110"
time="2019-02-06T11:21:20-05:00" level=error msg="Error scraping for collect.global_status: context deadline exceeded" source="exporter.go:110"

show up in the file pmm-mysql-metrics-42002.log, and in mysql-error.log I see these:

2019-02-06T22:13:09.820558Z 39057481 [Note] Aborted connection 39057481 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)
2019-02-06T22:13:44.883203Z 39057627 [Note] Aborted connection 39057627 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)
2019-02-06T22:13:54.887291Z 39057747 [Note] Aborted connection 39057747 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)
2019-02-06T22:13:59.858036Z 39057716 [Note] Aborted connection 39057716 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)
2019-02-06T22:14:04.853018Z 39057791 [Note] Aborted connection 39057791 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)
2019-02-06T22:14:49.872544Z 39057807 [Note] Aborted connection 39057807 to db: 'unconnected' user: 'pmm' host: 'localhost' (Got an error writing communication packets)

at a rate of around 2-8 per minute. These are our main production servers with fairly heavy peak loads, but we're not seeing this type of entry for other MySQL users.
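In case it's useful, this is roughly how I've been counting and checking these; it's just the standard mysql client and status counters, nothing PMM-specific, and the error log path is an assumption for our Ubuntu install:

# Count the pmm aborted-connection notes (error log path may differ on your install)
grep -c "user: 'pmm'" /var/log/mysql/error.log

# Server-side counter, plus the timeouts that usually drive these aborts
mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_clients';"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'net_%timeout';"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wait_timeout';"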

PMM and DB servers are both on Ubuntu 16.04. No dropped packets on the network interface. Let me know if you need more information.

Thanks
Paul

Hi @hurryupharry

Do you have any gaps in your graphs? Usually MySQL aborted connections are due to either network issues or early client disconnection, and it sounds like you've already ruled out the former. For the latter, I would suggest just restarting the client:
pmm-admin restart --all

"context deadline exceeded" in the exporter means the query took longer than the time allowed, which is generally 1s. Are you observing any slowness on this server in terms of response time for the production workload? I'd also suggest looking at the Prometheus Exporters dashboard, as that will show you how long scrapes are taking, especially perf_schema.eventswaits.
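If you want to check outside of Grafana, the mysqld exporter also publishes per-collector timings on its metrics endpoint. Something along these lines should show them, assuming the exporter is listening on port 42002 (as your log file name suggests) and is reachable without TLS/auth; PMM may protect the endpoint in your setup, in which case adjust accordingly:

# Per-collector scrape durations in seconds, straight from the exporter
curl -s http://localhost:42002/metrics | grep collector_duration_seconds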

If you could share some images, we can continue to troubleshoot.

Hi Michael,

Thanks for the reply. Unfortunately I now have another problem: our PMM VM's file system filled up and I can't see a way of cleaning out logs. I'm also now learning that settings for log retention etc. can only be passed in when the container is created. So in short, I think I'm going to have to blow away the container and build it again.
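For the record, my plan is roughly the following, based on my reading of the PMM 1.x docs; the image tag, port mapping, and retention value are assumptions to adjust for your environment:

# Remove the old server container, keeping the pmm-data volume container
docker stop pmm-server && docker rm pmm-server

# Recreate it, passing the retention setting at creation time
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server \
  -e METRICS_RETENTION=720h \
  --restart always percona/pmm-server:1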

I hope to do that by early next week and will respond by then.

Hi again Michael. I got our PMM up and running again. I'm not aware of any gaps in the graphs; however, once I found the Prometheus graphs, perf_schema.eventswaits did indeed seem to be indicating high latency, I think over 1s when combined. I am attaching a screenshot of the graph.

I was able to get past this by increasing METRICS_RESOLUTION to 3s. Thanks for the information.
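For completeness, this is roughly the container creation I ended up with; same caveats as in my earlier post about the image tag and ports, with the env var names taken from the PMM 1.x docs:

docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server \
  -e METRICS_RETENTION=720h \
  -e METRICS_RESOLUTION=3s \
  --restart always percona/pmm-server:1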

Hi @hurryupharry

Glad you could solve this!