PMM instance down?

hello,

Currently our PMM EC2 instance has crashed, apparently because of too many open TCP sockets. What could be the problem?

The log is full of entries like these:
2016/12/15 10:24:41 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 5ms
2016/12/15 10:24:41 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 10ms
2016/12/15 10:24:42 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 20ms
2016/12/15 10:24:43 http: TLS handshake error from 172.17.0.2:59002: write tcp 172.17.0.1:42002->172.17.0.2:59002: write: broken pipe
2016/12/15 10:24:43 http: TLS handshake error from 172.17.0.2:48450: EOF
2016/12/15 10:24:44 http: TLS handshake error from 172.17.0.2:60626: write tcp 172.17.0.1:42002->172.17.0.2:60626: write: broken pipe
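To confirm that a process is actually exhausting its file-descriptor limit, a quick check against /proc can help. This is a sketch: it demonstrates on the current shell, and the container name `pmm-server` in the comment is an assumption (substitute your own container name):

```shell
#!/bin/sh
# Inspect file-descriptor usage for a process via /proc.
# Shown here for the current shell ($$); for the PMM container, use the PID from:
#   docker inspect --format '{{.State.Pid}}' pmm-server
pid=$$
grep 'Max open files' /proc/$pid/limits   # soft/hard nofile limits
ls /proc/$pid/fd | wc -l                  # descriptors currently open
```

If the open count sits near the soft limit, raising the nofile ulimit for the container (or for Docker's default) is the usual fix.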

thanks : (

[root@ip-10-2-21-65 log]# docker version
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64

Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built:
OS/Arch: linux/amd64

[root@ip-10-2-21-65 log]# docker images
REPOSITORY           TAG     IMAGE ID      CREATED      SIZE
percona/pmm-server   1.0.7   a91f4f6237a9  5 days ago   714.4 MB
percona/pmm-server   latest  0eade99a1612  8 weeks ago  652.9 MB

[root@ip-10-2-21-65 log]# pmm-admin -v
1.0.7

pmm-admin check-network

SERVICE TYPE   REMOTE ENDPOINT   STATUS  HTTPS/TLS  PASSWORD
-------------  ----------------  ------  ---------  --------
mysql:metrics  172.17.0.1:42002  DOWN    YES        -
mysql:metrics  172.17.0.1:42003  DOWN    YES        -
mysql:metrics  172.17.0.1:42004  DOWN    YES        -
mysql:metrics  172.17.0.1:42005  DOWN    YES        -
mysql:metrics  172.17.0.1:42006  DOWN    YES        -
mysql:metrics  172.17.0.1:42007  DOWN    YES        -
mysql:metrics  172.17.0.1:42008  DOWN    YES        -
mysql:metrics  172.17.0.1:42009  DOWN    YES        -
mysql:metrics  172.17.0.1:42010  DOWN    YES        -
mysql:metrics  172.17.0.1:42011  DOWN    YES        -

Are you saying PMM caused the "too many open TCP sockets" problem?
Do you have netstat stats from that?
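For reference, a one-liner that groups TCP sockets by state makes this kind of growth easy to watch over time (uses `ss` from iproute2; `netstat -ant` piped through the same awk works too):

```shell
# Count TCP sockets grouped by connection state; rerun periodically to see the trend
ss -tan | awk 'NR>1 {count[$1]++} END {for (s in count) print s, count[s]}'
```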

  1. The number of open sockets keeps increasing until the server becomes unavailable.
  2. TLS errors and DOWN status.
  3. /prometheus/targets shows: Get http://localhost:9100/metrics: dial tcp [::1]:9100: i/o timeout
  4. If I access the endpoint with curl, I see an SSL error.

netstat
tcp 132 0 172.17.0.1:42003 172.17.0.2:35546 ESTABLISHED off (0.00/0/0)
tcp 0 0 172.17.0.1:42010 172.17.0.2:46726 ESTABLISHED keepalive (73.38/0/0)
tcp 0 0 172.17.0.1:42002 172.17.0.2:41970 ESTABLISHED keepalive (30.89/0/0)
tcp 132 0 172.17.0.1:42006 172.17.0.2:43960 ESTABLISHED off (0.00/0/0)
tcp 0 0 172.17.0.1:42011 172.17.0.2:56352 ESTABLISHED keepalive (87.72/0/0)
tcp 0 0 172.17.0.1:42003 172.17.0.2:53234 ESTABLISHED keepalive (120.49/0/0)
tcp 132 0 172.17.0.1:42005 172.17.0.2:47648 ESTABLISHED off (0.00/0/0)
tcp 0 0 172.17.0.1:42007 172.17.0.2:33106 ESTABLISHED keepalive (176.81/0/0)
tcp 0 0 172.17.0.1:42007 172.17.0.2:36374 ESTABLISHED keepalive (99.49/0/0)
tcp 0 0 172.17.0.1:42002 172.17.0.2:47632 ESTABLISHED keepalive (81.06/0/0)
tcp 132 0 172.17.0.1:42009 172.17.0.2:46248 ESTABLISHED off (0.00/0/0)
tcp 132 0 172.17.0.1:42008 172.17.0.2:32788 ESTABLISHED off (0.00/0/0)

Are you using internal docker IPs to communicate between server and client?

Looks like the client address is 172.17.0.1.
Can you connect from inside the container to 172.17.0.1 on any service port, e.g. 42002?

The client address should be set to the underlying host's private IP. Internal Docker IPs may not work.
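For example, re-registering the client with the host's private IP instead of the docker bridge address. This is a sketch using the flags that appear elsewhere in this thread; 10.2.21.65 is the example address from this thread, so substitute your own:

```shell
# Point the client at the host's private IP rather than the 172.17.x.x bridge address
pmm-admin config --server 10.2.21.65 --client-address 10.2.21.65 --bind-address 10.2.21.65
```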

Thanks weber,

I think the pmm-admin address/name are set automatically. I tried to change them, but nothing changed:

// ========================== //
pmm-admin config --bind-address 10.2.21.65
pmm-admin config --client-address 10.2.21.65

pmm-admin info

pmm-admin 1.0.7

PMM Server | localhost
Client Name | ip-10-2-21-65
Client Address | 172.17.0.1
Service Manager | linux-systemd

Go Version | 1.7.4
Runtime Info | linux/amd64

// ============================ //

telnet 172.17.0.1 42002

Trying 172.17.0.1…
Connected to 172.17.0.1.
Escape character is ‘^]’.

#curl https://172.17.0.1:42002/metrics-hr
curl: (60) Issuer certificate is invalid.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a “bundle”
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn’t adequate, you can specify an alternate file
using the --cacert option.

curl -i http://172.17.0.1:42002

First of all, re-running pmm-admin config with new options (e.g. pmm-admin config --server xxxxx --client-address xxxx) did not change anything on the server side.
So I removed and reinstalled the containers and pmm-client on all 4 EC2 instances, and now pmm-admin info shows the correct addresses.

[root@ip-10-2-21-65 source]# pmm-admin check-network
PMM Network Status

Server Address | 10.2.21.xx
Client Address | 10.2.21.xx

  • Connection: Client <-- Server

SERVICE TYPE   NAME      REMOTE ENDPOINT   STATUS  HTTPS/TLS  PASSWORD
mysql:metrics  maindb01  10.2.21.xx:42002  DOWN    YES        -

I still cannot resolve the DOWN remote endpoint status for the Client <-- Server connection, and there is no firewall rule denying it.

This command should work from inside the container:
docker exec -ti pmm-server bash
curl --insecure https://10.2.21.xx:42002

Checked; see below:

docker exec -ti pmm-server-df bash

root@88119a40dbbe:/opt# curl --insecure https://10.2.21.xx:42002

MySQL 3-in-1 exporter

  • high-res metrics
  • medium-res metrics
  • low-res metrics

So it works. If you go to the /prometheus/targets page on the server, what do you see?
How much memory is available on the server where docker runs, and how many PMM clients do you have?
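For the memory question, a quick host-side snapshot needs nothing PMM-specific, just standard Linux interfaces:

```shell
# Host-side resource snapshot from /proc and coreutils
grep MemTotal /proc/meminfo   # total memory on the host
nproc                         # number of CPUs
cat /proc/loadavg             # load averages
```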

I've checked the PMM status and logs several times. The problems erupted with 100% CPU (the prometheus process only), a memory leak, and socket counts increasing until the server hung, on t2.medium/t2.large EC2 instances in AWS. With version 1.0.4 these instance types were sufficient.
When the containers and pmm-admin start, all targets on /prometheus/targets are UP; after a while they all change to DOWN, and the PMM web UI is no longer available.

ec2 instances: 1
docker images: 2
docker containers: 2
pmm-client: 1
metrics/query services: 20
account limit: 10 connections

2016/12/18 00:11:00.235457 analyzer.go:426: qan-analyzer-9117f541-worker crashed: '61 2016-12-17 15:10:00 UTC to 2016-12-17 15:11:00 UTC (0-0)': runtime error: invalid memory address or nil pointer dereference
goroutine 3402694 [running]:
runtime/debug.Stack(0x4868ec, 0xc42000e0f0, 0x2)
	/usr/local/go1.7.4/src/runtime/debug/stack.go:24 +0x79
runtime/debug.PrintStack()
	/usr/local/go1.7.4/src/runtime/debug/stack.go:16 +0x22
github.com/percona/qan-agent/qan.(*RealAnalyzer).runWorker.func1(0xc42018a000, 0xc420b566c0)
	/mnt/workspace/pmm-client-tarball/pmm-client-1.0.7/src/github.com/percona/qan-agent/qan/analyzer.go:427 +0x1f6
panic(0x717600, 0xc42000c060)

It is very strange; it seems to me the underlying environment is very unstable.
It could be memory ballooning, where one instance takes over resources from another, which is usually the case in shared environments without resource reservation.
Can you try an instance with a guaranteed amount of resources?
For PMM, I/O is not that important; it is more CPU- and memory-sensitive.
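If moving off burstable t2 instances is not immediately possible, one hedge is to cap the container's memory so a runaway prometheus cannot take the whole host down. This is a sketch using standard docker flags; the container name `pmm-server` and the 4 GB figure are assumptions to adapt to your setup:

```shell
# Cap memory for the running pmm-server container (docker 1.10+ supports `docker update`)
docker update --memory 4g --memory-swap 4g pmm-server
```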