
PMM instance down?

scar2yjs (Contributor)
hello,

Currently our PMM EC2 instance has crashed ... because of too many open TCP sockets? What's the problem?



The log keeps filling with entries like these:
2016/12/15 10:24:41 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 5ms
2016/12/15 10:24:41 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 10ms
2016/12/15 10:24:42 http: Accept error: accept tcp 172.17.0.1:42002: accept4: too many open files; retrying in 20ms
2016/12/15 10:24:43 http: TLS handshake error from 172.17.0.2:59002: write tcp 172.17.0.1:42002->172.17.0.2:59002: write: broken pipe
2016/12/15 10:24:43 http: TLS handshake error from 172.17.0.2:48450: EOF
2016/12/15 10:24:44 http: TLS handshake error from 172.17.0.2:60626: write tcp 172.17.0.1:42002->172.17.0.2:60626: write: broken pipe

thanks : (
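As a first diagnostic for "too many open files" errors like the ones above, it helps to see how many descriptors the process behind port 42002 is actually holding and what its limit is. A minimal sketch, assuming a typical Linux host (the fuser lookup and /proc paths are generic, not taken from this thread):

# find the PID of the process listening on 42002 (the mysql:metrics exporter here)
pid=$(fuser 42002/tcp 2>/dev/null | awk '{print $1}')
ls /proc/"$pid"/fd | wc -l                  # descriptors currently open
grep 'Max open files' /proc/"$pid"/limits   # per-process limit
cat /proc/sys/fs/file-nr                    # system-wide: allocated vs. maximum handles

If the limit is low, raising it (for example with LimitNOFILE= in the exporter's systemd unit, since this client uses linux-systemd) is a common mitigation, though it does not explain why connections keep piling up in the first place.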

Comments

  • scar2yjs (Contributor)
    # docker version
    Client:
    Version: 1.12.2
    API version: 1.24
    Go version: go1.6.3
    Git commit: bb80604
    Built:
    OS/Arch: linux/amd64

    Server:
    Version: 1.12.2
    API version: 1.24
    Go version: go1.6.3
    Git commit: bb80604
    Built:
    OS/Arch: linux/amd64



    # docker images
    REPOSITORY           TAG     IMAGE ID      CREATED      SIZE
    percona/pmm-server   1.0.7   a91f4f6237a9  5 days ago   714.4 MB
    percona/pmm-server   latest  0eade99a1612  8 weeks ago  652.9 MB


    # pmm-admin -v
    1.0.7
  • scar2yjs (Contributor)
    # pmm-admin check-network

    SERVICE TYPE    REMOTE ENDPOINT     STATUS  HTTPS/TLS  PASSWORD
    mysql:metrics   172.17.0.1:42002    DOWN    YES        -
    mysql:metrics   172.17.0.1:42003    DOWN    YES        -
    mysql:metrics   172.17.0.1:42004    DOWN    YES        -
    mysql:metrics   172.17.0.1:42005    DOWN    YES        -
    mysql:metrics   172.17.0.1:42006    DOWN    YES        -
    mysql:metrics   172.17.0.1:42007    DOWN    YES        -
    mysql:metrics   172.17.0.1:42008    DOWN    YES        -
    mysql:metrics   172.17.0.1:42009    DOWN    YES        -
    mysql:metrics   172.17.0.1:42010    DOWN    YES        -
    mysql:metrics   172.17.0.1:42011    DOWN    YES        -
  • weber (Advisor)
    Are you saying PMM caused the "too many open tcp sockets" problem?
    Do you have netstat stats from that?
  • scar2yjs (Contributor)
    1. The sockets continue to increase and the server becomes unavailable.
    2. TLS errors and DOWN status.
    3. /prometheus/targets -> Get http://localhost:9100/metrics: dial tcp [::1]:9100: i/o timeout
    4. If I access the endpoint using curl, I can see an SSL error.


    netstat
    tcp 132 0 172.17.0.1:42003 172.17.0.2:35546 ESTABLISHED off (0.00/0/0)
    tcp 0 0 172.17.0.1:42010 172.17.0.2:46726 ESTABLISHED keepalive (73.38/0/0)
    tcp 0 0 172.17.0.1:42002 172.17.0.2:41970 ESTABLISHED keepalive (30.89/0/0)
    tcp 132 0 172.17.0.1:42006 172.17.0.2:43960 ESTABLISHED off (0.00/0/0)
    tcp 0 0 172.17.0.1:42011 172.17.0.2:56352 ESTABLISHED keepalive (87.72/0/0)
    tcp 0 0 172.17.0.1:42003 172.17.0.2:53234 ESTABLISHED keepalive (120.49/0/0)
    tcp 132 0 172.17.0.1:42005 172.17.0.2:47648 ESTABLISHED off (0.00/0/0)
    tcp 0 0 172.17.0.1:42007 172.17.0.2:33106 ESTABLISHED keepalive (176.81/0/0)
    tcp 0 0 172.17.0.1:42007 172.17.0.2:36374 ESTABLISHED keepalive (99.49/0/0)
    tcp 0 0 172.17.0.1:42002 172.17.0.2:47632 ESTABLISHED keepalive (81.06/0/0)
    tcp 132 0 172.17.0.1:42009 172.17.0.2:46248 ESTABLISHED off (0.00/0/0)
    tcp 132 0 172.17.0.1:42008 172.17.0.2:32788 ESTABLISHED off (0.00/0/0)
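    A quick way to see how connections pile up per exporter port is to count ESTABLISHED sockets grouped by local port. A hedged one-liner, assuming netstat prints the local address in column 4 as in the output above:

    netstat -tn | awk '$6 == "ESTABLISHED" {n = split($4, a, ":"); print a[n]}' | sort | uniq -c | sort -rn

    Re-running it every minute or two shows whether old scrape connections are being closed or just accumulating.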
  • weber (Advisor)
    Are you using internal docker IPs to communicate between server and client?

    Looks like the client address is 172.17.0.1.
    Can you connect from inside the container to 172.17.0.1 on any service port, e.g. 42002?

    The client address should be set to the underlying host's private IP. Internal Docker IPs may not work.
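    In practice that means re-pointing the client at the host's private IP instead of the docker0 address. A hedged sketch for an EC2 host (the metadata URL is standard AWS; the server address placeholder has to be filled in):

    HOST_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
    pmm-admin config --server <pmm-server-address> --client-address "$HOST_IP" --bind-address "$HOST_IP"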
  • scar2yjs (Contributor)
    Thanks weber,

    I think the pmm-admin address/name are set automatically.
    I tried to change them, but nothing changed.

    // ========================== //
    pmm-admin config --bind-address 10.2.21.65
    pmm-admin config --client-address 10.2.21.65

    # pmm-admin info
    pmm-admin 1.0.7

    PMM Server | localhost
    Client Name | ip-10-2-21-65
    Client Address | 172.17.0.1
    Service Manager | linux-systemd

    Go Version | 1.7.4
    Runtime Info | linux/amd64

    // ============================ //
    # telnet 172.17.0.1 42002
    Trying 172.17.0.1...
    Connected to 172.17.0.1.
    Escape character is '^]'.


    # curl https://172.17.0.1:42002/metrics-hr
    curl: (60) Issuer certificate is invalid.
    More details here: http://curl.haxx.se/docs/sslcerts.html

    curl performs SSL certificate verification by default, using a "bundle"
    of Certificate Authority (CA) public keys (CA certs). If the default
    bundle file isn't adequate, you can specify an alternate file
    using the --cacert option.

    # curl -i http://172.17.0.1:42002
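    The exporter presents a self-signed certificate, so curl's default CA verification fails even when the endpoint itself is healthy. For a quick connectivity check, skipping verification is enough (diagnostics only):

    curl --insecure https://172.17.0.1:42002/metrics-hr

    Seeing the metrics text here means the listener is fine, and the DOWN status points at what the server scrapes rather than at the exporter itself.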
  • scar2yjs (Contributor)
    First of all, re-configuring (resetting) the pmm-admin config --options did not change anything between client and server, e.g. pmm-admin config --server xxxxx --client-address xxxx.
    So I removed and reinstalled the containers and pmm-client on all 4 EC2 instances, and now I can see the correct pmm-admin info.

    # pmm-admin check-network
    PMM Network Status

    Server Address | 10.2.21.xx
    Client Address | 10.2.21.xx

    * Connection: Client <-- Server

    SERVICE TYPE   NAME      REMOTE ENDPOINT    STATUS  HTTPS/TLS  PASSWORD
    mysql:metrics  maindb01  10.2.21.xx:42002   DOWN    YES        -
    I still cannot solve the Client <-- Server DOWN remote endpoint status, and there is no firewall denying anything.
  • weber (Advisor)
    This command should work from inside the container:
    docker exec -ti pmm-server bash
    curl --insecure https://10.2.21.xx:42002
  • scar2yjs (Contributor)
    Check below:
    # docker exec -ti pmm-server-df bash
    root@<container-id>:/opt# curl --insecure https://10.2.21.xx:42002
    <html>
    <head><title>MySQL 3-in-1 exporter</title></head>
    <body>
    <h1>MySQL 3-in-1 exporter</h1>
    <li><a href="/metrics-hr">high-res metrics</a></li>
    <li><a href="/metrics-mr">medium-res metrics</a></li>
    <li><a href="/metrics-lr">low-res metrics</a></li>
    </body>
    </html>
  • weber (Advisor)
    So it works. If you go to the /prometheus/targets page on the server, what do you see?
    How much memory is available on the server where Docker runs, and how many PMM clients do you have?
  • scar2yjs (Contributor)
    I've checked the PMM status and logs several times. The problems erupt with 100% CPU (the prometheus process only), a memory leak, and increasing sockets, ending in a server hang, when using t2.medium/t2.large EC2 instances on AWS. When I used version 1.0.4, it was enough.
    When I start the containers and pmm-admin, /prometheus/targets shows everything UP, but after a while it all changes to DOWN and the PMM web UI is no longer available.

    ec2 instance : 1
    docker images : 2
    docker container : 2
    pmm-client : 1
    metrics/query : 20
    account limit: 10 connections.

    2016/12/18 00:11:00.235457 analyzer.go:426: qan-analyzer-9117f541-worker crashed: '61 2016-12-17 15:10:00 UTC to 2016-12-17 15:11:00 UTC (0-0)': runtime error: invalid memory address or nil pointer dereference
    goroutine 3402694 [running]:
    runtime/debug.Stack(0x4868ec, 0xc42000e0f0, 0x2)
    /usr/local/go1.7.4/src/runtime/debug/stack.go:24 +0x79
    runtime/debug.PrintStack()
    /usr/local/go1.7.4/src/runtime/debug/stack.go:16 +0x22
    github.com/percona/qan-agent/qan.(*RealAnalyzer).runWorker.func1(0xc42018a000, 0xc420b566c0)
    /mnt/workspace/pmm-client-tarball/pmm-client-1.0.7/src/github.com/percona/qan-agent/qan/analyzer.go:427 +0x1f6
    panic(0x717600, 0xc42000c060)
  • weber (Advisor)
    It is very strange; it seems to me the underlying environment is very unstable.
    It could be memory ballooning, where one instance takes over resources from another, which is usually the case in shared environments without resource reservation.
    Can you try an instance with a guaranteed amount of resources?
    For PMM, IO is not that important; it is more CPU/memory sensitive.
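    To confirm which side is exhausting CPU and memory before resizing the instance, it is worth comparing host-level usage with the container's. A hedged sketch (the container name and the memory cap are illustrative, not values confirmed in this thread):

    top -b -n 1 | head -20                        # host view: is prometheus the top consumer?
    docker stats --no-stream pmm-server           # container view: CPU and memory usage
    docker update --memory 4g --memory-swap 4g pmm-server   # optional cap (Docker >= 1.10)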
This discussion has been closed.
