pmm-agent cannot connect to pmm-server when using a reverse proxy

What I’m trying to achieve:

  • Run PMM Server in a Docker container on "server", publishing port 443 as 127.0.0.1:8443
  • Use nginx as a reverse proxy for 127.0.0.1:8443. This way nginx can handle TLS via certbot and also serve other domains.
  • Connect the pmm-agent running on "client" to "server"

What I’ve done:

Deployed the pmm-server image on Docker like this:

docker run --detach --restart always --publish 127.0.0.1:8443:443 --volumes-from pmm-data --name pmm-server percona/pmm-server:2

Configured the nginx site:

server {
    server_name server.domain;
    location / {
        include /etc/nginx/proxy_params;
        proxy_pass https://127.0.0.1:8443;
    }

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/server.domain/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/server.domain/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = server.domain) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    listen 80;
    listen [::]:80;
    server_name server.domain;
    return 404; # managed by Certbot
}

proxy_params:

proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

Configured the client:

pmm-admin config --server-url=https://user:pass@server.domain:443 server_ip generic server

What happens:

pmm-admin status:

"Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server."

pmm-agent --debug:

ERRO[2020-12-23T12:33:12.822+01:00] Failed to connect to server.domain:443: timeout.  component=client
INFO[2020-12-23T12:33:13.605+01:00] Connecting to https://user:***@server.domain:443/ ...  component=client

What I’ve tried:

If I disable nginx, publish the Docker port as 0.0.0.0:443->443, and use --server-insecure-tls, the pmm-agent connects without any problem.

Hi Andrej,

Can you telnet to "server.domain:443" to see if it actually connects? This does not look like a PMM problem but rather a configuration issue, meaning that connections to server.domain:443 fail. It may be that nginx is not configured properly, or perhaps there is a firewall in between?


Have you configured your PMM Server with a custom cert? You've got multiple layers of encryption going on in this double-proxy setup (the external nginx and PMM's own nginx). Assuming you've left the default certs in place, you need your external nginx to trust the upstream cert, or it will fail: your nginx has to decrypt the PMM nginx traffic and then re-encrypt it with your Let's Encrypt cert to serve the content to the client. I think you will need to preinstall the cert/key on your nginx in the "location" stanza. Just be aware this IS the definition of a "man in the middle", so be sure that this machine is secured, as it's a central point to sniff creds.

Here are the docs for the various options for an upstream service that is encrypted using HTTPS.

edit (forgot link)



Hi Peter,

When connecting to server.domain from my PC I can see Grafana, log in, and view pmm-server stats without any problem. Just to be sure that the client can connect to the server as well, I tested it using telnet and curl, and both can connect to server.domain. However, I noticed that I can't access /prometheus, though I can access /graph and /qan without any problem. When trying to access /prometheus from my PC after logging into Grafana, I get this error message:

remoteAddr: "127.0.0.1:46560", X-Forwarded-For: "ip of my PC"; unsupported path requested: "/"

Also, to address Steve's comment: nginx accepts self-signed certificates because the default value of proxy_ssl_verify is off. I'm aware of what this means, and even though the pmm-server Docker container runs on the same machine as my nginx reverse proxy and is only exposed on localhost, I'll make sure to configure custom certs after I get this working.
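For reference, if upstream verification is switched on later (once custom certs are in place), it would go in the proxied location. A minimal sketch, assuming the upstream certificate (or its CA) has been exported to a local file; the /etc/nginx/certs/pmm-upstream.pem path is a placeholder, not something PMM ships:

```nginx
location / {
    include /etc/nginx/proxy_params;
    proxy_pass https://127.0.0.1:8443;

    # Verify the upstream (PMM's own nginx) certificate instead of
    # relying on the default proxy_ssl_verify off.
    proxy_ssl_verify on;
    proxy_ssl_trusted_certificate /etc/nginx/certs/pmm-upstream.pem;  # placeholder path
    proxy_ssl_name server.domain;  # name to check against the upstream cert
}
```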


Are you running 2.12.0? In that version we replaced Prometheus with VictoriaMetrics, so the default /prometheus landing page no longer exists. We still have many of the endpoints that applied in earlier versions (/prometheus/targets/, /prometheus/rules/); you can see all the details here. But that only explains why /prometheus doesn't work.

Still trying to think about why clients wouldn't work the exact same way your manual tests do, though... there's nothing technically different between what you've done by hand and what the client does. My mind keeps going to timeout values between client <-> external nginx <-> PMM nginx, but it feels like the connection should just re-establish itself. Then again, I've seen stranger things with Apache's reverse proxy and having to keep all the configurations in harmony.

Am I correct to assume the pmm-admin config command successfully registers the node, but as soon as you run pmm-admin status it shows disconnected? Is that an instant failure, or does it happen after, say, 5, 15, or 100 (I think the nginx default) seconds?


Yes I’m running 2.12.0 so that explains that.

You are correct that pmm-admin config runs successfully:

Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.
Checking local pmm-agent status...
pmm-agent is running.

Running pmm-admin status shows this:

"Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server."

Running pmm-agent --debug --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml shows that the timeout happens exactly 5 seconds after each connection attempt:

INFO[2020-12-28T18:14:30.697+01:00] Connecting to https://user:***@server.domain:443/ ...  component=client

ERRO[2020-12-28T18:14:35.697+01:00] Failed to connect to server.domain:443: timeout.  component=client

INFO[2020-12-28T18:14:37.108+01:00] Connecting to https://user:***@server.domain:443/ ...  component=client

ERRO[2020-12-28T18:14:42.109+01:00] Failed to connect to server.domain:443: timeout.  component=client

INFO[2020-12-28T18:14:44.025+01:00] Connecting to https://user:***@server.domain:443/ ...  component=client

ERRO[2020-12-28T18:14:49.025+01:00] Failed to connect to server.domain:443: timeout.  component=client

Hi @Andrej

We use gRPC to communicate between pmm-agent and PMM Server, and a reverse proxy may not work for that case.

You should probably pass gRPC requests through grpc_pass, like we do here: https://github.com/percona/pmm-server/blob/PMM-2.0/nginx.conf#L160

List of locations which use gRPC:

  • /agent.
  • /inventory.
  • /management.
  • /server.


Hi @nurlan, thanks for setting me on the right path.

First, I fixed the timeout issue by adding the http2 parameter to the listen directive, like this:

listen 443 ssl http2;

After this change the client started showing this error:

DEBU[2020-12-29T13:27:04.528+01:00] Sending message (4 bytes): id:1  ping:{}.     component=channel
DEBU[2020-12-29T13:27:09.348+01:00] Closing with error: rpc error: code = Canceled desc = context canceled
failed to receive message
github.com/percona/pmm-agent/client/channel.(*Channel).runReceiver
        /tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:199
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373  component=channel
DEBU[2020-12-29T13:27:09.348+01:00] Exiting receiver goroutine.                   component=channel
ERRO[2020-12-29T13:27:09.348+01:00] Failed to establish two-way communication channel: context canceled.  component=client
DEBU[2020-12-29T13:27:09.348+01:00] Connection closed.                            component=client

So next I added:

upstream managed-grpc {
  server 127.0.0.1:8443;
  keepalive 32;
}

and

location /agent. {
    grpc_pass grpc://managed-grpc;
    client_max_body_size 0;
}
location /inventory. {
    grpc_pass grpc://managed-grpc;
}
location /management. {
    grpc_pass grpc://managed-grpc;
}
location /server. {
    grpc_pass grpc://managed-grpc;
}

This resulted in:

DEBU[2020-12-29T13:34:42.453+01:00] Sending message (4 bytes): id:1  ping:{}.     component=channel
DEBU[2020-12-29T13:34:42.495+01:00] Closing with error: rpc error: code = Unavailable desc = Bad Gateway: HTTP status code 502; transport: received the unexpected content-type "text/html"
failed to receive message
github.com/percona/pmm-agent/client/channel.(*Channel).runReceiver
        /tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:199
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373  component=channel
DEBU[2020-12-29T13:34:42.495+01:00] Exiting receiver goroutine.                   component=channel
ERRO[2020-12-29T13:34:42.495+01:00] Failed to establish two-way communication channel: Bad Gateway: HTTP status code 502; transport: received the unexpected content-type "text/html".  component=client
DEBU[2020-12-29T13:34:42.496+01:00] Connection closed.                            component=client

So currently I'm trying to figure out why grpc_pass fails with a 502 status code. If I figure it out, I'll update you. If you have any ideas, please let me know.


Hi @Andrej,
could you try replacing grpc:// with grpcs://, please?

And if that doesn't work, please check this message: Unable to setup proxy between pmm-client and server - #3 by jojojoseff
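For anyone landing here later, combining the fixes from this thread (http2 on the listen directive, grpc_pass for the gRPC locations, and grpcs:// because PMM's internal nginx terminates TLS itself), the relevant parts of the external nginx site would look roughly like this. This is a sketch assembled from the configs posted above, not an officially tested configuration:

```nginx
upstream managed-grpc {
  server 127.0.0.1:8443;
  keepalive 32;
}

server {
    server_name server.domain;
    listen 443 ssl http2;   # http2 is required for gRPC

    # gRPC endpoints used by pmm-agent; grpcs:// because the upstream
    # (PMM's internal nginx) serves TLS itself
    location /agent. {
        grpc_pass grpcs://managed-grpc;
        client_max_body_size 0;
    }
    location /inventory. { grpc_pass grpcs://managed-grpc; }
    location /management. { grpc_pass grpcs://managed-grpc; }
    location /server. { grpc_pass grpcs://managed-grpc; }

    # everything else (Grafana UI, QAN, API)
    location / {
        include /etc/nginx/proxy_params;
        proxy_pass https://127.0.0.1:8443;
    }

    ssl_certificate /etc/letsencrypt/live/server.domain/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/server.domain/privkey.pem;
}
```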
