PMM Server going into an unhealthy state

krkumar · March 10, 2024, 11:18pm

Description:

I am installing PMM Server 2.39 on a Google Cloud Compute Engine (GCE) instance that has no internet connectivity, so I use the steps in the Steps to Reproduce section to transfer the PMM Server Docker image to the GCE node and then start PMM Server from that image.
After installation, the PMM Server container goes into an unhealthy state.
When I run supervisorctl status, clickhouse, grafana and pmm-update-perform-init show FATAL, and qan-api2 shows BACKOFF.
Can you please help me resolve this issue?

Steps to Reproduce:

docker pull percona/pmm-server:2.39
docker save -o pmm-server.docker percona/pmm-server:2.39
scp the image file (pmm-server.docker) to the PMM Server node on Google Compute Engine
Install Docker on the PMM Server node:
yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
docker load -i pmm-server.docker
rpm -qa|grep -i docker
docker-ce-19.03.9-3.el7.x86_64
docker-ce-cli-19.03.9-3.el7.x86_64

mkdir /etc/docker
echo '{ "data-root": "/liveperson/data/docker" }' >> /etc/docker/daemon.json
Disable SELinux
docker create -v /srv/ --name pmm-data percona/pmm-server:2.39 /bin/true 1
Create pmm-server container:

docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2.39
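The offline install steps above can be collected into a few reusable functions. This is a minimal sketch, not an official Percona procedure: the function names (export_pmm_image, write_daemon_json, import_pmm_image, start_pmm_server) and the PMM_IMAGE variable are hypothetical, and it assumes Docker is already installed on both machines. write_daemon_json uses ">" rather than ">>", so re-running it can never leave two JSON objects in daemon.json; it also replaces any other settings already in that file.

```shell
# Sketch of the air-gapped PMM Server install described in this post.
# Hypothetical helper names; PMM_IMAGE can be overridden from the environment.
PMM_IMAGE=${PMM_IMAGE:-percona/pmm-server:2.39}

# On a machine WITH internet access: pull the image and save it to a tarball.
export_pmm_image() {
  tarball=${1:-pmm-server.docker}
  docker pull "$PMM_IMAGE" && docker save -o "$tarball" "$PMM_IMAGE"
}

# On the air-gapped node: point Docker at a custom data root.
# Overwrites (">") instead of appending (">>") so the file always holds
# exactly one JSON object; any other settings in it are replaced.
write_daemon_json() {
  data_root=$1
  conf=${2:-/etc/docker/daemon.json}
  mkdir -p "$(dirname "$conf")" &&
    printf '{ "data-root": "%s" }\n' "$data_root" > "$conf"
}

# On the air-gapped node, after copying the tarball over (e.g. with scp).
import_pmm_image() {
  docker load -i "${1:-pmm-server.docker}"
}

# Create the data-only container that owns /srv, then the PMM Server container.
start_pmm_server() {
  docker create -v /srv --name pmm-data "$PMM_IMAGE" /bin/true &&
    docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server \
      --restart always "$PMM_IMAGE"
}
```

Order matters on the air-gapped node: run write_daemon_json and restart Docker before import_pmm_image, because images loaded under the old data root are not visible once Docker restarts with a new data-root.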

docker ps -a
CONTAINER ID   IMAGE                     COMMAND                CREATED      STATUS                  PORTS                          NAMES
0c2b23e8a52a   percona/pmm-server:2.39   "/opt/entrypoint.sh"   2 days ago   Up 2 days (unhealthy)   80/tcp, 0.0.0.0:443->443/tcp   pmm-server
92e6643a8e6d   percona/pmm-server:2.39   "/bin/true 1"          2 days ago   Exited (0) 2 days ago                                  pmm-data

docker exec -it pmm-server supervisorctl status
alertmanager             RUNNING   pid 24, uptime 2 days, 1:47:10
clickhouse               FATAL     Exited too quickly (process log may have details)
dbaas-controller         STOPPED   Not started
grafana                  FATAL     Exited too quickly (process log may have details)
nginx                    RUNNING   pid 21, uptime 2 days, 1:47:10
pmm-agent                RUNNING   pid 28, uptime 2 days, 1:47:10
pmm-managed              RUNNING   pid 27, uptime 2 days, 1:47:10
pmm-update-perform       STOPPED   Not started
pmm-update-perform-init  FATAL     Exited too quickly (process log may have details)
postgresql               RUNNING   pid 12, uptime 2 days, 1:47:10
prometheus               STOPPED   Not started
qan-api2                 BACKOFF   Exited too quickly (process log may have details)
victoriametrics          RUNNING   pid 22, uptime 2 days, 1:47:10
vmalert                  RUNNING   pid 23, uptime 2 days, 1:47:10
vmproxy                  RUNNING   pid 25, uptime 2 days, 1:47:10

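The two checks in this post (container health from docker ps, then supervisorctl status) can be scripted instead of re-run by hand. This is a minimal sketch with hypothetical helper names (wait_for_health, broken_programs). It reads .State.Health.Status through docker inspect --format, which assumes the image defines a Docker HEALTHCHECK (the "(unhealthy)" tag in docker ps shows that it does). broken_programs treats only FATAL and BACKOFF as failures and ignores STOPPED "Not started" entries, on the assumption, consistent with the output above, that those programs are simply not set to start automatically.

```shell
# Poll the Docker HEALTHCHECK status of a container until it reports "healthy".
# Usage: wait_for_health CONTAINER [ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once healthy, 1 if still not healthy after ATTEMPTS checks.
wait_for_health() {
  name=$1
  attempts=${2:-30}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Health.Status is "starting", "healthy" or "unhealthy" when a HEALTHCHECK exists.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    echo "check $((i + 1))/$attempts: $name is ${status:-unknown}"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Print "name STATE" for each supervisord program that is FATAL (supervisord
# gave up restarting it) or BACKOFF (it exited too quickly and will be retried).
# Reads `supervisorctl status` output on stdin.
broken_programs() {
  awk '$2 == "FATAL" || $2 == "BACKOFF" { print $1, $2 }'
}

# Example:
#   wait_for_health pmm-server 30 10 ||
#     docker exec pmm-server supervisorctl status | broken_programs
```

Against the status output above, broken_programs would list clickhouse, grafana, pmm-update-perform-init and qan-api2, which are exactly the programs whose logs follow.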
Version:

2.39

Logs:

Logs from clickhouse:
Processing configuration file '/etc/clickhouse-server/config.xml'.
Logging information to console
Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Could not determine local time zone: filesystem error: in canonical: Operation not permitted [/usr/share/zoneinfo/] [""], Stack trace (when copying this message, always include the lines below):

DateLUT::DateLUT() @ 0x11d20884 in /usr/bin/clickhouse
BaseDaemon::setupWatchdog() @ 0x86c4b67 in /usr/bin/clickhouse
BaseDaemon::initialize(Poco::Util::Application&) @ 0x86c20ef in /usr/bin/clickhouse
DB::Server::initialize(Poco::Util::Application&) @ 0x857bd20 in /usr/bin/clickhouse
Poco::Util::Application::run() @ 0x11dab846 in /usr/bin/clickhouse
DB::Server::run() @ 0x857bb8f in /usr/bin/clickhouse
mainEntryClickHouseServer(int, char**) @ 0x857a7f5 in /usr/bin/clickhouse
main @ 0x850923e in /usr/bin/clickhouse
__libc_start_call_main @ 0x3feb0 in /usr/lib64/libc.so.6
__libc_start_main_alias_2 @ 0x3ff60 in /usr/lib64/libc.so.6
_start @ 0x84d37ae in /usr/bin/clickhouse
(version 21.3.20.1 (official build))

Logs from grafana:

runtime/cgo: pthread_create failed: Operation not permitted
SIGABRT: abort
PC=0x7f48d7c3258c m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: g 0: unknown pc 0x7f48d7c3258c
stack: frame={sp:0x7ffd9b5111b0, fp:0x0} stack=[0x7ffd9ad12730,0x7ffd9b511740)
0x00007ffd9b5110b0: 0x0000000000000001 0x0000000000000000
0x00007ffd9b5110c0: 0x00007f48d7ba4e90 0x00007f48d7c42e40

Logs from pmm-update-perform-init.log
PLAY [localhost] ***************************************************************
ERROR! Unexpected Exception, this is probably a bug: can’t start new thread
to see the full traceback, use -vvv
time="2024-03-10T23:09:06Z" level=fatal msg="RunPlaybook failed: exit status 250"
ProjectName: pmm-update
Version: 2.39.0
PMMVersion: 2.39.0
Timestamp: 2023-08-10 10:05:17 (UTC)
FullCommit: a657accbb0fb96f0a099218efd4bfecc97eb216e
Starting "ansible-playbook --flush-cache /usr/share/pmm-update/ansible/playbook/tasks/init.yml" …
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'

Logs from qan-api2:

stdlog: qan-api2 v2.39.0.
INFO[2024-03-10T23:12:35.666+00:00] Log level: info.
INFO[2024-03-10T23:12:35.666+00:00] DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2 component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.39.0.
INFO[2024-03-10T23:12:57.232+00:00] Log level: info.
INFO[2024-03-10T23:12:57.232+00:00] DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2 component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused

Expected Result:

docker ps -a should show the pmm-server container as healthy.

Actual Result:

The pmm-server container is reported as unhealthy:
docker ps -a
CONTAINER ID   IMAGE                     COMMAND                CREATED      STATUS                  PORTS                          NAMES
0c2b23e8a52a   percona/pmm-server:2.39   "/opt/entrypoint.sh"   2 days ago   Up 2 days (unhealthy)   80/tcp, 0.0.0.0:443->443/tcp   pmm-server

Additional Information:

PMM Server is being installed on a CentOS 7 Linux host:
Linux hostname 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

nurlan · March 11, 2024, 9:08am

Hello @krkumar,
As far as I can see, you are trying to set up our EL9-based PMM Docker image on a CentOS 7 host. Unfortunately, we don't support this setup.
I have a few suggestions:

Upgrade the host system to RHEL 8 or RHEL 9.
Try our latest version: we've updated ClickHouse, so it might work on CentOS 7.
Use our EL7-based images, which have the -el7 suffix in our Docker repository.

Have a good day.

krkumar · March 12, 2024, 3:43am

Hi @nurlan
Thanks for the response.
I have a QA environment with internet access. The VM there where I installed PMM Server 2.39 also runs CentOS 7, the same as the host where I am having the issue.
On that QA node, where PMM Server 2.39 works fine, I checked the OS inside the image. The details below show an EL9 (Oracle Linux 9.2) Docker image running on an EL7 VM.
I installed PMM Server on that VM with curl -fsSL https://www.percona.com/get/pmm | /bin/bash, which works there because the VM has internet connectivity.

[root@031bccae67c0 opt]# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="9.2"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Oracle Linux Server 9.2"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:9:2:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 9"
ORACLE_BUGZILLA_PRODUCT_VERSION=9.2
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=9.2

Do you have an EL7-based PMM Server image for version 2.39?
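The question here turns on the difference between the host OS and the OS inside the image. One quick way to compare them is to extract the major version from VERSION_ID in each /etc/os-release. This is a minimal sketch with a hypothetical helper, os_major; it compares major versions only and assumes both files set VERSION_ID. As this thread shows, a mismatch alone does not prove the setup will fail (the same EL9 image ran on the CentOS 7 QA VM), but it tells you which combination you are running.

```shell
# Print the major version from os-release content read on stdin,
# e.g. VERSION_ID="9.2" -> 9, VERSION_ID="7" -> 7.
os_major() {
  sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p'
}

# Example (host vs. image, with the pmm-server container running):
#   host=$(os_major < /etc/os-release)
#   image=$(docker exec pmm-server cat /etc/os-release | os_major)
#   echo "host: EL$host, image: EL$image"
```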

krkumar · March 13, 2024, 9:27pm

@nurlan
I was able to resolve it by installing the latest Docker and container-related RPMs:
yum install containerd.io-1.6.27-3.1.el7.x86_64.rpm docker-buildx-plugin-0.11.2-1.el7.x86_64.rpm docker-ce-cli-24.0.7-1.el7.x86_64.rpm docker-compose-plugin-2.21.0-1.el7.x86_64.rpm \
  container-selinux-2.119.2-1.911c772.el7_8.noarch.rpm docker-ce-24.0.7-1.el7.x86_64.rpm docker-ce-rootless-extras-24.0.7-1.el7.x86_64.rpm
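Since upgrading Docker from 19.03.9 to 24.0.7 resolved the failures, a pre-flight version check could catch this before PMM Server is started. A plausible explanation, not confirmed in this thread, is that the default seccomp profile of older Docker releases rejects system calls that the newer glibc in the EL9 image relies on, which would fit the "Operation not permitted" errors from ClickHouse and Grafana and the "can't start new thread" error from Ansible. The sketch below uses 24.0.7 as its threshold only because that version is reported to work here; the real minimum may be lower. version_ge, docker_new_enough and MIN_DOCKER_VERSION are hypothetical names.

```shell
# Succeed (exit 0) if dotted version $1 >= $2, comparing numeric components
# left to right; missing components count as 0 (so 24.0 equals 24.0.0).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Compare the Docker *server* (daemon) version with a minimum.
# MIN_DOCKER_VERSION defaults to 24.0.7, the version reported to work in
# this thread; it is NOT a verified minimum.
docker_new_enough() {
  v=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || return 1
  [ -n "$v" ] || return 1
  version_ge "$v" "${MIN_DOCKER_VERSION:-24.0.7}"
}

# Example:
#   docker_new_enough || echo "Docker daemon older than ${MIN_DOCKER_VERSION:-24.0.7}" >&2
```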

Could you please let me know which ports I need to open in the firewall so that my DB nodes can communicate with PMM Server, and PMM Server with my DB nodes?
Do I need only port 443, or also the ports where pmm-agent, mysqld and the node exporters run?

Thanks

nurlan · March 19, 2024, 10:45am

Yes, only port 443 needs to be open. We recommend installing PMM Client on the DB instances so that it collects metrics locally and pushes them to PMM Server; that way you don't have to open ports on your DB hosts to the world.
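On a CentOS 7 PMM Server host that uses firewalld, opening that single port could look like the minimal sketch below. It assumes firewalld is the active firewall and that PMM Server listens on the default port 443; open_pmm_port and FIREWALL_CMD are hypothetical names. On Google Compute Engine, a VPC firewall rule allowing TCP 443 from the DB nodes may also be required, which this sketch does not cover.

```shell
# Permanently allow inbound TCP on the PMM Server HTTPS port (default 443),
# then reload firewalld so the permanent rule takes effect. Run as root.
# FIREWALL_CMD may point at a wrapper (e.g. for a dry run); default: firewall-cmd.
open_pmm_port() {
  port=${1:-443}
  fw=${FIREWALL_CMD:-firewall-cmd}
  "$fw" --permanent --add-port="${port}/tcp" &&
    "$fw" --reload
}

# Example (on the PMM Server host):
#   open_pmm_port 443
```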
