Description:
I am installing PMM Server 2.39 on a Google Compute Engine (GCE) instance that has no internet connectivity, so I use the steps below (see the Steps to Reproduce section) to transfer the PMM Server Docker image to the GCE node and then start PMM Server from that image.
After the installation, the pmm-server container goes into an unhealthy state.
When I run supervisorctl status, clickhouse, grafana, and pmm-update-perform-init show FATAL status, and qan-api2 shows BACKOFF.
Can you please help me resolve this issue?
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c2b23e8a52a percona/pmm-server:2.39 "/opt/entrypoint.sh" 2 days ago Up 2 days (unhealthy) 80/tcp, 0.0.0.0:443->443/tcp pmm-server
92e6643a8e6d percona/pmm-server:2.39 "/bin/true 1" 2 days ago Exited (0) 2 days ago pmm-data
Steps to Reproduce:
docker pull percona/pmm-server:2.39
docker save -o pmm-server.docker percona/pmm-server:2.39
scp the image file (pmm-server.docker) to the PMM Server node in Google Compute Engine
Install Docker on the PMM Server node:
yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
docker load -i pmm-server.docker
rpm -qa|grep -i docker
docker-ce-19.03.9-3.el7.x86_64
docker-ce-cli-19.03.9-3.el7.x86_64
mkdir /etc/docker
echo '{ "data-root": "/liveperson/data/docker" }' >> /etc/docker/daemon.json
Disable SELinux
docker create -v /srv/ --name pmm-data percona/pmm-server:2.39 /bin/true 1
Create pmm-server container:
docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2.39
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c2b23e8a52a percona/pmm-server:2.39 "/opt/entrypoint.sh" 2 days ago Up 2 days (unhealthy) 80/tcp, 0.0.0.0:443->443/tcp pmm-server
92e6643a8e6d percona/pmm-server:2.39 "/bin/true 1" 2 days ago Exited (0) 2 days ago pmm-data
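To rule out a damaged archive from the docker save / scp / docker load steps above, the image file can be checksummed on the source host and verified on the GCE node. This is only a minimal sketch and was not part of the original steps. The helper names write_checksum and verify_checksum are hypothetical, and it assumes sha256sum (GNU coreutils) is available on both hosts.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the original steps.

# Run on the source host: write <archive>.sha256 next to the archive.
write_checksum() {
    archive=$1
    (
        cd "$(dirname "$archive")" || exit 1
        sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256"
    )
}

# Run on the target host after copying both files:
# returns 0 if the archive matches its checksum file, non-zero otherwise.
verify_checksum() {
    archive=$1
    (
        cd "$(dirname "$archive")" || exit 1
        sha256sum -c "$(basename "$archive").sha256"
    )
}

# Usage:
#   write_checksum pmm-server.docker     # on the host with internet access
#   verify_checksum pmm-server.docker    # on the GCE node, before docker load
```

If the verification passes, the transfer itself is unlikely to be the cause of the failing services.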
docker exec -it pmm-server supervisorctl status
alertmanager RUNNING pid 24, uptime 2 days, 1:47:10
clickhouse FATAL Exited too quickly (process log may have details)
dbaas-controller STOPPED Not started
grafana FATAL Exited too quickly (process log may have details)
nginx RUNNING pid 21, uptime 2 days, 1:47:10
pmm-agent RUNNING pid 28, uptime 2 days, 1:47:10
pmm-managed RUNNING pid 27, uptime 2 days, 1:47:10
pmm-update-perform STOPPED Not started
pmm-update-perform-init FATAL Exited too quickly (process log may have details)
postgresql RUNNING pid 12, uptime 2 days, 1:47:10
prometheus STOPPED Not started
qan-api2 BACKOFF Exited too quickly (process log may have details)
victoriametrics RUNNING pid 22, uptime 2 days, 1:47:10
vmalert RUNNING pid 23, uptime 2 days, 1:47:10
vmproxy RUNNING pid 25, uptime 2 days, 1:47:10
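To pick out only the failing services from the supervisorctl output above, a small filter can help. This is a sketch under the assumption that the first column is the service name and the second is the state, as in the output shown. The function name failed_services is hypothetical. STOPPED services are deliberately ignored here, on the assumption that they are not meant to be started automatically.

```shell
#!/bin/sh
# Hypothetical helper: read `supervisorctl status` output on stdin and
# print "<service> <state>" for every service in FATAL or BACKOFF state.
failed_services() {
    awk '$2 == "FATAL" || $2 == "BACKOFF" { print $1, $2 }'
}

# Usage (no -t, so no terminal is allocated when piping):
#   docker exec pmm-server supervisorctl status | failed_services
```

For the status output in this report, the filter would list clickhouse, grafana, pmm-update-perform-init, and qan-api2.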
Version:
2.39
Logs:
Logs from clickhouse:
Processing configuration file '/etc/clickhouse-server/config.xml'.
Logging information to console
Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Could not determine local time zone: filesystem error: in canonical: Operation not permitted [/usr/share/zoneinfo/] [""], Stack trace (when copying this message, always include the lines below):
- DateLUT::DateLUT() @ 0x11d20884 in /usr/bin/clickhouse
- BaseDaemon::setupWatchdog() @ 0x86c4b67 in /usr/bin/clickhouse
- BaseDaemon::initialize(Poco::Util::Application&) @ 0x86c20ef in /usr/bin/clickhouse
- DB::Server::initialize(Poco::Util::Application&) @ 0x857bd20 in /usr/bin/clickhouse
- Poco::Util::Application::run() @ 0x11dab846 in /usr/bin/clickhouse
- DB::Server::run() @ 0x857bb8f in /usr/bin/clickhouse
- mainEntryClickHouseServer(int, char**) @ 0x857a7f5 in /usr/bin/clickhouse
- main @ 0x850923e in /usr/bin/clickhouse
- __libc_start_call_main @ 0x3feb0 in /usr/lib64/libc.so.6
- __libc_start_main_alias_2 @ 0x3ff60 in /usr/lib64/libc.so.6
- _start @ 0x84d37ae in /usr/bin/clickhouse
(version 21.3.20.1 (official build))
Logs from grafana:
runtime/cgo: pthread_create failed: Operation not permitted
SIGABRT: abort
PC=0x7f48d7c3258c m=0 sigcode=18446744073709551610
goroutine 0 [idle]:
runtime: g 0: unknown pc 0x7f48d7c3258c
stack: frame={sp:0x7ffd9b5111b0, fp:0x0} stack=[0x7ffd9ad12730,0x7ffd9b511740)
0x00007ffd9b5110b0: 0x0000000000000001 0x0000000000000000
0x00007ffd9b5110c0: 0x00007f48d7ba4e90 0x00007f48d7c42e40
Logs from pmm-update-perform-init.log:
PLAY [localhost] ***************************************************************
ERROR! Unexpected Exception, this is probably a bug: can't start new thread
to see the full traceback, use -vvv
time="2024-03-10T23:09:06Z" level=fatal msg="RunPlaybook failed: exit status 250"
ProjectName: pmm-update
Version: 2.39.0
PMMVersion: 2.39.0
Timestamp: 2023-08-10 10:05:17 (UTC)
FullCommit: a657accbb0fb96f0a099218efd4bfecc97eb216e
Starting "ansible-playbook --flush-cache /usr/share/pmm-update/ansible/playbook/tasks/init.yml" ...
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
Logs from qan-api2:
stdlog: qan-api2 v2.39.0.
INFO[2024-03-10T23:12:35.666+00:00] Log level: info.
INFO[2024-03-10T23:12:35.666+00:00] DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2 component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.39.0.
INFO[2024-03-10T23:12:57.232+00:00] Log level: info.
INFO[2024-03-10T23:12:57.232+00:00] DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2 component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
Expected Result:
docker ps -a should show the pmm-server container as healthy.
Actual Result:
The pmm-server container is reported as unhealthy:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c2b23e8a52a percona/pmm-server:2.39 "/opt/entrypoint.sh" 2 days ago Up 2 days (unhealthy) 80/tcp, 0.0.0.0:443->443/tcp pmm-server
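The health state shown in the STATUS column above can also be read directly with docker inspect, along with the output of the most recent health check probes. This is a minimal sketch. The function names container_health and container_health_log are hypothetical, and it assumes the image defines a health check, which the "(unhealthy)" status suggests.

```shell
#!/bin/sh
# Hypothetical helpers wrapping `docker inspect`.

# Print the health status Docker reports for a container,
# for example "healthy", "unhealthy", or "starting".
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Print the recorded output of the most recent health check probes as JSON.
container_health_log() {
    docker inspect --format '{{json .State.Health.Log}}' "$1"
}

# Usage (on the GCE node):
#   container_health pmm-server
#   container_health_log pmm-server
```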
Additional Information:
Installing PMM Server on Linux:
Linux hostname 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux