PMM 2 docker not starting

Hi,

We set up the PMM monitoring tool as a Docker container, and we are unable to start the container.

CONTAINER ID   IMAGE                       COMMAND                CREATED        STATUS                        PORTS   NAMES
524ec5709522   percona/pmm-server:2.10.1   "/opt/entrypoint.sh"   3 months ago   Exited (128) 47 minutes ago           pmm-server

Below is the error we got:

docker start pmm-server
Error response from daemon: chown /var/lib/docker/overlay2/478d094b5ed8e7e7290f7444a272f9bbc436e921d3b8da7bbd0de6ec1886c6d7/work/work: no such file or directory
Error: failed to start containers: pmm-server

Can you please help us troubleshoot and repair this?

Wow, 2.10.1 is a very old version. A ton of cool features and fixes have been added since then, so you might consider upgrading once you get it squared away. But to get it squared away:

You can see if docker logs pmm-server gives anything of use. It never does for me, but maybe this time :wink:

The error points to the overlay directory, so I'm wondering whether someone ran some sort of prune while the container was stopped and deleted something needed at startup. Does that directory exist on your filesystem (it sounds like no, according to the error)? How far up the tree does the path exist?
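One way to answer the "how far up the tree" question is to walk upward from the missing path until an existing directory is found. This is a minimal sketch; the function name deepest_existing_dir is made up for illustration, and it would be run on the Docker host, not inside the container.

```shell
# Print the deepest ancestor of a path that still exists as a directory.
# deepest_existing_dir is a hypothetical helper name, not a Docker tool.
deepest_existing_dir() {
  p="$1"
  # Walk upward until we reach a directory that exists (or the root).
  while [ ! -d "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
    p=$(dirname "$p")
  done
  printf '%s\n' "$p"
}

# Example, using the path from the error message:
# deepest_existing_dir /var/lib/docker/overlay2/478d094b5ed8e7e7290f7444a272f9bbc436e921d3b8da7bbd0de6ec1886c6d7/work/work
```

If the result stops at /var/lib/docker/overlay2, the whole layer directory is gone, which would be consistent with a prune or manual cleanup.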

Did you use a pmm-data container? Check docker ps -a to see if there's a data volume, or look at docker inspect pmm-server | grep -A 2 VolumesFrom. If so, you can follow the instructions for taking a backup and go with a newer version of the pmm-server container.
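The two checks above could be wrapped like this. It is only a sketch: the helper names volumes_from and container_exists, and the DOCKER override variable, are assumptions made for illustration; the docker inspect and docker ps calls themselves are standard CLI usage.

```shell
# DOCKER is a hypothetical override so the helpers can point at another binary.
DOCKER="${DOCKER:-docker}"

# Print the VolumesFrom setting of a container as JSON.
volumes_from() {
  "$DOCKER" inspect --format '{{json .HostConfig.VolumesFrom}}' "$1"
}

# Succeed if a container with exactly this name exists (running or stopped).
container_exists() {
  "$DOCKER" ps -a --format '{{.Names}}' | grep -Fqx -- "$1"
}

# Example:
# container_exists pmm-data && volumes_from pmm-server
```

If pmm-data exists and pmm-server lists it under VolumesFrom, the monitoring data lives in the data container and should survive replacing pmm-server.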


Hi @steve.hoffman,
Thanks for your response.

We tried upgrading to the latest version, but unfortunately our MySQL servers on Azure ended up with SSL issues.

We have now deleted the latest version and installed 2.14.0.

Now the QAN service and ClickHouse are not starting.

Below is the status:

[root@8c31b038c6c9 logs]# supervisorctl status
alertmanager RUNNING pid 153, uptime 0:14:29
clickhouse FATAL Exited too quickly (process log may have details)
cron RUNNING pid 19, uptime 0:14:31
dashboard-upgrade EXITED Sep 29 03:51 AM
dbaas-controller STOPPED Not started
grafana RUNNING pid 928, uptime 0:13:56
nginx RUNNING pid 18, uptime 0:14:31
pmm-agent RUNNING pid 37, uptime 0:14:31
pmm-managed RUNNING pid 34, uptime 0:14:31
pmm-update-perform STOPPED Not started
postgresql RUNNING pid 15, uptime 0:14:31
prometheus STOPPED Not started
qan-api2 BACKOFF Exited too quickly (process log may have details)
victoriametrics RUNNING pid 119, uptime 0:14:30
vmalert RUNNING pid 25, uptime 0:14:31

Below are the qan-api2 error logs:

stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:00:44.653+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:01:18.757+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:01:53.838+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:02:30.036+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:03:07.315+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:03:45.448+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:04:24.655+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:05:04.715+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:05:45.759+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.14.0.
INFO[2021-09-29T04:06:28.126+00:00] Log level: info.
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused

Docker logs

2021-09-29 03:54:51,234 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:55:12,288 INFO spawned: 'qan-api2' with pid 1826
2021-09-29 03:55:12,332 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:55:34,623 INFO spawned: 'qan-api2' with pid 1900
2021-09-29 03:55:34,669 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:55:58,080 INFO spawned: 'qan-api2' with pid 1969
2021-09-29 03:55:58,126 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:56:22,317 INFO spawned: 'qan-api2' with pid 2081
2021-09-29 03:56:22,358 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:56:47,102 INFO spawned: 'qan-api2' with pid 2159
2021-09-29 03:56:47,147 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:57:13,168 INFO spawned: 'qan-api2' with pid 2270
2021-09-29 03:57:13,213 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:57:40,222 INFO spawned: 'qan-api2' with pid 2355
2021-09-29 03:57:40,261 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:58:08,344 INFO spawned: 'qan-api2' with pid 2465
2021-09-29 03:58:08,386 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:58:37,500 INFO spawned: 'qan-api2' with pid 2562
2021-09-29 03:58:37,543 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:59:08,081 INFO spawned: 'qan-api2' with pid 2681
2021-09-29 03:59:08,127 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 03:59:39,142 INFO spawned: 'qan-api2' with pid 2771
2021-09-29 03:59:39,185 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:00:11,288 INFO spawned: 'qan-api2' with pid 2898
2021-09-29 04:00:11,329 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:00:44,623 INFO spawned: 'qan-api2' with pid 2994
2021-09-29 04:00:44,660 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:01:18,712 INFO spawned: 'qan-api2' with pid 3138
2021-09-29 04:01:18,758 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:01:53,791 INFO spawned: 'qan-api2' with pid 3246
2021-09-29 04:01:53,840 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:02:29,992 INFO spawned: 'qan-api2' with pid 3377
2021-09-29 04:02:30,037 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:03:07,281 INFO spawned: 'qan-api2' with pid 3510
2021-09-29 04:03:07,316 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:03:45,411 INFO spawned: 'qan-api2' with pid 3623
2021-09-29 04:03:45,450 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:04:24,619 INFO spawned: 'qan-api2' with pid 3765
2021-09-29 04:04:24,658 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:05:04,675 INFO spawned: 'qan-api2' with pid 3905
2021-09-29 04:05:04,717 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:05:45,719 INFO spawned: 'qan-api2' with pid 4019
2021-09-29 04:05:45,761 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:06:28,081 INFO spawned: 'qan-api2' with pid 4172
2021-09-29 04:06:28,127 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-29 04:07:11,187 INFO spawned: 'qan-api2' with pid 4322
2021-09-29 04:07:11,229 INFO exited: qan-api2 (exit status 1; not expected)

Attached: PMM server logs (pmm-server_2021-09-29_04-08.zip…)

Can you please help us figure out what went wrong?


Did you restore your backup as part of rolling back? It appears that when you went to 2.21.0, upgrade routines ran against the ClickHouse database (which is where QAN data is stored). Going backwards in version without restoring the data means you have a 2.21.0 ClickHouse data model but 2.14.0 logic, so ClickHouse can't start, and therefore the QAN API can't start because it can't connect to ClickHouse.
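To see that dependency chain at a glance, the supervisorctl status output can be filtered down to the services that are failing. This is a rough sketch; failed_services is a made-up helper name, and the log path in the comment is an assumption about where PMM Server keeps the ClickHouse log.

```shell
# Read `supervisorctl status` output on stdin and print the names of services
# whose state (second column) is FATAL or BACKOFF.
failed_services() {
  awk '$2 == "FATAL" || $2 == "BACKOFF" { print $1 }'
}

# Inside the pmm-server container:
# supervisorctl status | failed_services
# tail -n 50 /srv/logs/clickhouse-server.log   # assumed log location
```

With the status output posted earlier, this would list clickhouse and qan-api2, and the ClickHouse log is the one to read first since qan-api2 only fails because port 9000 is not answering.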


Hi @steve.hoffman,

Thanks for the reply.

I deleted the pmm-server container and redeployed it using the command below:

docker run --detach --restart always --publish 443:443 --volumes-from pmm-data --name pmm-server percona/pmm-server:2.14.0

I didn't delete the Docker data volume because I configured custom alerts for 50 servers, and deleting and recreating the data volume would mean redoing that work.

Can you please guide me through the steps for downgrading ClickHouse to 2.14.0?

I tried reinstalling with the commands below, but it's still not working:

docker exec -it pmm-server bash

yum -y remove percona-qan-api2

yum -y install percona-qan-api2-2.14.0

supervisorctl restart qan-api2

Thanks
Srinivas


Hi @steve.hoffman,

Finally, I figured it out.

I dropped pmm-server and installed PMM 2.15, and when I checked the QAN logs, they showed the error below:

2021.09.29 19:40:31.962534 [ 30 ] {} TCPHandler: Code: 81, e.displayText() = DB::Exception: Database pmm doesn't exist, Stack trace:

  1. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x6424b76]
  2. /usr/bin/clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0x2fa4a9f]
  3. /usr/bin/clickhouse-server(DB::TCPHandler::runImpl()+0x98d) [0x2faec7d]
  4. /usr/bin/clickhouse-server(DB::TCPHandler::run()+0x1c) [0x2faf87c]
  5. /usr/bin/clickhouse-server(Poco::Net::TCPServerConnection::start()+0xf) [0x663383f]
  6. /usr/bin/clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x110) [0x6633ea0]
  7. /usr/bin/clickhouse-server(Poco::PooledThread::run()+0x77) [0x671b427]
  8. /usr/bin/clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x38) [0x6718988]
  9. /usr/bin/clickhouse-server() [0x6e3a78f]
  10. /lib64/libpthread.so.0(+0x7ea5) [0x7f650adacea5]
  11. /lib64/libc.so.6(clone+0x6d) [0x7f650a0ad9fd]

So I logged in to the ClickHouse server and manually created the pmm database, and the QAN service finally started.

To me, it looks like a bug in the upgrade scripts.
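For anyone hitting the same error, the manual fix might look roughly like the sketch below. The exact commands used in this thread were not posted, so this is an assumption: it relies on clickhouse-client being available inside the pmm-server container, and ensure_database and the CLICKHOUSE_CLIENT override are names invented for illustration.

```shell
# CLICKHOUSE_CLIENT is a hypothetical override; by default the stock client is used.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# Create the named database only if ClickHouse does not already list it.
ensure_database() {
  db="$1"
  if "$CLICKHOUSE_CLIENT" --query "SHOW DATABASES" | grep -Fqx -- "$db"; then
    echo "database $db already exists"
  else
    "$CLICKHOUSE_CLIENT" --query "CREATE DATABASE IF NOT EXISTS $db"
    echo "created database $db"
  fi
}

# Inside the pmm-server container:
# ensure_database pmm
# supervisorctl restart qan-api2
```

Creating an empty database only gives qan-api2 something to connect to; it does not bring back any QAN data that was stored before.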

Thanks
Srinivas
