I am having trouble getting metrics logged after the upgrade; the graphs keep showing a 502 error.
When I run docker exec f8d2d0043747 supervisorctl status, I see that some services are not starting:
alertmanager RUNNING pid 25, uptime 1:03:15
clickhouse RUNNING pid 15, uptime 1:03:15
dbaas-controller STOPPED Not started
grafana RUNNING pid 19, uptime 1:03:15
nginx RUNNING pid 20, uptime 1:03:15
pmm-agent RUNNING pid 33, uptime 1:03:15
pmm-managed RUNNING pid 29, uptime 1:03:15
pmm-update-perform STOPPED Not started
pmm-update-perform-init EXITED Jun 15 12:59 PM
postgresql RUNNING pid 14, uptime 1:03:15
prometheus STOPPED Not started
qan-api2 RUNNING pid 279, uptime 1:03:08
victoriametrics FATAL Exited too quickly (process log may have details)
vmalert RUNNING pid 22, uptime 1:03:15
vmproxy RUNNING pid 27, uptime 1:03:15
If I run the same command a second time, I get:
Error: error creating OCI runtime exit file path /var/lib/containers/storage/overlay-containers/f8d2d00437477197ab14a38fe3dafd2f42a046b30efcbe46fb286565d877a28e/userdata/38a46efc2e9bd7f9d9eed3c00f3f67304ff741559a22742c1ab4360a1d31002c/exit: mkdir /var/lib/containers/storage/overlay-containers/f8d2d00437477197ab14a38fe3dafd2f42a046b30efcbe46fb286565d877a28e/userdata/38a46efc2e9bd7f9d9eed3c00f3f67304ff741559a22742c1ab4360a1d31002c/exit: structure needs cleaning
Any idea what is going on here and how to resolve it?
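To narrow a status listing like the one above down to the services that need attention, a small filter can help. This is only a sketch: not_running is a made-up helper name, and it assumes the state is always the second whitespace-separated column, as it is in the output shown here.

```shell
# Print "name state" for every service whose state is not RUNNING.
# Reads `supervisorctl status` output on stdin.
not_running() {
  awk '$2 != "RUNNING" { print $1, $2 }'
}

# Usage (container ID taken from the post above):
#   docker exec f8d2d0043747 supervisorctl status | not_running
```

A STOPPED entry is not necessarily a fault, so the FATAL line is the one to look at first.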
The status shows victoriametrics FATAL Exited too quickly (process log may have details).
Please check what error appears in the victoriametrics log under /srv/logs/ inside the pmm-server container.
Also, which PMM version did you upgrade from to reach 2.37.1?
Hi,
Did you check the free space left on the machine where PMM is running? If there is enough disk space, could you try to start the failed process with docker exec f8d2d0043747 supervisorctl start victoriametrics?
I gave this a try and all I get is:
victoriametrics: ERROR (spawn error)
Hello Lalit,
The error from the log shows:
$
2023-06-16T09:08:10.029Z panic /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/app/vmselect/netstorage/tmp_blocks_file.go:26 FATAL: cannot create "/srv/victoria$
panic: FATAL: cannot create "/srv/victoriametrics/data/tmp/searchResults": mkdir /srv/victoriametrics/data/tmp/searchResults: structure needs cleaning
Here is more of the log, in case it helps:
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:404 inmemory parts have been successfully flushed to files in 0.000 seconds at "/srv/victoriametrics/data/indexdb/17393B3636185011"
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:406 waiting for flush callback worker to stop on "/srv/victoriametrics/data/indexdb/17393B3636185011"...
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:409 flush callback worker stopped in 0.000 seconds on "/srv/victoriametrics/data/indexdb/17393B3636185011"
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:396 waiting for background workers to stop on "/srv/victoriametrics/data/indexdb/17393B3636185010"...
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:399 background workers stopped in 0.000 seconds on "/srv/victoriametrics/data/indexdb/17393B3636185010"
2023-06-19T16:11:29.135Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:401 flushing inmemory parts to files on "/srv/victoriametrics/data/indexdb/17393B3636185010"...
2023-06-19T16:11:29.136Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:404 inmemory parts have been successfully flushed to files in 0.000 seconds at "/srv/victoriametrics/data/indexdb/17393B3636185010"
2023-06-19T16:11:29.136Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:406 waiting for flush callback worker to stop on "/srv/victoriametrics/data/indexdb/17393B3636185010"...
2023-06-19T16:11:29.136Z info /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/lib/mergeset/table.go:409 flush callback worker stopped in 0.000 seconds on "/srv/victoriametrics/data/indexdb/17393B3636185010"
2023-06-19T16:11:29.137Z fatal /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.89.1/app/vmstorage/main.go:113 cannot open a storage at /srv/victoriametrics/data with -retentionPeriod=14d: cannot open table at "/srv/victoriametrics/data/data": cannot open partitions in the table "/srv/victoriametrics/data/data": cannot open partition "2023_05": cannot open big parts from "/srv/victoriametrics/data/data/big/2023_05": cannot create directories for partition "/srv/victoriametrics/data/data/big/2023_05": cannot create tmp directory "/srv/victoriametrics/data/data/big/2023_05/tmp": mkdir /srv/victoriametrics/data/data/big/2023_05/tmp: structure needs cleaning
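For context, "structure needs cleaning" is the message Linux prints for the EUCLEAN error code, which filesystems such as XFS and ext4 return when they detect damaged on-disk metadata. It therefore usually points at the underlying filesystem rather than at VictoriaMetrics itself. As a rough sketch, the helper below (euclean_paths is a made-up name) pulls the affected paths out of log text like the above. The log file name in the usage comment is an assumption; use whatever file actually exists under /srv/logs/.

```shell
# Extract the unique paths whose mkdir failed with "structure needs cleaning".
# Reads log text on stdin and prints one path per line.
euclean_paths() {
  grep -o 'mkdir [^:]*: structure needs cleaning' \
    | sed -e 's/^mkdir //' -e 's/: structure needs cleaning$//' \
    | sort -u
}

# Usage (file name is an assumption, not confirmed in this thread):
#   docker exec <pmm-server-name> cat /srv/logs/victoriametrics.log | euclean_paths
```

If the paths span more than one directory tree, as they do in this thread, that is a further hint that the problem sits below the application.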
The last line of the log is what jumps out at me.
The error cannot open table at "/srv/victoriametrics/data/data" (and the subsequent ones) seems to hint at some sort of data corruption.
The first thing I’d do is sneak a peek at that directory to see if the structure exists:
docker exec -it <pmm-server-name> bash
ls -al /srv/victoriametrics/data/data
(take note of the owner and group; both should be pmm)
Assuming the big, small, and lock entries are present, I’d start by looking at what’s in ./big/2023_05; there should be a tmp and a txn folder in there. Again, note the permissions on each directory, as I suspect that something changed ownership- or permissions-wise (all directories should be 755 and files 644, with pmm as owner and group).
It may be possible to manually create the missing directories and restart PMM, but we need to understand the cause first.
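The ownership and permission check described above can be scripted rather than eyeballed. The following is a minimal sketch, assuming GNU or BSD find is available inside the container; check_perms is a hypothetical helper name, and the pmm defaults simply mirror the expectation stated above.

```shell
# List entries under DIR that deviate from the expected layout:
# directories not mode 755, files not mode 644, or anything not
# owned by OWNER/GROUP (both default to pmm; names or numeric IDs).
check_perms() {
  dir=$1
  owner=${2:-pmm}
  group=${3:-pmm}
  find "$dir" \( \
      \( -type d ! -perm 755 \) \
      -o \( -type f ! -perm 644 \) \
      -o ! -user "$owner" \
      -o ! -group "$group" \
    \) -print
}

# Usage, from a shell inside the pmm-server container:
#   check_perms /srv/victoriametrics/data/data
```

Empty output means everything matches. If find itself fails with "structure needs cleaning", the problem is in the filesystem rather than in ownership or permissions.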