PMM upgrade from 2.16 to 2.21: Query Analytics (QAN) not starting and not displaying any data

The current PMM client is a Virtual Appliance on vSphere running PMM 2.21.0.

[root@localhost access]# /usr/sbin/percona-qan-api2 --data-retention=30 --debug --trace
stdlog: qan-api2 v2.21.0.
INFO[2021-09-20T14:01:17.038+00:00]qan-api2/main.go:284 main.main Log level: trace.
stdlog: dsn: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: Dirty database version 9. Fix and force version.
[root@localhost access]#

[root@localhost access]# clickhouse-client --query "select * from pmm.schema_migrations"
1 1 1600776273056694360
1 0 1600776273075425791
2 1 1600776273076725910
2 0 1600776273116303743
3 1 1600776273117564041
3 0 1600776273156214628
4 1 1600776273157073358
4 0 1600776273181985508
5 1 1600776273182842458
5 0 1600776273193594158
6 1 1600776273194365739
6 0 1600776273203387813
7 1 1620132085016538511
7 0 1620132085071669360
8 1 1620132085072878986
8 0 1620132085173742781
9 1 1631874203339757424
[root@localhost access]#
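
The rows above appear to be (version, dirty, sequence) triples. A minimal sketch of how they can be read, assuming the migration library treats the row with the highest sequence as the current state (an assumption, though it is consistent with the "Dirty database version 9" message); the `current_state` helper is hypothetical and only for illustration:

```python
# Rows copied from the schema_migrations output above: version, dirty, sequence.
ROWS = """\
1 1 1600776273056694360
1 0 1600776273075425791
2 1 1600776273076725910
2 0 1600776273116303743
3 1 1600776273117564041
3 0 1600776273156214628
4 1 1600776273157073358
4 0 1600776273181985508
5 1 1600776273182842458
5 0 1600776273193594158
6 1 1600776273194365739
6 0 1600776273203387813
7 1 1620132085016538511
7 0 1620132085071669360
8 1 1620132085072878986
8 0 1620132085173742781
9 1 1631874203339757424
"""


def current_state(text):
    """Return (version, dirty) taken from the row with the highest sequence.

    Assumption: the latest row by sequence reflects the current migration state.
    """
    rows = [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]
    version, dirty, _sequence = max(rows, key=lambda row: row[2])
    return version, bool(dirty)


version, dirty = current_state(ROWS)
print(f"version={version} dirty={dirty}")  # prints "version=9 dirty=True"
```

Versions 1 through 8 each have a dirty row followed by a clean one, while version 9 has only the dirty row, which suggests that migration 9 started but never completed.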

The result of the supervisorctl command is:

[root@localhost ~]# supervisorctl status
alertmanager RUNNING pid 1105, uptime 1:14:25
clickhouse RUNNING pid 1099, uptime 1:14:25
cron RUNNING pid 1102, uptime 1:14:25
dashboard-upgrade EXITED Sep 20 11:44 AM
dbaas-controller STOPPED Not started
grafana RUNNING pid 1100, uptime 1:14:25
nginx RUNNING pid 1101, uptime 1:14:25
pmm-agent RUNNING pid 1109, uptime 1:14:25
pmm-managed RUNNING pid 1174, uptime 1:14:19
pmm-update-perform STOPPED Not started
postgresql RUNNING pid 1098, uptime 1:14:25
prometheus STOPPED Not started
qan-api2 STARTING
victoriametrics RUNNING pid 1103, uptime 1:14:25
vmalert RUNNING pid 1104, uptime 1:14:25
[root@localhost ~]#
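
A minimal sketch for picking out services that have not settled, using the status listing above as input. The `unsettled_services` helper and the choice of states to flag are assumptions made for illustration; STOPPED and EXITED are left out because some services in the listing may be intentionally not started.

```python
# Status lines copied from the supervisorctl output above.
STATUS = """\
alertmanager RUNNING pid 1105, uptime 1:14:25
clickhouse RUNNING pid 1099, uptime 1:14:25
cron RUNNING pid 1102, uptime 1:14:25
dashboard-upgrade EXITED Sep 20 11:44 AM
dbaas-controller STOPPED Not started
grafana RUNNING pid 1100, uptime 1:14:25
nginx RUNNING pid 1101, uptime 1:14:25
pmm-agent RUNNING pid 1109, uptime 1:14:25
pmm-managed RUNNING pid 1174, uptime 1:14:19
pmm-update-perform STOPPED Not started
postgresql RUNNING pid 1098, uptime 1:14:25
prometheus STOPPED Not started
qan-api2 STARTING
victoriametrics RUNNING pid 1103, uptime 1:14:25
vmalert RUNNING pid 1104, uptime 1:14:25
"""

# Supervisor states that suggest a process is failing or stuck while starting.
UNSETTLED = {"STARTING", "BACKOFF", "FATAL", "UNKNOWN"}


def unsettled_services(text):
    """Return the names of services whose state is in UNSETTLED."""
    names = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # name, state, optional description
        if len(parts) >= 2 and parts[1] in UNSETTLED:
            names.append(parts[0])
    return names


print(unsettled_services(STATUS))  # prints "['qan-api2']"
```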

Hi @Davud_Can , welcome to the Percona forums!
Could you please show us the logs from the file /srv/logs/qan-api2.log so that we can see where the application is failing to start?

Hi @Michael_Coburn

Thank you for your quick reply.
I have attached the log file you requested.

Regards,

qan-api2.log (65.9 KB)

Hi @Davud_Can, thanks for sharing the logs. As you will notice, there is a recurring failure every 2 seconds:

INFO[2021-09-20T15:46:10.212+00:00] Log level: info.
stdlog: dsn:  clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: Dirty database version 9. Fix and force version.
stdlog: qan-api2 v2.21.0.

I haven’t seen this error before, and I am escalating it to the PMM Engineering team. It seems there is an issue with the ClickHouse schema.

Thanks a lot for the information.
I will wait for news from you.

Hi @Davud_Can,

Could you please connect to ClickHouse using the clickhouse-client installed in our Docker image and share the result of this command:

  • describe table pmm.metrics;

Thank you

Hi @nurlan

I am attaching the output you requested.

Regards,

desc_pmm_metrics.txt (12.4 KB)

Hello @nurlan

Is there any news on the subject?

Regards,

Hi @Davud_Can,
We couldn’t reproduce this behavior, but we have a workaround for you.
Could you perform these manual steps in your ClickHouse instance?

  1. Log in to ClickHouse.
  2. Switch to the pmm database using use pmm;
  3. Execute:
ALTER TABLE metrics
    ADD COLUMN `m_plans_calls_cnt` Float32,
    ADD COLUMN `m_plans_calls_sum` Float32 COMMENT 'Total number of planned calls',
    ADD COLUMN `m_wal_records_cnt` Float32,
    ADD COLUMN `m_wal_records_sum` Float32 COMMENT 'Total number of WAL (Write-ahead logging) records',
    ADD COLUMN `m_wal_fpi_cnt` Float32,
    ADD COLUMN `m_wal_fpi_sum` Float32 COMMENT 'Total number of FPI (full page images) in WAL (Write-ahead logging) records';
  4. Execute:
insert into schema_migrations (version, dirty, sequence) values(9, 0, toUnixTimestamp(NOW())*1000000000);
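
A minimal sketch of why the final insert can clear the dirty flag, assuming the row with the highest sequence decides the current state (an assumption about how the table is read, not something confirmed in this thread). It mirrors the `toUnixTimestamp(NOW())*1000000000` expression, which scales whole seconds to nanoseconds; the helper names are hypothetical and the statement is only built as a string, not executed:

```python
import time

# Sequence of the dirty version 9 row from the schema_migrations output earlier.
DIRTY_SEQUENCE = 1631874203339757424


def clean_row_sequence(unix_seconds):
    """Mirror toUnixTimestamp(NOW())*1000000000: whole seconds scaled to nanoseconds."""
    return int(unix_seconds) * 1_000_000_000


def build_insert(version, unix_seconds):
    """Build the text of the workaround insert for a given time (string only)."""
    sequence = clean_row_sequence(unix_seconds)
    return (
        "insert into schema_migrations (version, dirty, sequence) "
        f"values({version}, 0, {sequence});"
    )


now = time.time()
print(build_insert(9, now))
# The new clean row should sort after the dirty row when ordered by sequence.
print(clean_row_sequence(now) > DIRTY_SEQUENCE)
```

Because the new sequence is derived from the current time, it is larger than the sequence of the existing dirty row, so under the stated assumption the clean row for version 9 becomes the latest entry.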

Thank you

Hi @nurlan

Thank you for your valuable help. This solution worked for us and solved our problem.

Regards,
