After upgrading the PMM server from 2.44.0 to 3.2.0, the QAN dashboard shows no data, and the PMM Health Check dashboard reports the QAN API as down.
NOTE: The upgrade itself completed without errors and ended with this message: PMM Server has been successfully setup on this system!
I see the following errors in the ClickHouse log file:
? @ 0x00007f40043050fa in ?
? @ 0x00007f40043894c4 in ?
(version 23.8.2.7 (official build))
2025.06.18 07:21:48.301352 [ 142649 ] {} ServerErrorHandler: Code: 516. DB::Exception: default: Authentication failed: password is incorrect, or there is no user with such name.
If you have installed ClickHouse and forgot password you can reset it in the configuration file.
The password for default user is typically located at /etc/clickhouse-server/users.d/default-password.xml
and deleting this file will reset the password.
See also /etc/clickhouse-server/users.xml on the server where ClickHouse is installed.
. (AUTHENTICATION_FAILED), Stack trace (when copying this message, always include the lines below):
DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c604bf7 in /usr/bin/clickhouse
DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000713cbf1 in /usr/bin/clickhouse
DB::AccessControl::authenticate(DB::Credentials const&, Poco::Net::IPAddress const&) const @ 0x0000000010e3775f in /usr/bin/clickhouse
DB::Session::authenticate(DB::Credentials const&, Poco::Net::SocketAddress const&) @ 0x000000001207f7ed in /usr/bin/clickhouse
DB::TCPHandler::runImpl() @ 0x000000001310afee in /usr/bin/clickhouse
DB::TCPHandler::run() @ 0x000000001311e839 in /usr/bin/clickhouse
Poco::Net::TCPServerConnection::start() @ 0x0000000015b104d4 in /usr/bin/clickhouse
Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b116d1 in /usr/bin/clickhouse
Poco::PooledThread::run() @ 0x0000000015c47f07 in /usr/bin/clickhouse
Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c461dc in /usr/bin/clickhouse
? @ 0x00007f40043050fa in ?
? @ 0x00007f40043894c4 in ?
(version 23.8.2.7 (official build))
2025.06.18 07:21:51.547830 [ 142649 ] {} Access(user directories): from: 127.0.0.1, user: default: Authentication failed: Code: 193. DB::Exception: Invalid credentials. (WRONG_PASSWORD), Stack trace (when copying this message, always include the lines below):
DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c604bf7 in /usr/bin/clickhouse
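To pull only the authentication failures out of a long ClickHouse log, a small filter along these lines may help. This is a minimal sketch, not a PMM or ClickHouse tool: the function name find_auth_failures and the regular expressions are my own assumptions, written against the two header lines quoted above (Code: 516 / AUTHENTICATION_FAILED and Code: 193 / WRONG_PASSWORD). Other ClickHouse versions may format these lines differently.

```python
import re

# Header of a ClickHouse server log line, as seen in the excerpt above:
#   2025.06.18 07:21:48.301352 [ 142649 ] {} <logger>: Code: 516. DB::Exception: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[ (?P<tid>\d+) \] "
    r".*?Code: (?P<code>\d+)\."
)
# The user name precedes "Authentication failed" in both quoted variants.
USER_RE = re.compile(
    r"(?:user: |DB::Exception: )(?P<user>[^:\s]+): Authentication failed"
)
# Error codes taken from the log excerpt in this post.
AUTH_CODES = {"516": "AUTHENTICATION_FAILED", "193": "WRONG_PASSWORD"}


def find_auth_failures(lines):
    """Return one dict per log line that reports an authentication failure."""
    hits = []
    for line in lines:
        header = LINE_RE.match(line)
        if header is None or header.group("code") not in AUTH_CODES:
            continue  # stack-trace frames and unrelated errors are skipped
        user = USER_RE.search(line)
        hits.append({
            "timestamp": header.group("ts"),
            "code": AUTH_CODES[header.group("code")],
            "user": user.group("user") if user else None,
        })
    return hits


SAMPLE = [
    "2025.06.18 07:21:48.301352 [ 142649 ] {} ServerErrorHandler: Code: 516. "
    "DB::Exception: default: Authentication failed: password is incorrect, or there",
    "DB::TCPHandler::run() @ 0x000000001311e839 in /usr/bin/clickhouse",
    "2025.06.18 07:21:51.547830 [ 142649 ] {} Access(user directories): from: 127.0.0.1, "
    "user: default: Authentication failed: Code: 193. DB::Exception: Invalid credentials. "
    "(WRONG_PASSWORD), Stack trace (when copying this message, always include the lines below):",
]

for hit in find_auth_failures(SAMPLE):
    print(hit["timestamp"], hit["code"], hit["user"])
```

To run it against a real log, pass an open file object instead of SAMPLE; the log location depends on the installation. In this case every hit names the default user, which points at the stored password for that account rather than at QAN itself.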
I even restored the 2.44.0 backup and tried upgrading it to PMM 3.2.0 again, but hit the same issue.
I also tested on a new temporary server: I installed a fresh, empty 2.44 server and upgraded it to 3.2.0. That upgrade succeeded, and the ClickHouse API came up fine.
I'm not sure why the QAN API fails to start after the PMM 3.2.0 upgrade on a server that was already running and holds existing data.
Has something changed in the ClickHouse QAN API or ClickHouseDB plugins between 3.1.0 and 3.2.0?
@nurlan, I have resolved the issue by following the steps outlined below.
Note: Prior to attempting the PMM 3.2.0 upgrade, I had already taken a backup of PMM 2.44.0 from the production server.
Production server: prdpmm101
Temporary server: tmppmm101
After the PMM 3.2.0 upgrade on the production server failed, I decided to restore the PMM 2.44.0 backup to the temporary server, perform the upgrade there, and then migrate the working setup back to the production environment.
Steps taken to resolve the PMM 3.2.0 upgrade issue:
1. Restored the PMM 2.44.0 backup on the temporary server tmppmm101.
2. Upgraded the temporary server to PMM 3.2.0.
3. Verified the status of all services using the PMM Health Check dashboard.
4. This time, the Query Analytics (QAN) functionality came up without issues.
5. After confirming the upgrade was successful, took a backup of PMM 3.2.0 from the temporary server.
6. Copied the PMM 3.2.0 backup from tmppmm101 to the production server prdpmm101.
7. Performed a fresh installation of PMM 3.2.0 on the production server.
8. Restored the PMM 3.2.0 backup to the production server.
9. Post-restore, I observed that the internal PostgreSQL monitoring was showing issues: the agent status was displayed as Down on the Service Summary dashboard.
10. To address the PostgreSQL role issue, it is essential to follow the steps below. Note that step 11 has to be done between steps 7 and 8.
11. After the fresh installation of PMM 3.2.0 but before restoring the backup, note down the usernames and passwords from the agents table for both postgres_exporter and qan-postgresql-pgstatements-agent.
12. Once the backup restoration is complete, update the PostgreSQL agent usernames and passwords on the production server using the values noted in step 11:
    UPDATE agents SET username = 'XXXXXXX', password = 'XXXXXXX', updated_at = NOW() WHERE agent_id = 'XXXXXXX' AND agent_type = 'postgres_exporter';
    UPDATE agents SET username = 'XXXXXXX', password = 'XXXXXXX', updated_at = NOW() WHERE agent_id = 'XXXXXXX' AND agent_type = 'qan-postgresql-pgstatements-agent';
13. Verify the PostgreSQL agent status in the PMM Service Summary dashboard.
14. If all services, including the PostgreSQL monitoring, show as Up, the backup restoration and upgrade process on the production server can be considered successful.
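For anyone who wants to rehearse the credential fix before touching production, here is a rough simulation of it. It is only a sketch under loud assumptions: an in-memory SQLite table stands in for PMM's internal PostgreSQL database, the helper names (make_agents_db, capture_credentials, apply_credentials) and all IDs, usernames, and passwords are made up, and CURRENT_TIMESTAMP replaces NOW() because SQLite has no NOW(). Only the agents table, its columns, and the two agent_type values come from the statements above.

```python
import sqlite3


def make_agents_db(rows):
    """Build a throwaway in-memory stand-in for the agents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agents (agent_id TEXT PRIMARY KEY, agent_type TEXT, "
        "username TEXT, password TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO agents (agent_id, agent_type, username, password) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def capture_credentials(conn, agent_ids):
    """Note down {agent_type: (username, password)} for the given agents."""
    creds = {}
    for agent_type, agent_id in agent_ids.items():
        row = conn.execute(
            "SELECT username, password FROM agents "
            "WHERE agent_id = ? AND agent_type = ?",
            (agent_id, agent_type),
        ).fetchone()
        if row is None:
            raise LookupError(f"no {agent_type} agent with id {agent_id}")
        creds[agent_type] = row
    return creds


def apply_credentials(conn, creds, agent_ids):
    """Write the noted credentials onto the restored agent rows."""
    for agent_type, (username, password) in creds.items():
        cur = conn.execute(
            "UPDATE agents SET username = ?, password = ?, "
            "updated_at = CURRENT_TIMESTAMP "
            "WHERE agent_id = ? AND agent_type = ?",
            (username, password, agent_ids[agent_type], agent_type),
        )
        if cur.rowcount != 1:
            raise LookupError(f"expected one {agent_type} row, got {cur.rowcount}")
    conn.commit()


# Hypothetical data: the fresh install (before restore) and the restored backup.
fresh = make_agents_db([
    ("fresh-exp", "postgres_exporter", "new-user", "new-pass"),
    ("fresh-qan", "qan-postgresql-pgstatements-agent", "new-user", "new-pass"),
])
restored = make_agents_db([
    ("old-exp", "postgres_exporter", "old-user", "old-pass"),
    ("old-qan", "qan-postgresql-pgstatements-agent", "old-user", "old-pass"),
    ("other-exp", "postgres_exporter", "app-user", "app-pass"),  # unrelated agent
])

noted = capture_credentials(fresh, {
    "postgres_exporter": "fresh-exp",
    "qan-postgresql-pgstatements-agent": "fresh-qan",
})
apply_credentials(restored, noted, {
    "postgres_exporter": "old-exp",
    "qan-postgresql-pgstatements-agent": "old-qan",
})
```

Filtering on both agent_id and agent_type, as the UPDATE statements in step 12 do, keeps the change away from any other postgres_exporter rows that belong to monitored services. On a real server the same effect comes from running the two UPDATE statements directly against the PMM database.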