Why are the queryid different?

Hello,

I am using pg_stat_statements and pg_stat_activity and pg_stat_monitor.
I am curious why the query id is different in the three views.

Hello,
Can you provide some examples?
If you are checking the same postgresql cluster for all the queries, then the query id should be the same as long as they are using the same execution plans. If plans are different it may mean that your data statistics are not up to date, and postgres is sometimes using non optimal execution plans, resulting in different query id, and after statistics are updated plan changes back to the good one.

This is a simple example.

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),
    join_date DATE
);
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE,
    amount DECIMAL(10, 2)
);
CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_name VARCHAR(100),
    quantity INT,
    price DECIMAL(10, 2)
);

INSERT INTO customers (customer_name, join_date) 
SELECT md5(random()::text), date '2021-01-01' + (random() * 1000)::int
FROM generate_series(1, 10000);
-- orders 
INSERT INTO orders (customer_id, order_date, amount)
SELECT customer_id, date '2021-01-01' + (random() * 1000)::int, random() * 1000
FROM customers
ORDER BY random()
LIMIT 100000;
-- order_items 
INSERT INTO order_items (order_id, product_name, quantity, price)
SELECT order_id, md5(random()::text), (random() * 10)::int, random() * 100
FROM orders
ORDER BY random()
LIMIT 1000000;
SELECT
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date,
    (SELECT SUM(oi.price * oi.quantity) FROM order_items oi WHERE oi.order_id = o.order_id) AS order_total,
    (SELECT AVG(amount) FROM orders o2 WHERE o2.customer_id = c.customer_id) AS avg_order_amount
FROM
    customers c
    JOIN orders o ON c.customer_id = o.customer_id
WHERE
    o.amount > 50
ORDER BY
    c.customer_id, o.order_date;

The above query seems to have a long execution time. While executing the above query, if you compare the query ID of pg_stat_activity view and the query ID of pg_stat_monitor collected after completing the query, you can see that they are different. I would like to know why this is different.

I tried to reproduce this with your provided script.
In my example both query ids are exactly the same: -7907767505760499297
Please have a look at attached screen shots.


Are you sure you are comparing query_id to queryid, and not pgsm_query_id?

Fortunately, I wasn’t confused.

What does the above result look like after the query is finished?

I tested it like this.

  1. Run a query and check pg_stat_activity.
  2. Force terminate a long-running query and check pg_stat_monitor.

At this time, I confirmed that the query id of pg_stat_activity and the query id of pg_stat_monitor are different.

I have a question here. I am not a DBA, so this may be a confusing problem.

  1. Is pg_stat_monitor supposed to collect after a query ends?
  2. Since there is no pid, there is no way to check if it is the same query. In this case, is it right to rely only on the query id?

It is a different story then, if you terminate the query it may get different query_id, but you will see additional values in pg_stat_monitor, in columns: elevel, sqlcode and message, that the query was not finished properly. See below:

You can query all queries that are using the same relations, are of the same type (SELECT, UPDATE, etc.) and maybe have some part of query is also the same, using like ‘%o.amount >50%’, or even use full query if you need to check if those are exact matches. Query_ids are different, because one is for query that finished properly, and other that were interrupted for some reason. Their execution was different, so they have different query_id.

Thank you. I understood it well.