MongoDB latency metrics

Hello all,

I am using PMM 2.9.0 to monitor a MongoDB farm. Under MongoDB → MongoDB ReplSet Summary → MongoDB Services Summary, there is a “Latency detail” widget. Can somebody explain how it collects latency? It does not line up with the Document Operations graph; it appears to lag about 5 minutes behind the operation times. I need to understand how the latency is calculated and how to improve it so that values appear at the time they actually occur.

Here are some of the metrics involved:
mongodb_mongod_op_latencies_latency_total
mongodb_mongod_op_latencies_ops_total

Here are some of the queries:

topk(5,avg by (service_name) ((rate(mongodb_mongod_op_latencies_latency_total{service_name=~"a-mongodb",type="write"}[5m]) / 
(rate(mongodb_mongod_op_latencies_ops_total{service_name=~"a-mongodb",type="write"}[5m]) > 0)
)))

avg(rate(mongodb_mongod_op_latencies_latency_total{service_name=~"a-mongodb",type="write"}[5m]) / 
(rate(mongodb_mongod_op_latencies_ops_total{service_name=~"a-mongodb",type="write"}[5m]) > 0)
)
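
For reference, both queries divide the per-second increase of the total latency counter by the per-second increase of the operation counter, which gives the mean latency per operation over the trailing 5-minute window. A minimal Python sketch of that arithmetic follows. It assumes the latency counter is in microseconds (as in MongoDB's `opLatencies`), uses made-up sample values, and `simple_rate` is a hypothetical helper that ignores the counter-reset handling and extrapolation that PromQL's `rate()` performs.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value), sorted by time


def simple_rate(samples: List[Sample], now: float, window: float) -> Optional[float]:
    """Per-second increase of a counter over [now - window, now].

    Simplified stand-in for PromQL rate(): no counter-reset handling and
    no extrapolation. Returns None if fewer than two samples are in the window.
    """
    in_window = [s for s in samples if now - window <= s[0] <= now]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def avg_latency(latency: List[Sample], ops: List[Sample],
                now: float, window: float) -> Optional[float]:
    """Mean latency per operation, mirroring rate(latency) / (rate(ops) > 0)."""
    lat_rate = simple_rate(latency, now, window)
    ops_rate = simple_rate(ops, now, window)
    if lat_rate is None or ops_rate is None or ops_rate <= 0:
        return None  # the "> 0" filter drops points with no operations
    return lat_rate / ops_rate


# Illustrative values only: 600 operations and 300,000 us of latency in 300 s.
latency_samples = [(0.0, 5_000_000.0), (300.0, 5_300_000.0)]
ops_samples = [(0.0, 1000.0), (300.0, 1600.0)]
print(avg_latency(latency_samples, ops_samples, now=300.0, window=300.0))  # 500.0
```

Because every plotted point averages the whole window, a short burst is spread over the following five minutes, which may be part of why the panel seems to trail the operations graph.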

Hi Ghan,

The formula you provided uses a 5-minute interval.
If the interval is set to auto, please set it to 1s for a more accurate result.
The auto option lets you specify how many times the current time range should be divided to calculate the current auto time span.

Thanks @adivinho. When I change the interval from 5m to 1s, I do not get any results.

Hi @Ghan
Can you show us the same view with valid data using [5m]? Can you also try with [5s], [1m]?

Could it be that you don’t have any data in PMM for this period?

Hi @Michael_Coburn ,

Here you are, this is 5s. We have only one data point, which should be impossible since this is a PROD environment.

This is 1m.

I think the minimum time value is 5s.
[screenshot]

I don’t know how @adivinho got results with 1s.

OK, thank you for sharing. This shows that there is a valid metric series. It seems likely that you are collecting only at the 5s interval, and not more aggressively at the 1s scrape interval. You are looking at the most precise method of calculating the Latency detail widget, as the formula uses many more data points to compute the average value.
Does this discussion answer your question, or do you have further questions for us?
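
To illustrate why a `[1s]` window returns nothing here: `rate()` needs at least two samples inside the range window, so the window has to span more than one scrape interval. The sketch below assumes a fixed 5-second scrape interval with scrapes at t = 0, 5, 10, ... seconds; `samples_in_window` is a hypothetical helper, not part of PMM or Prometheus.

```python
import math


def samples_in_window(scrape_interval: float, window: float, now: float) -> int:
    """Count scrapes at t = 0, interval, 2*interval, ... inside [now - window, now]."""
    first = max(0, math.ceil((now - window) / scrape_interval))
    last = math.floor(now / scrape_interval)
    return max(0, last - first + 1)


# Evaluated at t = 602 s with a 5 s scrape interval.
for window in (1, 5, 60, 300):
    n = samples_in_window(5, window, now=602)
    print(f"[{window}s] window: {n} sample(s), rate computable: {n >= 2}")
```

Under these assumptions, the 1-second window contains no samples and the 5-second window contains one, so neither can produce a rate; the window needs to be comfortably larger than the scrape interval.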

Thanks @Michael_Coburn. I need to follow latency the same way I follow operations, but right now the two do not line up in time.

[three screenshots]

The three screenshots above all cover the same time frame, but the latency does not match the activity. For example, look at the gap between 12:00 and 16:00: it means that no operations, or close to zero, took place. Yet over the same period the Latency panel shows a spike instead of a gap.

I need to see latency at the moment the operations fire, as if live. Either something is wrong or I am missing something. Can you help me figure it out?

@Ghan thanks for clarifying your situation.
What I see in these graphs is a spike in latency, which generally means less (or no) work is getting done. So it makes sense that when latency spikes, the throughput of completed operations drops.
If you can zoom in on the time period and show the same three graphs, we can investigate further to get the timing right. What I expect to see is a latency spike leading the drop in throughput by a second or two.

Hello @Michael_Coburn

Here you are, this is a 30-minute time frame.

[three screenshots]

One more point: when I change the time frame from 30 minutes to 60 minutes, I get some “dots” in the latency graph.

[screenshot]

Also, here is another replica set. It has no write operations, yet it still shows write latency. Please look at the times when the spikes occurred.

[three screenshots]

Thank you for your help so far :+1:

Hi @Ghan, thank you for sharing your images! Based on the first set of images covering the 14:40 - 14:50 period, it looks to me as if either the cluster wasn’t performing any further work or the mongodb_exporter wasn’t able to access mongod and thus couldn’t fetch metrics. I suspect the former; however, seeing reads and commands continue uninterrupted while writes stopped for 10 minutes seems odd. Did the server experience an OS or hardware fault that would have blocked write operations but still permitted reads to continue? Note: I am not a MongoDB expert, so I am not sure whether this can even occur.
The “dots” usually happen when there is a series of missed samples: either MongoDB didn’t respond (it was saturated) or there was some other failure in metrics collection. Up until we enabled PUSH in PMM Server in 2.14, if you missed scraping during a period for whatever reason, you couldn’t backfill, and thus you would have gaps in your graph.
Regarding the second set of charts, from 16:00 to 20:00, what I suspect is that you were able to write to an in-memory space until it filled up and then spilled to disk, which is possibly why the latency spiked and then started to recover.
If you suspect that the graphs are reporting erroneous data, you can also sample the same metric series that mongodb_exporter collects once per second, write the values to a file, and then draw a graph to check whether your sampled data correlates with what PMM shows.
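
The sampling idea in the last paragraph could be sketched roughly as follows. This is only an outline under stated assumptions: `sample_to_csv` and its `fetch` argument are hypothetical names, and `fetch` stands for whatever function returns the current counter values as a flat dictionary.

```python
import csv
import time
from typing import Callable, Dict


def sample_to_csv(fetch: Callable[[], Dict[str, float]], path: str,
                  count: int, interval: float = 1.0) -> None:
    """Call fetch() `count` times, `interval` seconds apart, writing one CSV row per call."""
    with open(path, "w", newline="") as f:
        writer = None
        for i in range(count):
            # Prefix each row with the wall-clock time of the sample.
            row = {"timestamp": time.time(), **fetch()}
            if writer is None:
                # Column names come from the first sample.
                writer = csv.DictWriter(f, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            f.flush()  # keep the file usable if the run is interrupted
            if i < count - 1:
                time.sleep(interval)
```

With pymongo, `fetch` might wrap something like `client.admin.command("serverStatus")["opLatencies"]["writes"]`, which MongoDB documents as carrying `latency` and `ops` fields; that this is exactly what mongodb_exporter reads is an assumption here. Differences between consecutive rows can then be plotted next to the PMM panel.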

Thanks @Michael_Coburn. I am suspicious of mongodb_exporter. I can confirm there were no errors at the infrastructure level, and all my MongoDB instances have journal:enabled (which means they are not in-memory), so that is the only thing left. I tried to upgrade PMM, but there is an existing bug in mongodb_exporter, so I could not, and I am still waiting for a fix (Problems after upgrading from 2.9.0 to 2.11.1 - #13 by Ghan).

In the last 3 images, it looks as if the exporter or PMM reports metrics 1 hour late. Isn’t that strange? Also, there were no inserts, yet I still got latency spikes.

Understood, @Ghan. As I mentioned before, I am not a MongoDB expert, so I can’t explain why latency would be close to zero while records are actively being written and then spike significantly during a period of zero inserts.
I know you are waiting for a fix related to mongodb_exporter in PMM-7116. Have you tested deploying our latest release, PMM 2.14? We made significant changes related to swapping out Prometheus for VictoriaMetrics.

All right @Michael_Coburn, is there a MongoDB specialist you can suggest to look into my problem?

I tried PMM 2.14, but until my mongodb_exporter bug is solved I cannot use it, because it does not work smoothly with PMM Agent 2.9.0 :frowning:

Hi @Ghan - I’ve asked in Percona Slack to see if a MongoDB expert could take a look at this ticket.

As you are probably aware, the Percona Forums are provided as free support. If you are a Percona Customer, please let me know as we offer an SLA for resolution on cases such as these concerning Percona Software, and I’ll help you open a ticket in order to get this resolved ASAP. If you’re not yet a Percona Customer and are interested in becoming one, please also let me know and I would be happy to help you!

Hi @Ghan,
Regarding these charts, could you check whether there really are 0 operations, not even 1?
Our QAN MongoDB profiler collects data from MongoDB servers, and it can generate non-zero metrics in your dashboards.

Could you please share more details about PMM 2.14 not working with PMM Client 2.9.0? What exactly didn’t work?

Thank you

Hi there!
@Michael_Coburn, I know MongoDB is not Percona’s product, and of course you have paid support options, but I think it is good for all forum users to know that you have experts in MongoDB, PostgreSQL, etc., and that they help through the Percona Forum. I just need to understand whether this problem is related to MongoDB itself or to something else. Thank you for your kind help so far :smiling_face_with_three_hearts:

@nurlan I am not a DBA, but I tried to figure it out. Yesterday I spoke with my customer about their MongoDB usage, and they told me, “We are using upsert instead of the insert command.” After a little research, I found that MongoDB upsert commands are reported as “update” commands. So this explains my problem: in my tests I did not see any inserts because upsert is used, and it is shown as update in PMM. Today I will enable the MongoDB profiler for all queries, and I hope I can catch something.

About PMM 2.14: after upgrading from PMM 2.9.0 to 2.14, all my MongoDB metrics returned errors. I thought this was because of the pmm2-client version, so I updated one pmm2-client from 2.9.0 to 2.14, and I still had problems. So it became a bug :frowning: By the way, the 2.9.0 node_exporter features work with PMM 2.14, but the MongoDB ones do not. Thanks :pray:t3:

Thank you for the detailed explanation. We will check it and fix it.
