Mongo Sharded Cluster monitoring (sanity check) and other noob q's...

Hi
I've got PMM successfully installed across several Ubuntu nodes and communicating with the Docker-based pmm-server.

I've been scouring the docs and haven't been able to find a how-to or blog post about monitoring a Mongo sharded cluster. There seems to be a lot of information taken for granted that isn't spelled out for newcomers. So, after a couple of days of poking around and blowing stuff up, I'm basically looking for confirmation that things are correctly set up here while I do this product evaluation.

In my dev env, I have three machines, each running a config server and a repl-set node for one of the two shards. My primary work laptop is running the MariaDB master and the mongo router for the sharded cluster. I have a pmm-client running on the work laptop, and I've used pmm-admin to connect to the mongos (router) – I get an accurate report under “cluster summary” in the Grafana dash. I was not able to start the mongo router with the operationProfiling section as documented, but I did add it to the shard-1.1 node and was able to start mongod with those options.
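(For reference, the operationProfiling block I'm talking about is the standard one from the docs, something along these lines in mongod.conf; the mode and threshold values here are only illustrative:)

Code:
operationProfiling:
   mode: all
   slowOpThresholdMs: 200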

However, I am not able to add the local mongod instance (for the shard) using pmm-admin add. So, my first question: is this the correct way to monitor a sharded repl-set? That is, do you just attach the client to the router instance, and that's it, as long as you enable the operationProfiling section on all mongod instances in the sharded repl-set?

Code:
root@gordito:~# pmm-admin add --uri mongodb://localhost:27018 mongodb
[linux:metrics]   OK, already monitoring this system.
[mongodb:metrics] Cannot connect to MongoDB using uri mongodb://localhost:27018: no reachable servers
root@gordito:~# mongo --port 27018
MongoDB shell version: 3.2.19
connecting to: 127.0.0.1:27018/test
Server has startup warnings:
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
namasteShard1:PRIMARY>
Second question is: how do you monitor the config-servers? As they’re just another repl-set serving hash-keys, don’t you also want these under PMM?

Third question: why does the repl-set option in the Grafana dash show as blank, as if it's not connecting to or detecting the shard's replica set?

If, in the Grafana dash, I switch away from the MongoDB overview (which, according to the instance drop-down, is linked to the laptop) and then navigate back to it using the drop-down menu, I lose all data until I go back to the main window and click the system node (left column) to regenerate the page. When I request the overview page, the laptop is missing from the instance drop-down in the top left.

I think all of these questions can be answered in the appropriate how-to on setting up monitoring for a repl-set or sharded-cluster – if such a doc exists, can someone please provide a link?

To restate the main blocker: on the primary shard node, I cannot add the shard's local mongod instance to PMM.

Next, on the MySQL side, when I look at the dash for MySQL replication, I don't see anything other than graphs on the left side: no top-bar summary info or right-side data. I am assuming one also needs to add the repl-set using the pmm-admin tool? (I didn't find this explicitly stated, so I'm just guessing…)
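(My guess is that it's something like the following, run on the replica; the credentials and instance name here are just placeholders. Please correct me if that's wrong:)

Code:
pmm-admin add mysql --user pmm --password pmm_pass --host 127.0.0.1 replica-1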

Finally, on the main dashboard, I have a correct count of the number of systems monitored, but the DB count (Monitored DB Instances) sits at 1, even though I can get graphs (as described above) from both the MySQL and Mongo connections… why is this?

Ok - that’s it for now, I am going to keep trying various permutations to see if I can get things to mesh on my own. However, I would be deeply grateful if someone could point out my more glaring mistakes, assumptions and errors… :wink:

Thanks!

—mike

Hi mshallop, I'm not sure myself, but I'm escalating this to see if we can find you the correct steps.

Hi Mike

In order to monitor a cluster, you will need to monitor each instance. Each instance:port combination is considered a separate process to be monitored in PMM.
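You can verify what's currently registered on each node with (assuming the PMM 1.x client):

Code:
pmm-admin list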

Please add the shard members using --uri as described in the Percona Monitoring and Management documentation.

Make sure you're adding each instance with its replica set name and also with the --cluster name. An example of --uri is:
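(the user, password, host, and replica set name below are placeholders; the replica set name is whatever appears in your shell prompt, e.g. namasteShard1)

Code:
mongodb://pmm_user:pmm_pass@192.168.1.11:27018/?replicaSet=namasteShard1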

This is an example of how to add an instance:
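(again a sketch with placeholder credentials, host, and labels; adjust the port and the --cluster name to match your environment, and note that the exact flags can vary a bit between pmm-admin versions)

Code:
pmm-admin add mongodb --uri "mongodb://pmm_user:pmm_pass@192.168.1.11:27018/?replicaSet=namasteShard1" --cluster mycluster shard1-node1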

In order to add the mongos, please use:
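(a sketch; the 27017 port and the mongos-router label are assumptions. Query analytics reads from the profiler on the mongod nodes, so the metrics service is normally all you need on the router:)

Code:
pmm-admin add mongodb:metrics --uri mongodb://127.0.0.1:27017 --cluster mycluster mongos-router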

Newer versions of MongoDB may only listen on a specific IP if you've changed the bind address. Please use the instance's IP or hostname when adding those instances. For local tests, 127.0.0.1 usually works.
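To double-check what the shard's mongod is actually bound to (the config path and port below assume the stock Ubuntu package and the 27018 port from your output):

Code:
grep -A2 '^net' /etc/mongod.conf
ss -lntp | grep 27018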

Config servers are the same as standard replica sets, so the following command will work fine:
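(same pattern, just pointed at the config server's port; the 27019 port and the instance label are assumptions:)

Code:
pmm-admin add mongodb --uri mongodb://pmm_user:pmm_pass@127.0.0.1:27019 --cluster mycluster cfg-node1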

For the MariaDB one, please proceed and add the monitoring user there too, as you did on the master.
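The monitoring user needs roughly the following privileges (a sketch; the exact privilege list can differ by PMM version, and the password is a placeholder):

Code:
mysql -e "CREATE USER 'pmm'@'localhost' IDENTIFIED BY 'pmm_pass';"
mysql -e "GRANT SELECT, PROCESS, REPLICATION CLIENT, RELOAD ON *.* TO 'pmm'@'localhost';"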

Please let us know the outcome.

Regards,

Thank you for the replies!

I opened a ticket with sales requesting consulting services, but the sales rep blew me off and I’m stuck in limbo… so, I moved on to another “pants on fire” project…

I’ll circle back to this probably around the end of the month and will post an update…

Thank you, again, for the replies!

–mike

Hi mshallop

I replied to you over email - let me know how Percona can be of assistance, thanks!!

Hey mshallop, I'm trying to set up PMM on a sharded replica set and am trying to figure out which pmm-admin commands need to be run on each system (router, replica masters, config servers, etc.). What steps did you follow to set all your clients up?