Hi
I’ve got PMM successfully installed across several Ubuntu nodes, all communicating with the Docker-based pmm-server.
I’ve been scouring the docs and haven’t been able to find a how-to or blog post about monitoring a MongoDB sharded cluster. There seems to be a lot of information taken for granted that isn’t spelled out for newcomers. So, after a couple of days of poking around and blowing stuff up, I’m basically looking for confirmation that things here are correctly set up while I do this product evaluation.
In my dev environment, I have three machines, each running a config server and a replica-set node for one of the two shards. My primary work laptop runs the MariaDB master and the mongos router for the sharded cluster. I have a pmm-client running on the work laptop and I’ve used pmm-admin to connect to the mongos (router) – I get an accurate report under “Cluster Summary” in the Grafana dashboard. I was not able to start the mongos router with the operationProfiling section as documented, but I did add it to the shard-1.1 node and was able to start mongod with those options.
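For reference, this is roughly the operationProfiling block I added to the shard node’s mongod.conf (the mode and threshold values are just what I picked for testing, not anything I found documented as required):
Code:
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100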
However, I am not able to add the local mongod instance (for the shard) using pmm-admin add. So, my first question: is this the correct way to monitor a sharded replica set – you attach the client to the router instance and that’s it, as long as you enable the operationProfiling section on all mongod instances in the sharded replica set? (The failing command and output, plus the mongos command that worked, are shown below.)
Code:
root@gordito:~# pmm-admin add --uri mongodb://localhost:27018 mongodb
[linux:metrics] OK, already monitoring this system.
[mongodb:metrics] Cannot connect to MongoDB using uri mongodb://localhost:27018: no reachable servers
root@gordito:~# mongo --port 27018
MongoDB shell version: 3.2.19
connecting to: 127.0.0.1:27018/test
Server has startup warnings:
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-04-02T15:39:30.280-0700 I CONTROL  [initandlisten]
namasteShard1:PRIMARY>
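For comparison, adding the mongos on this same host worked with essentially the same command, just pointed at the router’s port (mine is the default, yours may differ):
Code:
pmm-admin add --uri mongodb://localhost:27017 mongodb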
Second question: how do you monitor the config servers? Since they’re just another replica set (holding the cluster’s sharding metadata), don’t you also want these under PMM?
Third question: why does the replica-set option in the Grafana dashboard show blank, as if it’s not connecting to or detecting the shard’s replica set?
Also, in the Grafana dashboard: if I switch away from the MongoDB overview (which, according to the instance drop-down, is linked to the laptop) and then navigate back to it via the drop-down menu, I lose all data until I go back to the main window and click the system node (left column) to regenerate the page. When I request the overview page this way, the laptop is missing from the instance drop-down in the top left.
I think all of these questions could be answered by an appropriate how-to on setting up monitoring for a replica set or sharded cluster – if such a doc exists, can someone please provide a link?
(To recap the issue above: on the primary shard node, I still cannot add the shard’s local mongod to PMM.)
Next, on the MySQL side: when I look at the MySQL Replication dashboard, I don’t see anything other than the graphs on the left side; no top-bar summary info or right-side data. I am assuming one also needs to add the replication nodes using the pmm-admin tool? (I didn’t find this explicitly stated, so I’m just guessing…)
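My guess is that something along these lines is needed on each replication node (the user and password here are placeholders for whatever monitoring account you use):
Code:
pmm-admin add mysql --user pmm --password <password>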
Finally, on the main dashboard I have a correct count of the number of systems monitored, but the DB count (Monitored DB Instances) sits at 1, even though I can get graphs (as described above) from both the MySQL and MongoDB connections… why is this?
OK, that’s it for now; I’m going to keep trying various permutations to see if I can get things to mesh on my own. However, I would be deeply grateful if someone could point out my more glaring mistakes, assumptions, and errors…
Thanks!
—mike