We’re having problems with a PMM agent not sending metrics to our PMM server. When I go to /prometheus/targets I see the error message:
Get "http://%E2%80%93force:42000/metrics?collect%5B%5D=hwmon&collect%5B%5D=textfile.mr": dial tcp: lookup xn--force-xu3b on {IP}: no such host
I’m guessing this is going to all come back to security settings of one kind or another, but how is it getting “xn--force-xu3b” as the hostname? I’m assuming that is what’s causing our issue. Is there anything else about this error message I could be missing? (I’m assuming we have port 42000 unblocked, as it has already worked on another of our db servers.)
Can you take a look in the bash history for how monitoring was added for the given node? The “--force” looks like it may have been unintentionally added in the ‘pmm-admin add’ command (is the hostname xn-xu3b?). If you do not specify a serviceName when adding a system to monitoring, the hostname is used when registering with the pmm-server. The agent doesn’t technically send the metrics to the server, though: the agent tells the server “where it’s at” and the server reaches out on port 42000 for OS metrics and then 42001, 42002, 42003, etc. for each additional exporter added (imagine having mysql, pg and mariadb all running on the same host) in a pull model. Also, if you don’t specify an IP address for the host at the time of adding it, the agent will “guess” which one it should use (typically just eth0), but if you have a multi-homed system (management interface, data interface, storage interface) we could have guessed wrong (though I’d expect an IP address in that case, as opposed to a lookup of an invalid hostname). My guess is that you’ll see the pmm-admin command look something like: pmm-admin add mysql --username=pmm --password=password --host=xn--force-xu3b or something along those lines.
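Since the server pulls from the agent, one way to narrow this kind of failure down is to check, from the PMM server's side, whether the registered address resolves at all and whether the exporter port accepts a TCP connection. The snippet below is a minimal Python sketch of that check, not anything shipped with PMM; the function name check_target is made up for illustration, and 42000 is simply the port from the error message above.

```python
import socket


def check_target(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a scrape target as 'ok', 'no such host', or 'unreachable'."""
    try:
        # Resolve first, so a bad hostname is reported separately
        # from a blocked or closed port.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except (socket.gaierror, UnicodeError):
        return "no such host"

    # Try every resolved address until one accepts the connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "unreachable"


if __name__ == "__main__":
    # Hypothetical usage: replace the host with the address the agent registered.
    print(check_target("localhost", 42000))
```

A "no such host" result points at the registered name (as in this thread), while "unreachable" points at firewalls or an exporter that is not running.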
Unfortunately, I didn’t run the commands, I had to ship them off to another team who owns the servers and I have to trust that they were run correctly (I’m a monitoring specialist not a DBA, so no permissions for me). I think you’re correct, that makes total sense where that --force came from, I’m assuming the command was run improperly. Here’s a screenshot of what the agent looks like, the “node_name” label is actually correct, but the URL that was generated for the PMM server to collect metrics from doesn’t make sense. I’ll try to see if I can get them to run the remove commands and “start over”.
Can you see if you can get the original command from the other group? Strip out the confidential stuff, but I’m wondering if it was as simple as a --force being appended to a command that took it as a positional argument without validating that --force shouldn’t have been allowed. I can file a bug and maybe save someone else some headache! Glad you found it, though, and hopefully all’s right with the world again…except that whole virus thing ;-)
The cause was a command that was auto-formatted by Outlook and then entered by the admin: the -- was auto-formatted to a long dash, so –force ended up as the hostname.
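To see how a long dash becomes the hostname in the error, the two encodings can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper names decode_host and to_ascii_label are invented here, and to_ascii_label skips the normalization steps a real IDNA implementation performs, applying plain Punycode with the xn-- prefix.

```python
import unicodedata
from urllib.parse import unquote


def decode_host(percent_encoded: str) -> str:
    """Undo the percent-encoding seen in the scrape URL."""
    return unquote(percent_encoded)


def to_ascii_label(label: str) -> str:
    """Simplified IDNA-style conversion: Punycode plus the 'xn--' prefix."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


host = decode_host("%E2%80%93force")
print(unicodedata.name(host[0]))  # EN DASH
print(to_ascii_label(host))       # xn--force-xu3b
```

So %E2%80%93 in the target URL is the UTF-8 encoding of an en dash, and xn--force-xu3b is what the DNS lookup sees once that non-ASCII name is converted to ASCII.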
Ahhhh…when I ran the code through an HTML encoder it only came out as 1 dash and I didn’t even bat an eye as to why. Just the same, I’ll run it through the product team to see if even a - should be allowed as a hostname (I think not, but I also don’t know the RFC on hostname legality!)
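On the hostname legality question: as I understand RFC 952 and RFC 1123, each dot-separated label may contain only ASCII letters, digits and hyphens, must not begin or end with a hyphen, and is limited to 63 characters. The check below is a rough Python sketch of those rules, not what pmm-admin actually does; is_valid_hostname is a hypothetical helper, and the 253-character total is the commonly cited practical limit.

```python
import re

# One label: starts and ends with a letter or digit, hyphens only inside,
# at most 63 characters in total.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every label follows the letters-digits-hyphen rule."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a fully qualified trailing dot
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


for candidate in ["-", "--force", "\u2013force", "db-01.example.com"]:
    print(repr(candidate), is_valid_hostname(candidate))
```

Under these rules a bare -, --force and the en-dash variant are all rejected. Note that the converted form xn--force-xu3b does pass, so a check like this would have to run on the raw input before any conversion to ASCII.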