Cacti SSH DISKFREE returning -1 randomly on just one volume

Hello

We are using the template and script for collecting Linux box stats via SSH. Everything works well, except that it randomly reports -1 for the root partition’s DISKFREE_available and DISKFREE_used. Strangely, this happens for only one of the several volumes checked on this box, even though all of them are checked at the same time. We have thold running on these values, so each occurrence triggers an annoying email about low free space.

02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nj:-1' [map nj->DISKFREE_used]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nk:-1' [map nk->DISKFREE_available]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nj:216727552' [map nj->DISKFREE_used]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nk:786444288' [map nk->DISKFREE_available]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nj:67098263552' [map nj->DISKFREE_used]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nk:33221103616' [map nk->DISKFREE_available]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nj:1564028928' [map nj->DISKFREE_used]
02/13/2014 12:01:28 PM - POLLER: Poller[0] Parsed MULTI output field 'nk:339497058304' [map nk->DISKFREE_available]
02/13/2014 12:01:28 PM - POLLER: Poller[0] CACTI2RRD: diskfree_used_336.rrd --template DISKFREE_used:DISKFREE_available 1392292864:-1:-1
02/13/2014 12:01:28 PM - POLLER: Poller[0] CACTI2RRD: diskfree_used_337.rrd --template DISKFREE_used:DISKFREE_available 1392292864:216727552:786444288
02/13/2014 12:01:28 PM - POLLER: Poller[0] CACTI2RRD: diskfree_used_341.rrd --template DISKFREE_used:DISKFREE_available 1392292864:67098263552:33221103616
02/13/2014 12:01:28 PM - POLLER: Poller[0] CACTI2RRD: diskfree_used_342.rrd --template DISKFREE_used:DISKFREE_available 1392292864:1564028928:339497058304

Can I get it to save the df output to a log file, so I can see more clearly what is happening?
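One generic way to capture the raw command output, independent of the Cacti script, is a small shell wrapper that appends each run to a log file. This is only a sketch: the function name log_cmd and the log path are made up for illustration, and the host, user, and key in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cacti or ss_get_by_ssh.php):
# run a command, append a timestamp, its exit status, and its output
# to a log file, then pass the output and exit status through unchanged.
log_cmd() {
    logfile=$1
    shift
    output=$("$@" 2>&1)
    status=$?
    {
        printf '%s exit=%s cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*"
        printf '%s\n' "$output"
    } >> "$logfile"
    printf '%s\n' "$output"
    return "$status"
}

# Example call (placeholders for host, user, and key file):
# log_cmd /tmp/df_debug.log ssh -q -o "ConnectTimeout 10" user@host -p 22 -i id_rsa 'df -k -P'
```

A non-zero exit status in the log next to empty output would point at the SSH connection rather than at df itself.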

Thanks for your help

OK, I’ve found the $debug_log option in the ss_get_by_ssh.php script. I shall see what it records the next time it happens.

Right, it has happened again, now that I have the debug log going.

So it is timing out connecting to the server and thus logging it as “-1”.

Is there a better way to log this, or does anyone know of a way to stop thold from triggering?

array (
0 => 'result of ssh -q -o "ConnectTimeout 10" -o "StrictHostKeyChecking no" user@host -p 22 -i id_rsa \'df -k -P\'',
1 => 'timed out',
)

I’ll try increasing the timeout to see if it reduces the occurrences, but I’d prefer a better solution to stop the false-alarm emails. I understand this may be better aimed at the thold people, so I’ll go and find where I can ask them too.
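Besides a longer timeout, another common way to ride out an occasional connection failure is to retry the command a couple of times before giving up. The sketch below is a generic shell illustration, not something ss_get_by_ssh.php is known to do: the function name retry and its arguments are hypothetical, and the total time spent retrying would still have to fit inside the poller interval.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example call (placeholders for host, user, and key file):
# retry 3 2 ssh -q -o "ConnectTimeout 10" user@host -p 22 -i id_rsa 'df -k -P'
```

With three attempts at a 10 second connect timeout plus the delays, a fully unreachable host would take a little over half a minute to fail, so the attempt count should be kept small.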

Thank you!

You can increase the timeout by raising this setting in ss_get_by_ssh.php.cnf:

$cmd_tout = 10; # Command exec timeout (ssh itself or local cmd)

I have done so.

I’m wondering though, is the -1 the correct way to log ‘no value’ to Cacti? Should it not be null or something?

There is no null for Cacti. 0 can mean a genuine value of 0, but -1 can mean that something is wrong and worth checking.