Shutting off Nagios Alarms

When a Nagios problem can't be fixed right away, it gets tiresome to have the alarm message arrive every hour or so. Almost the same configuration is used on all four servers (server-1..2{t,p}), so cut-and-paste mistakes can make an alarm message misleading. Each message is supposed to give the host name (server-1 or server-2) followed by the cluster name, "DC" or "TEST". If something seems odd, look at the mail headers of the alarm message to see whether the originating host can be determined independently of the subject line.

Acknowledging an Alarm

To silence a particular instance of an alarm, run nagios-ack-service-problem host problem aComment, where the arguments are:

host: The host generating the alarm.
problem: The Nagios name for the alarm. This usually appears in the subject of the alarm email after a slash (e.g., it is "HTTP" given the subject line ** PROBLEM Service Alert: server-2/HTTP (DC) is WARNING **).
aComment: The reason that you're acknowledging the alarm.

Example: nagios-ack-service-problem server-2 HTTP "fix later?"
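
For background, Nagios Core itself accepts acknowledgements as ACKNOWLEDGE_SVC_PROBLEM external commands written to its command file. Whether nagios-ack-service-problem works this way is an assumption; the sketch below only shows the shape of such a line. The function name format_ack and the command file path in the comment are illustrative, not taken from the local script.

```shell
#!/bin/sh
# Sketch only: build a Nagios Core ACKNOWLEDGE_SVC_PROBLEM external command.
# Assumption: the local nagios-ack-service-problem wrapper may do something
# similar; its real implementation was not inspected.

# format_ack TIMESTAMP HOST SERVICE AUTHOR COMMENT
format_ack() {
    # Fields after the command name:
    #   host;service;sticky;notify;persistent;author;comment
    # sticky=2     keep the acknowledgement until the service recovers
    # notify=1     send a notification that the problem was acknowledged
    # persistent=1 keep the comment across Nagios restarts
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;2;1;1;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Possible use (the command file path is an assumed default, check nagios.cfg):
#   format_ack "$(date +%s)" server-2 HTTP "$USER" "fix later?" \
#       > /usr/local/nagios/var/rw/nagios.cmd
```

Seeing the fields laid out this way can help when checking by hand why an acknowledgement did not take effect.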

Acknowledging the alarm generates one more email stating that the alarm was acknowledged and, with any luck, no more alarm emails after that. If no acknowledgement email arrives, the acknowledgement failed and alarms will continue to be generated; the command gives no feedback for a bad acknowledgement, such as one with a typo in the host or problem name.
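
Because a mistyped problem name fails silently, it can help to confirm the name against the object files before acknowledging. The following is a minimal sketch: service_is_defined is a hypothetical helper, not part of Nagios, and it only checks that some service_description line matches exactly. It does not check which host the service is attached to.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): succeed only if some file under
# OBJECTS_DIR defines a service whose description is exactly SERVICE.
# service_is_defined OBJECTS_DIR SERVICE
service_is_defined() {
    grep -rh 'service_description' "$1" 2>/dev/null |
        awk -v want="$2" '
            $1 == "service_description" {
                sub(/^[ \t]*service_description[ \t]+/, "")  # drop the directive
                sub(/[ \t]*;.*$/, "")                        # drop inline ; comment
                sub(/[ \t]+$/, "")                           # drop trailing blanks
                if ($0 == want) found = 1
            }
            END { exit(found ? 0 : 1) }'
}

# Possible use before acknowledging:
#   service_is_defined /usr/local/nagios/etc/objects HTTP &&
#       nagios-ack-service-problem server-2 HTTP "fix later?"
```

The exact string comparison is deliberate: a near miss such as "HTT" or "http" is rejected rather than matched loosely.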

Disabling an Alarm

In some cases it may be desirable to stop an alarm for the long term. To do that, the configuration files need to be edited. The files are located in /usr/local/nagios/etc/objects. Grep for the target of interest, then edit the appropriate file and delete or comment out the relevant entries. After editing, restart Nagios by running systemctl restart nagios; this makes the Nagios daemon start using the new configuration and produces an immediate error if the file syntax is invalid.
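
The grep, edit, and restart steps can be sketched as below. The binary and main configuration paths are assumptions based on a source install under /usr/local/nagios, and find_target and check_and_restart are hypothetical helper names. The sketch runs Nagios's own verification mode (nagios -v) first, so that a syntax error is caught while the running daemon is left alone.

```shell
#!/bin/sh
# Assumed locations for an install under /usr/local/nagios; adjust as needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
OBJECTS_DIR=${OBJECTS_DIR:-/usr/local/nagios/etc/objects}

# Hypothetical helper: list the object files, with line numbers, that
# mention a target such as a host or service name.
find_target() {
    grep -rn -- "$1" "$OBJECTS_DIR"
}

# Hypothetical helper: verify the configuration, and restart the daemon
# only if the check passes.
check_and_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        systemctl restart nagios
    else
        echo "configuration check failed; nagios not restarted" >&2
        return 1
    fi
}

# Possible use:
#   find_target HTTP       # locate the definitions to edit
#   ...edit the file(s)...
#   check_and_restart
```

Checking before restarting is a design choice rather than a requirement: the restart alone also reports syntax errors, but only after the old daemon has already been stopped.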

-- JimJacobs - 2020-09-30
