Install Monitoring Systems

We install ganglia in /opt/services instead of the normal location. This separates it from the OS install and allows us to re-install or upgrade the OS without worrying about interfering with installed services like ganglia. We also keep /opt/services on a separate disk, which allows us to replace the entire OS disk without interfering with installed services. Finally, we try to separate the local configuration of a service (ganglia-local) from the service itself (ganglia). This allows us to upgrade the service easily without having to reconfigure it to work the way the previous version did.
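The versioned directory plus unversioned symlink layout described above can be sketched as follows. This is only an illustration under assumptions: switch_version is a hypothetical helper, version 9.9.9 is made up, and the demonstration runs in a scratch directory rather than in /opt/services.

```shell
#!/bin/sh
# Sketch: point the unversioned name at a specific installed version.
# switch_version is a hypothetical helper, not part of ganglia.
switch_version() {
    base=$1      # e.g. /opt/services
    name=$2      # e.g. ganglia
    version=$3   # e.g. 3.2.0
    if [ ! -d "$base/$name-$version" ]; then
        echo "missing $base/$name-$version" >&2
        return 1
    fi
    # -n replaces an existing symlink instead of descending into it
    ln -sfn "$name-$version" "$base/$name"
}

# Demonstration in a scratch directory, so nothing under /opt is touched.
demo=$(mktemp -d)
mkdir -p "$demo/ganglia-3.2.0" "$demo/ganglia-9.9.9" "$demo/ganglia-local/etc"
switch_version "$demo" ganglia 3.2.0
switch_version "$demo" ganglia 9.9.9   # 9.9.9 is a made-up newer version
readlink "$demo/ganglia"               # shows which version is active
```

Because the local configuration lives in ganglia-local, switching the symlink does not touch it.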


Ganglia


Compiling Ganglia

Make an area for ganglia to live
mkdir -p /opt/services/ganglia-3.2.0

Ganglia requires libconfuse. Download libconfuse from http://www.nongnu.org/confuse/:
cd /tmp
tar xfvz /home/src/ganglia/src/confuse-2.7.tar.gz
cd confuse-2.7
./configure --prefix=/opt/services/ganglia-3.2.0 --enable-shared
make
make install  

Ganglia also requires rrdtool for head nodes. Download rrdtool from http://www.rrdtool.org/:
cd /tmp
tar xfvz /home/src/ganglia/src/rrdtool-1.4.6.tar.gz
cd rrdtool-1.4.6
./configure --prefix=/opt/services/ganglia-3.2.0 --enable-shared
make
make install  

Download ganglia from http://ganglia.sourceforge.net/:
cd /tmp
tar xfvz /home/src/ganglia/src/ganglia-3.2.0.tar.gz 
cd ganglia-3.2.0
LDFLAGS="-L/opt/services/ganglia-3.2.0/lib" ./configure --prefix=/opt/services/ganglia-3.2.0 --with-libconfuse=/opt/services/ganglia-3.2.0 --with-gmetad
make
make install

(cd /opt/services ; ln -s ganglia-3.2.0 ganglia)

mkdir -p /opt/services/ganglia-local/bin
mkdir -p /opt/services/ganglia-local/etc/conf.d
mkdir -p /opt/services/ganglia-local/init.d
mkdir -p /opt/services/ganglia-local/lib64/ganglia/python_modules
On a 32-bit system, create lib instead of lib64.
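The choice between lib and lib64 could be scripted roughly as below. This is a sketch, not the procedure we actually use: libdir_for_arch is a hypothetical helper, and it assumes x86_64 is the only 64-bit architecture in play. It prints the mkdir command rather than running it.

```shell
#!/bin/sh
# libdir_for_arch is a hypothetical helper.
# Assumption: only x86_64 counts as 64-bit here.
libdir_for_arch() {
    case "$1" in
        x86_64) echo lib64 ;;
        *)      echo lib ;;
    esac
}

libdir=$(libdir_for_arch "$(uname -m)")
# Print the command instead of running it, so the sketch has no side effects.
echo "mkdir -p /opt/services/ganglia-local/$libdir/ganglia/python_modules"
```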

Create client configuration file
cp gmond/gmond.conf /opt/services/ganglia-local/etc
cp /opt/services/ganglia/etc/conf.d/modpython.conf /opt/services/ganglia-local/etc/conf.d
Edit /opt/services/ganglia-local/etc/gmond.conf and at least set the name in the cluster block. You may also want to change the ports used in the three channel blocks (udp_send_channel, udp_recv_channel and tcp_accept_channel).
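Setting the cluster name can also be done non-interactively. The sketch below works on a small stand-in file, not the real gmond.conf. set_cluster_name is a hypothetical helper, the sample blocks are abbreviated, and the name must not contain "/" or "&" because it is passed straight into sed.

```shell
#!/bin/sh
# Stand-in for gmond.conf; the real file has many more settings.
conf=$(mktemp)
cat > "$conf" <<'EOF'
cluster {
  name = "unspecified"
  owner = "unspecified"
}
host {
  location = "unspecified"
}
EOF

# set_cluster_name is a hypothetical helper.
# $1 = config file, $2 = new cluster name (no "/" or "&" characters)
set_cluster_name() {
    # Restrict the substitution to the lines of the cluster block.
    sed "/^cluster {/,/^}/ s/name = \".*\"/name = \"$2\"/" "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

set_cluster_name "$conf" "Cluster"
grep 'name = ' "$conf"
```

The name set here is the one gmetad.conf later refers to in its data_source line.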

Create client startup script
cp gmond/gmond.init /opt/services/ganglia-local/init.d/nrao-gmond
edit /opt/services/ganglia-local/init.d/nrao-gmond

Create server configuration file
cp gmetad/gmetad.conf /opt/services/ganglia-local/etc
Edit /opt/services/ganglia-local/etc/gmetad.conf and at least set the data_source to the cluster name you set in gmond.conf
e.g. data_source "Cluster" node1.example.edu:8649
edit /opt/services/ganglia-local/etc/conf.d/modpython.conf and change params and include to reference ganglia-local.
e.g. params = "/opt/services/ganglia-local/lib64/ganglia/python_modules"
e.g. include('/opt/services/ganglia-local/etc/conf.d/*.pyconf')

Create server startup script
cp gmetad/gmetad.init /opt/services/ganglia-local/init.d/nrao-gmetad
edit /opt/services/ganglia-local/init.d/nrao-gmetad


Make tarball to install on clients

cd /opt/services
tar cfvz ganglia_nrao_`uname -i`-3.2.0.tgz ganglia*
cp ganglia_nrao_`uname -i`-3.2.0.tgz /home/src/ganglia
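On some systems uname -i prints "unknown" or is not supported at all, which would produce a misleading tarball name. A possible guard is sketched below. hw_platform is a hypothetical helper, and falling back to uname -m is an assumption about what name you would want in that case.

```shell
#!/bin/sh
# hw_platform is a hypothetical helper.
# It prefers `uname -i` and falls back to `uname -m` when -i is
# unsupported or reports "unknown".
hw_platform() {
    plat=$(uname -i 2>/dev/null) || plat=unknown
    case "$plat" in
        ''|unknown) plat=$(uname -m) ;;
    esac
    echo "$plat"
}

tarball="ganglia_nrao_$(hw_platform)-3.2.0.tgz"
echo "$tarball"
```

The same helper would have to be used on the clients so that both sides agree on the file name.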


Install Ganglia Client

cd /opt/services ; tar xfvz /home/src/ganglia/ganglia_nrao_`uname -i`-3.2.0.tgz
ln -s /opt/services/ganglia-local/init.d/nrao-gmond /etc/init.d
chkconfig --add nrao-gmond
/etc/init.d/nrao-gmond start
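One way to confirm that gmond is answering after the start command is to read the XML it serves on its TCP accept channel. The sketch below assumes port 8649 (the port used in the data_source example) and that nc (netcat) is installed. gmond_xml_ok is a hypothetical helper, and the demonstration feeds it a hand-written sample instead of contacting a real daemon.

```shell
#!/bin/sh
# gmond_xml_ok is a hypothetical helper.
# It reads stdin and succeeds if the input looks like gmond's XML dump.
gmond_xml_ok() {
    grep -q '<GANGLIA_XML'
}

# Demonstration with a hand-written sample (not real gmond output).
sample='<?xml version="1.0"?><GANGLIA_XML VERSION="3.2.0" SOURCE="gmond"></GANGLIA_XML>'
if printf '%s\n' "$sample" | gmond_xml_ok; then
    echo "looks like gmond output"
fi

# Against a running client (assumes nc is installed and port 8649):
#   nc localhost 8649 | gmond_xml_ok && echo "gmond is answering"
```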


Install InfiniBand (optional)

Download the InfiniBand network performance script from http://ganglia.info/gmetric/
Save it as /opt/services/ganglia-local/bin/infin.py and create a startup script to run it.

Edit /opt/services/ganglia-local/bin/infin.py
GMETRIC = '/opt/services/ganglia/bin/gmetric'
GMOND_CONF="/opt/services/ganglia-local/etc/gmond.conf"
Because we install ganglia in a non-standard location we had to edit infin.py to include the GMOND_CONF. I will attach our version to this page.
ln -s /opt/services/ganglia-local/init.d/nrao-infin /etc/init.d
chkconfig --add nrao-infin
/etc/init.d/nrao-infin start


Install Disk Metrics (optional)

Download diskstats.py from https://github.com/ganglia/gmond_python_modules/pull/1/files
Save it as /opt/services/ganglia-local/lib64/ganglia/python_modules/diskstats.py
Download disk_gmetric.sh from http://ben.hartshorne.net/ganglia/
Save it as /opt/services/ganglia-local/bin/disk_gmetric.sh. Then write an init script, /opt/services/ganglia-local/init.d/nrao-disk_gmetric, which runs /opt/services/ganglia-local/bin/disk_gmetric.sh every 30 seconds
ln -s /opt/services/ganglia-local/init.d/nrao-disk_gmetric /etc/init.d
chkconfig --level 345 nrao-disk_gmetric on
/etc/init.d/nrao-disk_gmetric start
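The "every 30 seconds" part of such an init script could be built around a small loop like the one below. This is a sketch, not our actual nrao-disk_gmetric script. run_every is a hypothetical helper, and the demonstration uses a zero-second interval and a stand-in echo command so it finishes immediately.

```shell
#!/bin/sh
# run_every is a hypothetical helper: run a command repeatedly with a pause.
# $1 = seconds between runs, $2 = number of runs (0 = forever), rest = command
run_every() {
    interval=$1
    limit=$2
    shift 2
    runs=0
    while [ "$limit" -eq 0 ] || [ "$runs" -lt "$limit" ]; do
        # Keep looping even if one run fails.
        "$@" || echo "command failed: $*" >&2
        runs=$((runs + 1))
        # Skip the pause after the final run.
        if [ "$limit" -eq 0 ] || [ "$runs" -lt "$limit" ]; then
            sleep "$interval"
        fi
    done
}

# Demonstration: three quick runs of a stand-in command.
run_every 0 3 echo "collect disk metrics"
```

In the start action of the init script this might be launched in the background, for example run_every 30 0 /opt/services/ganglia-local/bin/disk_gmetric.sh &, with the PID saved so the stop action can kill it.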


Installing Ganglia Server

Install the tarball made in the previous section
cd /opt/services
tar xfvz /home/src/ganglia/ganglia_nrao_`uname -i`-3.2.0.tgz

edit gmetad.conf and set the following
rrd_rootdir "/opt/services/ganglia-local/var/rrds"
Then make that directory
mkdir -p /opt/services/ganglia-local/var/rrds
chown nobody /opt/services/ganglia-local/var/rrds

Install the apache web server (which is a task left up to the reader) and configure a virtual host for ganglia. Then
mkdir /opt/services/ganglia/www
cp -R ganglia-3.2.0/web/* /opt/services/ganglia/www
edit /opt/services/ganglia/www/conf.php and modify the following
$gmetad_root = "/opt/services/ganglia-local/var";
define("RRDTOOL", "/opt/services/ganglia/bin/rrdtool");
$time_ranges = array(
   'halfhour'=>1800,
   'hour'=>3600,
   '2hour'=>7200,
   '4hour'=>14400,
   '8hour'=>28800,
   'day'=>86400,
   'week'=>604800,
   'month'=>2419200,
   'year'=>31449600
);
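The time range values are in seconds. The arithmetic below reproduces them. Note that month works out to 28 days and year to 52 weeks (364 days), not calendar lengths.

```shell
#!/bin/sh
# Reproduce the $time_ranges values from conf.php.
day=$((24 * 60 * 60))
week=$((7 * day))
month=$((28 * day))    # four weeks
year=$((52 * week))    # 364 days
echo "day=$day week=$week month=$month year=$year"
# prints: day=86400 week=604800 month=2419200 year=31449600
```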

Finally
mkdir -p /opt/services/ganglia-local/var/dwoo/
chown apache /opt/services/ganglia-local/var/dwoo/


Nagios


Installing a Client

This is only necessary if you need to monitor something that can only be checked locally on the client (like a 3ware card or disk usage):

echo "nagios:x:1103:1103:nagios:/var/log/nagios:/bin/sh" >> /etc/passwd
echo 'nagios:!!:15280::::::' >> /etc/shadow
echo "nagios:x:1103:" >> /etc/group
echo "nagios ALL = NOPASSWD: /opt/services/nagios-local/plugins/check_3ware.sh" >> /etc/sudoers
sed -i -e 's/^Defaults.*requiretty/#Defaults    requiretty/' /etc/sudoers
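Appending with echo >> adds a duplicate entry each time the commands are re-run on the same client. A possible guard is sketched below. append_once is a hypothetical helper, and the demonstration writes to a scratch file, not to the real system files.

```shell
#!/bin/sh
# append_once is a hypothetical helper: append a line only if it is absent.
# $1 = file, $2 = exact line to add
append_once() {
    # -x matches whole lines, -F treats the line as a fixed string
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file.
demo=$(mktemp)
append_once "$demo" "nagios:x:1103:"
append_once "$demo" "nagios:x:1103:"   # second call changes nothing
cat "$demo"
```

After editing /etc/sudoers, visudo -c can be used to check the file's syntax.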

cd /opt/services ; tar xfvz /home/src/nagios/client/nagios-1.4.15-x86_64-nrao.tgz

edit /opt/services/nagios-local/plugins/check_3ware.sh and set TWCLI to the full path of the tw_cli program e.g. /opt/services/3ware/CLI/tw_cli

ln -s /opt/services/nrpe-local/init.d/nrao-nrpe /etc/init.d
chkconfig --add nrao-nrpe
/etc/init.d/nrao-nrpe start


Nagios Front End (website) Administration

http://nagios.aoc.nrao.edu/
Login: admin
Password: the admin password


Acknowledge Nagios Alerts

  • Click on Tactical Overview in the left side menu. Any alerts or issues will show up as red boxes.
  • Click on the red box to see the details of the alert. If the problem is a service (e.g. http) then the service will be highlighted.
  • Click on the problem. You will have a list of options on the left.
  • Click on the icon of the man shoveling, labeled Acknowledge Problem.
  • Fill in the dialog box. This will prevent any further messages being sent about this problem.
  • Once the problem has been cleared, the system or service will automatically go back to its normal state.


Administrating our Nagios Server

Nagios lives on the server hugin in /opt/services/nagios. All configurations specific to aoc/nrao are contained in /opt/services/nagios/etc/nrao. Before any kind of monitoring, including services, can be done on a system you first must define the host.


Adding a host definition

Edit the file /opt/services/nagios/etc/nrao/nraohosts.cfg and add a new entry like this:

define host{
        use             linux-server
        host_name       hugin
        alias           nagios
        address         10.64.1.32
        }

If you want to specify a group for a server, add it to the appropriate hostgroup definition found near the bottom of the nraohosts.cfg file.


Adding monitoring for new service via nagios

If the service, like http, is already being monitored on existing servers and you just need to monitor it on a new server, then add the host name to the existing definition in the nraoservices.cfg file:
define service{
        use                     generic-service ;For monitor http services
        host_name               hugin, vivaldi, penn, gila, acorn, magnolia, occam, smrti, whatever
        service_description     http
        check_command           check_http
        }

If this is a new service, you may first need to define the nagios command that you will be using. A list of prebuilt commands can be found in /opt/services/nagios/libexec

To define a new command, edit the file nraocommands.cfg and add something like this:
define command{
        command_name    check_cups
        command_line    $USER1$/check_http -H $HOSTADDRESS$ -p 631
}

In this example we are using the check_http plugin command to check the status of cups (port 631). Once the command is defined, you can add a service entry for it in nraoservices.cfg.
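The matching service entry for check_cups would follow the same pattern as the http service shown earlier. The sketch below prints such a stanza. service_stanza is a hypothetical helper, and the host name and the description "cups" are example values rather than entries from our configuration.

```shell
#!/bin/sh
# service_stanza is a hypothetical helper that prints a service definition.
# $1 = comma-separated host list, $2 = description, $3 = check command
service_stanza() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        check_command           $3
        }
EOF
}

# Example values; review the output before adding it to nraoservices.cfg.
service_stanza hugin cups check_cups
```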


Restarting/Reloading nagios definitions

Once additions are made, nagios configs need to be reloaded.
/etc/init.d/nrao-nagios reload
Nagios will check for configuration errors and will not reload if problems exist.
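If you want to see the configuration check separately before reloading, the two steps can be wrapped as sketched below. reload_if_valid is a hypothetical wrapper, and the demonstration uses stand-in commands. The nagios binary and nagios.cfg paths shown in the usage note are assumptions about our layout.

```shell
#!/bin/sh
# reload_if_valid is a hypothetical wrapper.
# $1 = command that verifies the configuration, $2 = command that reloads
reload_if_valid() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Demonstration with stand-in commands.
reload_if_valid true 'echo reloaded'
reload_if_valid false 'echo reloaded' || echo "kept old configuration"
```

With real commands this might look like reload_if_valid '/opt/services/nagios/bin/nagios -v /opt/services/nagios/etc/nagios.cfg' '/etc/init.d/nrao-nagios reload', where -v is the Nagios option that verifies the configuration.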


Resources

Additional information about nagios can be found at http://www.nagios.com/products/nagioscore


Topic revision: r6 - 2012-07-23, KScottRowe