Host Status Tool

The host status tool summarizes important attributes for all hosts in the system. Invoke it by running host-status --all as root on one of the server boxes (preferably server-1). The command is actually a link in /opt/services/bin to python/. The non-servers are configured to mount server-1:/opt/services/bin so that all hosts can use the same set of admin scripts. However, the NFS mount occasionally fails on a host. When that happens the command complains of "file not found", because the script running on server-1 invokes scripts on the other hosts, and these fail if the /opt/services/bin mount is not present.
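When the command reports "file not found", a quick check is whether /opt/services/bin is actually mounted on the affected host. The following is a minimal sketch of such a check, not part of the host-status script itself; the function name admin_bin_mounted is hypothetical, and it relies only on the standard-library os.path.ismount call.

```python
import os

# Path taken from the text above; everything else here is illustrative.
ADMIN_BIN = "/opt/services/bin"


def admin_bin_mounted(path=ADMIN_BIN):
    """Return True if `path` is currently a mount point on this host."""
    return os.path.ismount(path)


if __name__ == "__main__":
    if admin_bin_mounted():
        print(f"{ADMIN_BIN} is mounted")
    else:
        # Without this mount the per-host status scripts cannot be found.
        print(f"{ADMIN_BIN} is NOT mounted; expect 'file not found' errors")
```

Run on a non-server host, this only reports the state of the mount; remounting is a separate step.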

There is only one version of the script, and it runs on all boxes in both clusters. To get a formatted status for a single host, run host-status the-host-name. The script uses multitasking to run the status script on all hosts in parallel, so it is fairly fast, although it takes a little longer when hosts are down and it has to wait for timeouts.
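The parallel, timeout-bounded pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the names probe_host and probe_all, the use of ssh with uptime as the per-host command, the 10-second timeout, and the thread pool are all hypothetical choices.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def probe_host(host, timeout=10):
    """Hypothetical probe: run `uptime` over ssh, giving up after `timeout` seconds."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "uptime"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None  # host is down, too slow, or ssh is unavailable
    return result.stdout.strip() if result.returncode == 0 else None


def probe_all(hosts, probe=probe_host, max_workers=32):
    """Run `probe` on every host concurrently and return {host: result}."""
    hosts = list(hosts)
    if not hosts:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(hosts))) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(probe, hosts)))
```

Because the probes run concurrently, the total run time is bounded roughly by the slowest host, which is why unreachable hosts add about one timeout rather than one timeout each.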

Sample Output

[15:52 root@S-1-p python]# host-status --all  <-- Only works using root.

Host Uptimes for Production Correlator

    Unpingable: None   <-- Hosts that don't answer pings
     Can't ssh: None   <-- Hosts that won't allow ssh
  4d, 20h, 45m: swc-001   <-- Hosts with uptimes in the same minute.
  4d, 21h, 36m: swc-059
  4d, 21h, 53m: server-2
  4d, 21h, 54m: swc-064
  4d, 21h, 55m: swc-063
  4d, 21h, 56m: swc-061..062   <-- Range notation used for sequential hosts.
  4d, 21h, 57m: swc-060
  4d, 21h, 58m: swc-058
  4d, 21h, 59m: swc-057
  4d, 22h,  0m: swc-056
  4d, 22h,  1m: swc-055
  4d, 22h,  3m: swc-053..054
  4d, 22h,  5m: swc-052
  4d, 22h, 10m: swc-051
  4d, 22h, 11m: swc-050
  4d, 22h, 13m: swc-049
  4d, 22h, 14m: swc-048
  4d, 22h, 16m: swc-047
  4d, 22h, 18m: swc-046
  4d, 22h, 20m: swc-045
  4d, 22h, 22m: swc-044
  4d, 22h, 23m: swc-043
  4d, 22h, 27m: swc-042
  4d, 22h, 28m: swc-041
  4d, 22h, 30m: swc-040
  4d, 22h, 32m: swc-039
  4d, 22h, 33m: swc-038
  4d, 22h, 35m: swc-037
  4d, 22h, 37m: swc-036
  4d, 22h, 38m: swc-035
  4d, 22h, 42m: swc-034
  4d, 22h, 49m: swc-033
 11d, 21h, 23m: swc-031
 11d, 21h, 24m: swc-007
 12d, 22h, 30m: bg-mds-1..2,bg-ss-3
 12d, 22h, 31m: bg-ss-1..2
 34d,  5h, 58m: swc-006,swc-008,swc-015
 34d, 15h, 42m: swc-002
 34d, 15h, 52m: swc-003..005
 34d, 16h, 15m: swc-009..010
 34d, 16h, 22m: swc-011..013
 34d, 16h, 38m: swc-014
 34d, 17h, 12m: swc-016
 34d, 21h,  7m: server-1
 34d, 22h, 23m: swc-032
 41d, 22h, 40m: swc-021,swc-024..029
 41d, 22h, 41m: swc-017..020,swc-022..023,swc-030

Kernel Versions in Use:   <-- Kernel version hosts are booted to.

   3.10.0-1127.10.1.el7.x86_64: server-1..2
   3.10.0-1062.12.1.el7.x86_64: bg-mds-1..2, bg-ss-1..3
   3.10.0-1062.9.1.el7.x86_64 : swc-001..016, swc-031..064
   3.10.0-957.27.2.el7.x86_64 : swc-017..030

Disk Info

   server-1 free space: /=(61%,35GB) /export=(92%,707GB) /var=(78%,11GB) /tmp=(100%,16GB) /boot=(82%,735MB) /boot/efi=(96%,246MB) 
   server-2 free space: /=(33%,18GB) /export=(94%,725GB) /var=(77%,11GB) /tmp=(100%,16GB) /boot=(72%,632MB) /boot/efi=(96%,246MB) 
   Latest diskless image: RHEL-   <-- Newest diskless image on server-1 disk.
   RHEL- (diskless root): swc-001..064   <-- Hosts using this diskless image. 
   Lustre mounters: swc-002..006, swc-008..016
   Isan mounters: swc-001..064
   Bgfs mounters: swc-001..016, swc-031, swc-033..035, swc-037..041, swc-043..049, swc-051..053, swc-055..060, swc-062..064 
   /export mounters: bg-mds-1..2, bg-ss-1..3, swc-001..064   <-- Hosts mounting /export/home; all hosts should mount this!

Unreachable Hosts:   <-- Hosts not reachable by other hosts.

lustre-mds-disk, lustre-oss-1..3-disk FROM swc-001..016
pdu-9 FROM server-1..2
[15:56 root@S-1-p python]#
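The range notation in the sample output (for example swc-061..062, or swc-017..020,swc-022..023,swc-030) collapses sequentially numbered hosts into a single entry. A minimal sketch of that compression is shown below. The function name compress_hosts is hypothetical, and the sketch assumes the number is at the end of the host name, so names such as lustre-oss-1..3-disk are not covered.

```python
import re
from itertools import groupby

# Split a host name into a prefix and a trailing number, e.g. "swc-017".
_TRAILING_NUM = re.compile(r"^(.*?)(\d+)$")


def compress_hosts(hosts):
    """Collapse sequentially numbered host names into prefix-N..M ranges."""
    numbered, plain = [], []
    for host in set(hosts):
        match = _TRAILING_NUM.match(host)
        if match:
            digits = match.group(2)
            numbered.append((match.group(1), len(digits), int(digits)))
        else:
            plain.append(host)
    numbered.sort()

    parts = []
    # Group by (prefix, digit width) so zero padding is preserved.
    for (prefix, width), group in groupby(numbered, key=lambda t: t[:2]):
        nums = [n for _, _, n in group]
        start = prev = nums[0]
        for n in nums[1:] + [None]:  # None flushes the final run
            if n is not None and n == prev + 1:
                prev = n
                continue
            first = f"{prefix}{start:0{width}d}"
            parts.append(first if start == prev else f"{first}..{prev:0{width}d}")
            start = prev = n
    return ",".join(parts + sorted(plain))
```

Grouping by digit width as well as prefix keeps zero-padded names (swc-001) separate from unpadded ones (bg-ss-1), so the padding in the output matches the input.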

-- JimJacobs - 2020-08-04
Topic revision: r1 - 2020-08-04, JimJacobs