The host status tool summarizes important attributes for all hosts in the system. Invoke it as root by running
host-status --all
on one of the server boxes (preferably server-1).
The command is actually a link in
/opt/services/bin
to python/hostStatus.py. The non-servers are configured to mount
server-1:/opt/services/bin
so that all hosts can use the same set of
admin scripts. Occasionally, however, this NFS mount fails. The command then complains of "file not found", because the script running on server-1 invokes scripts
on the other hosts, and these fail if the
/opt/services/bin
mount is not present.
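When the "file not found" complaint appears, a quick check on the affected non-server host can tell a missing mount apart from a missing script. The following is only a minimal sketch, not part of hostStatus.py: the function name check_admin_mount and its return strings are made up for illustration, and it assumes the directory is expected to be a mount point on the host where it runs (which is not the case on server-1 itself if the directory is local there).

```python
import os


def check_admin_mount(mount_point="/opt/services/bin", script="host-status"):
    """Return a short diagnosis for the shared admin-script directory.

    Hypothetical helper: the name and the result strings are assumptions.
    """
    if not os.path.isdir(mount_point):
        return "missing directory"
    # os.path.ismount() is true only when the path is a mount point,
    # so an empty local directory left behind by a failed mount is caught here.
    if not os.path.ismount(mount_point):
        return "not mounted"
    if not os.path.exists(os.path.join(mount_point, script)):
        return "script missing"
    return "ok"


if __name__ == "__main__":
    print(check_admin_mount())
```

Anything other than "ok" on a non-server host suggests the mount should be repaired before host-status is run again.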
There is only one version of the script, and it runs on all boxes on both clusters. To get a formatted status of only a single host, run
host-status the-host-name
The script uses multitasking to run the status script on all hosts in parallel, so it is fairly fast, although it can take a little longer when hosts are down and the timeouts have to expire.
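The parallel fan-out with timeouts described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in hostStatus.py: it assumes a thread pool (the real script may use processes or another mechanism), and the names run_on_host and run_on_all, the "{host}" placeholder, and the default timeout are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command, timeout=10.0):
    """Run one command for one host and return (host, status, output).

    `command` is an argument list; every "{host}" in it is replaced
    by the host name. A host that does not answer in time is reported
    as "timeout" instead of blocking the whole run.
    """
    argv = [arg.replace("{host}", host) for arg in command]
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return host, "timeout", ""
    except OSError as exc:
        return host, "error", str(exc)
    status = "ok" if proc.returncode == 0 else "failed"
    return host, status, proc.stdout.strip()


def run_on_all(hosts, command, timeout=10.0, max_workers=32):
    """Run the command for all hosts in parallel; return {host: (status, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: run_on_host(h, command, timeout), hosts)
        return {host: (status, out) for host, status, out in results}
```

A call might look like run_on_all(["swc-001", "swc-002"], ["ssh", "-o", "BatchMode=yes", "{host}", "uptime"]). Because the hosts are handled concurrently, the total run time is bounded by roughly one timeout rather than one timeout per unreachable host, as long as the pool is large enough.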
Sample Output
[15:52 root@S-1-p python]# host-status --all <-- Only works using root.
Host Uptimes for Production Correlator
--------------------------------------
Unpingable: None <-- Hosts that don't answer pings
Can't ssh: None <-- Hosts that won't allow ssh
4d, 20h, 45m: swc-001 <-- Hosts with uptimes in the same minute.
4d, 21h, 36m: swc-059
4d, 21h, 53m: server-2
4d, 21h, 54m: swc-064
4d, 21h, 55m: swc-063
4d, 21h, 56m: swc-061..062 <-- Range notation used for sequential hosts.
4d, 21h, 57m: swc-060
4d, 21h, 58m: swc-058
4d, 21h, 59m: swc-057
4d, 22h, 0m: swc-056
4d, 22h, 1m: swc-055
4d, 22h, 3m: swc-053..054
4d, 22h, 5m: swc-052
4d, 22h, 10m: swc-051
4d, 22h, 11m: swc-050
4d, 22h, 13m: swc-049
4d, 22h, 14m: swc-048
4d, 22h, 16m: swc-047
4d, 22h, 18m: swc-046
4d, 22h, 20m: swc-045
4d, 22h, 22m: swc-044
4d, 22h, 23m: swc-043
4d, 22h, 27m: swc-042
4d, 22h, 28m: swc-041
4d, 22h, 30m: swc-040
4d, 22h, 32m: swc-039
4d, 22h, 33m: swc-038
4d, 22h, 35m: swc-037
4d, 22h, 37m: swc-036
4d, 22h, 38m: swc-035
4d, 22h, 42m: swc-034
4d, 22h, 49m: swc-033
11d, 21h, 23m: swc-031
11d, 21h, 24m: swc-007
12d, 22h, 30m: bg-mds-1..2,bg-ss-3
12d, 22h, 31m: bg-ss-1..2
34d, 5h, 58m: swc-006,swc-008,swc-015
34d, 15h, 42m: swc-002
34d, 15h, 52m: swc-003..005
34d, 16h, 15m: swc-009..010
34d, 16h, 22m: swc-011..013
34d, 16h, 38m: swc-014
34d, 17h, 12m: swc-016
34d, 21h, 7m: server-1
34d, 22h, 23m: swc-032
41d, 22h, 40m: swc-021,swc-024..029
41d, 22h, 41m: swc-017..020,swc-022..023,swc-030
Kernel Versions in Use: <-- Kernel version hosts are booted to.
-----------------------
3.10.0-1127.10.1.el7.x86_64: server-1..2
3.10.0-1062.12.1.el7.x86_64: bg-mds-1..2, bg-ss-1..3
3.10.0-1062.9.1.el7.x86_64 : swc-001..016, swc-031..064
3.10.0-957.27.2.el7.x86_64 : swc-017..030
Disk Info
---------
server-1 free space: /=(61%,35GB) /export=(92%,707GB) /var=(78%,11GB) /tmp=(100%,16GB) /boot=(82%,735MB) /boot/efi=(96%,246MB)
server-2 free space: /=(33%,18GB) /export=(94%,725GB) /var=(77%,11GB) /tmp=(100%,16GB) /boot=(72%,632MB) /boot/efi=(96%,246MB)
Latest diskless image: RHEL-7.6.0.6 <-- Newest diskless image on server-1 disk.
RHEL-7.6.0.6 (diskless root): swc-001..064 <-- Hosts using this diskless image.
Lustre mounters: swc-002..006, swc-008..016
Isan mounters: swc-001..064
Bgfs mounters: swc-001..016, swc-031, swc-033..035, swc-037..041, swc-043..049, swc-051..053, swc-055..060, swc-062..064
/export mounters: bg-mds-1..2, bg-ss-1..3, swc-001..064 <-- Hosts mounting /export/home; all hosts should mount this!
Unreachable Hosts: <-- Hosts not reachable by other hosts.
------------------
lustre-mds-disk, lustre-oss-1..3-disk FROM swc-001..016
pdu-9 FROM server-1..2
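The grouping and range notation seen in the sample output (hosts sharing a value listed together, sequential hosts collapsed to forms like swc-061..062) can be reproduced with a short sketch such as the one below. It is not taken from hostStatus.py: the function names compress_hosts and group_by_value are hypothetical, and the sketch only handles host names that end in a number, so names with a trailing suffix such as lustre-oss-1..3-disk are listed individually.

```python
import re
from collections import defaultdict

# Splits "swc-061" into the prefix "swc-" and the trailing digits "061".
_TRAILING_NUMBER = re.compile(r"^(.*?)(\d+)$")


def compress_hosts(names, sep=", "):
    """Collapse sequential host names into 'prefix-NNN..MMM' range notation."""
    groups = defaultdict(list)  # (prefix, digit width) -> list of numbers
    items = []                  # (sort prefix, sort number, text)
    for name in set(names):
        match = _TRAILING_NUMBER.match(name)
        if match:
            prefix, digits = match.groups()
            groups[(prefix, len(digits))].append(int(digits))
        else:
            items.append((name, 0, name))  # no trailing number: keep as-is
    for (prefix, width), numbers in groups.items():
        numbers.sort()
        start = prev = numbers[0]
        # The trailing None flushes the last open run.
        for n in numbers[1:] + [None]:
            if n is not None and n == prev + 1:
                prev = n
                continue
            first = f"{prefix}{start:0{width}d}"
            text = first if start == prev else f"{first}..{prev:0{width}d}"
            items.append((prefix, start, text))
            if n is not None:
                start = prev = n
    return sep.join(text for _, _, text in sorted(items))


def group_by_value(value_by_host):
    """Map each distinct value (e.g. an uptime string) to its compressed host list."""
    hosts_by_value = defaultdict(list)
    for host, value in value_by_host.items():
        hosts_by_value[value].append(host)
    return {value: compress_hosts(hosts, sep=",") for value, hosts in hosts_by_value.items()}
```

For example, compress_hosts(["swc-061", "swc-062"]) yields swc-061..062, matching the notation in the uptime listing. Numbers are grouped by digit width, so names whose numeric parts differ in width are not merged into one range.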
--
JimJacobs - 2020-08-04