WARNING! THIS TOPIC IS GENERATED BY System.ChecklistPlugin PLUGIN. DO NOT EDIT THIS TOPIC (except table data)! Back to the checklist topic UsnoDifxOsPatch2. *con...
Contents Networking As of this writing the plan is to fold the new correlator into the existing correlator network layout. This will facilitate keeping the two i...
Back to the checklist topic UsnoDifxPatch2Draft. *...
USNO Maintenance Log 2 This log page allows significant maintenance/administration events to be logged for future use. The goal is to record information that migh...
KVM Console/Switches (jhj 1/15/21) Currently two of the KVMs retained from the old correlator are not on the net. These are the KVM in the second rack of SWCs and...
Environment Monitor The environment monitor is positioned to read the temperature and humidity via a sensor mounted above the rack containing the correlator nodes...
Host SSH Aliases In root's .bashrc a number of aliases are defined to make it a bit quicker to run commands on a single host. Basically it aliases the host name wi...
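A minimal Python sketch of generating such alias lines, assuming each alias is simply the host name standing in for `ssh <host>` (the excerpt is cut off, so the exact alias form is a guess). `make_ssh_aliases` and the `swc-NNN` host names are hypothetical.

```python
def make_ssh_aliases(hosts, user=None):
    """Return bash alias definitions, one per host.

    Assumption: each alias is named after the host and runs `ssh <host>`
    (or `ssh user@host`); words typed after the alias become the remote command.
    """
    lines = []
    for host in hosts:
        target = f"{user}@{host}" if user else host
        lines.append(f"alias {host}='ssh {target}'")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical host names; the output could be appended to root's .bashrc.
    print(make_ssh_aliases([f"swc-{n:03d}" for n in range(1, 4)]))
```

With an alias like `alias swc-001='ssh swc-001'`, typing `swc-001 uptime` runs `uptime` on that host.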
Ping Diskless The custom python3 utility ping diskless was originally designed to ping the SWCs to watch them through the reboot cycle. It has evolved such that ...
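The basic idea of watching hosts through a reboot can be sketched in Python as below. This is not the actual ping diskless code: `ping_once`, `watch`, and the `swc-NNN` names are hypothetical, and it assumes the Linux iputils `ping` (`-c` count, `-W` timeout in seconds). Passing a different `is_up` function makes the polling loop testable without a network.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Return True if one ICMP echo to `host` is answered (iputils ping on Linux)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(hosts, rounds, is_up=ping_once, interval_s=5.0, sleep=time.sleep):
    """Poll `hosts` for `rounds` rounds and return their up/down transitions.

    Each event is (round, host, old_state, new_state); old_state is None the
    first time a host is seen.
    """
    previous = {}
    events = []
    for rnd in range(rounds):
        current = {host: is_up(host) for host in hosts}
        for host, up in current.items():
            if previous.get(host) != up:
                events.append((rnd, host, previous.get(host), up))
        previous = current
        if rnd < rounds - 1:
            sleep(interval_s)
    return events


if __name__ == "__main__":
    # Hypothetical host names; watch three SWCs for about a minute.
    for event in watch([f"swc-{n:03d}" for n in range(1, 4)], rounds=12):
        print(event)
```

Reporting only transitions keeps the output short while a whole rack reboots: each host shows up once going down and once coming back.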
Nagios Nagios is a watchdog daemon that can be configured to check on various properties of a computer system and issue warnings and/or alarms if one of the prope...
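Checks like these are normally written as Nagios plugins: a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal Python sketch of that convention follows; the temperature reading, the 27/32 C thresholds, and `check_threshold` are hypothetical and are not the correlator's actual checks.

```python
import sys

# Exit codes defined by the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_threshold(value, warn, crit, name="temp", unit="C"):
    """Classify a reading against upper thresholds; return (exit_code, status_line)."""
    if value is None:
        return UNKNOWN, f"{name.upper()} UNKNOWN - no data"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    return code, (f"{name.upper()} {LABELS[code]} - {name}={value}{unit}"
                  f" | {name}={value};{warn};{crit}")


if __name__ == "__main__":
    # Hypothetical usage: reading passed as the first argument, thresholds made up.
    reading = float(sys.argv[1]) if len(sys.argv) > 1 else None
    code, line = check_threshold(reading, warn=27.0, crit=32.0)
    print(line)
    sys.exit(code)
```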
Client (SWC) Installation The initial client installation was performed using the hard disk drive that came with SWC 001. The host was booted via PXE boot and the...
BGFS Installation The BGFS hosts were installed by the vendor. They were then modified by us to meet the most pressing requirements (i.e., "critical" and "high" find...
Logging Rsyslog Server 1 runs rsyslogd via the systemd rsyslog.service; the main configuration for rsyslogd is /etc/rsyslog.conf. DHCP Logging DHCPD messages...
Time Service Overview Time service is provided primarily by server 1 using the chronyd daemon. The configuration file for chronyd is /etc/chrony.conf. Service...
BGFS File Servers Overview The new correlator is using BeeGFS (bgfs) in place of Lustre to provide a high-speed, high-capacity file system. The bgfs servers are ...
Compute Nodes Basic System Specs The SWC (SoftWare Correlator) hosts are Dell R640 systems. These have two Intel Xeon Gold 6132 CPUs which run at 2.60 GHz; each ...
USNO Correlator Documentation Wiki This Wiki section serves as the primary documentation for the NRAO project to provide a DiFX correlator to USNO. It is intentio...
Fringing Host (test cluster only) Overview In addition to the normal correlation work, some other analytical work is done. At the USNO, they have two fringing...
Infiniband Switches Overview The correlators use Infiniband (IB) network connections to pass data between the correlator nodes (SWCs) and also with the file serv...
RADIUS Service RADIUS is an authentication service often supported by simpler devices such as switches. The servers run a systemd service named radiusd that the ...
Admin Servers Introduction Each correlator has two administrative servers, server 1 and server 2. The servers are used to administer the correlator and take no d...
USNO Correlator II Overview Production Correlator The USNO uses a computing cluster built and maintained by NRAO for performing correlations using DiFX. In 2019...
Neo Correlator Patching The patch process for the Neo correlator (installed at USNO in November 2019) is outlined below. A set of "checkboxes" is provided to assi...
AIDE AIDE is a program that scans important files in the system and reports any changes in the files relative to the accepted baseline. The security STIG require...
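The baseline-and-compare idea behind AIDE can be illustrated with a short Python sketch. This is not AIDE and does not read AIDE's database or configuration; `snapshot` and `compare` are hypothetical helpers that hash files with SHA-256 and also record mode and size, roughly the kind of attributes AIDE can be configured to track.

```python
import hashlib
import os
import stat
import tempfile


def snapshot(paths):
    """Map every regular file under the given directories to (sha256, mode, size)."""
    result = {}
    for top in paths:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, device files, ...
                digest = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        digest.update(chunk)
                result[path] = (digest.hexdigest(), st.st_mode, st.st_size)
    return result


def compare(baseline, current):
    """Report files added, removed, or changed relative to the accepted baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(p for p in set(baseline) & set(current)
                     if baseline[p] != current[p])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Demo on a throwaway directory rather than real system files.
    with tempfile.TemporaryDirectory() as d:
        conf = os.path.join(d, "example.conf")
        with open(conf, "w") as f:
            f.write("setting = 1\n")
        baseline = snapshot([d])
        with open(conf, "w") as f:
            f.write("setting = 2\n")
        print(compare(baseline, snapshot([d])))
```

As with AIDE, the baseline only means something if it is taken from a known-good state and stored where an intruder can't quietly rewrite it.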
Shutting off Nagios Alarms When a Nagios problem can't be fixed right away it can get tiresome to have the alarm message coming out every hour or so. Almost the s...
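The excerpt doesn't say which mechanism is used here. One standard Nagios mechanism is an external command such as `DISABLE_SVC_NOTIFICATIONS;<host_name>;<service_description>`, written as a timestamped line to the Nagios command file. A minimal Python sketch, assuming the command-file path is passed in (its location under this install isn't given); the function names and the `swc-001`/`PING` values are hypothetical.

```python
import os
import tempfile
import time


def external_command(name, *args, now=None):
    """Format one Nagios external command line: '[epoch] NAME;arg1;arg2'."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] {';'.join((name,) + args)}\n"


def _send(cmd_file, line):
    # In a live install cmd_file is the named pipe Nagios reads commands from.
    with open(cmd_file, "a") as f:
        f.write(line)
    return line


def disable_service_notifications(cmd_file, host, service, now=None):
    """Ask Nagios to stop notifying about one service on one host."""
    return _send(cmd_file, external_command(
        "DISABLE_SVC_NOTIFICATIONS", host, service, now=now))


def enable_service_notifications(cmd_file, host, service, now=None):
    """Undo disable_service_notifications once the problem is fixed."""
    return _send(cmd_file, external_command(
        "ENABLE_SVC_NOTIFICATIONS", host, service, now=now))


if __name__ == "__main__":
    # Demo against an ordinary temp file instead of the real command pipe.
    with tempfile.TemporaryDirectory() as d:
        fake_pipe = os.path.join(d, "nagios.cmd")
        print(disable_service_notifications(fake_pipe, "swc-001", "PING"), end="")
```

Disabling notifications (rather than the check itself) keeps the service monitored, so the web status still shows when it recovers.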
Pty Allocation Problem So far, this problem has mostly occurred on server 1 on both clusters, probably because it gets the most login traffic during maintenance. ...
Debugging SELinux Both servers (1 and 2) and the BGFS hosts are running SELinux in targeted mode; the SWCs are not running SELinux although since the images are host...
Correlator DNS Service Server Configuration Both server 1 and server 2 provide DNS service to the correlator. Server 2 should be considered a backup to server 1....
Mail Service Overview Since the correlator is effectively a compute device, very little email service is required. The only exception is that the system administ...
File Transfer From an NRAO Host Since the hosts on either correlator can be reached via the NRAO DNS, file transfer can be initiated from an NRAO host in a pretty stra...
Do on hosts This python script is designed to execute a command on a set of cluster hosts. It is invoked via the link do on hosts located in /opt/services/bin wh...
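A minimal Python sketch of the same idea, running one command over ssh on several hosts in parallel. It is not the actual do on hosts script: `build_ssh_command`, `run_on_hosts`, and the `swc-NNN` host names are hypothetical, and it assumes key-based ssh to each host already works. The `runner` parameter lets the ssh call be swapped out for testing.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, command):
    """Return the argv for running `command` on `host` through ssh."""
    # BatchMode=yes makes ssh fail instead of hanging on a password prompt.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_on_hosts(hosts, command, runner=subprocess.run, max_workers=16):
    """Run `command` on every host in parallel; return [(host, rc, stdout)] in input order."""
    def one(host):
        proc = runner(build_ssh_command(host, command),
                      capture_output=True, text=True)
        return host, proc.returncode, proc.stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, hosts))


if __name__ == "__main__":
    # Hypothetical host names.
    for host, rc, out in run_on_hosts([f"swc-{n:03d}" for n in range(1, 4)], "uptime"):
        print(f"{host} (rc={rc}): {out.strip()}")
```

`ThreadPoolExecutor.map` returns results in the order the hosts were given, so the output stays sorted even though the ssh sessions finish in any order.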
Host Status Tool The host status tool can be used to summarize important attributes for all hosts in the system. It is invoked by running host status all on one of...
Back to the checklist topic UsnoDifxOsPatch. *cont...
The file ~/.ssh/jjacobs s1t.config This is an instance of a user-specific file to create the linkage from server 1p to server 1t. It currently hops to usno-serv-1...
Mellanox (Infiniband) Switch Software Information * Collected 5/15/20 by Jim Jacobs data m6036-1 standalone: master show version Product name: MLNX-OS...
Disk Layout Server Disk Partitioning The disk layout is shown below as displayed using lsblk. The two disks are partitioned somewhat in parallel although the fir...
DHCP Overview Server 1 provides DHCP service on the network admin network. The settings for the daemon are in /etc/dhcp/dhcpd.conf. The IP addresses on all of ne...
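For reference, a host declaration in ISC dhcpd.conf syntax can be generated as in the sketch below. Whether this dhcpd.conf uses per-host `fixed-address` entries is an assumption (the excerpt is cut off), and `dhcpd_host_entry` plus the name, MAC, and address in the example are hypothetical.

```python
def dhcpd_host_entry(name, mac, ip, filename=None):
    """Return an ISC dhcpd.conf `host` declaration pinning `mac` to `ip`."""
    lines = [
        f"host {name} {{",
        f"  hardware ethernet {mac};",
        f"  fixed-address {ip};",
    ]
    if filename:
        # Boot file offered to PXE clients, typically a path under the TFTP root.
        lines.append(f'  filename "{filename}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical name, MAC, and address; not taken from the real dhcpd.conf.
    print(dhcpd_host_entry("swc-001", "00:00:5e:00:53:01", "10.1.36.101"))
```

After editing dhcpd.conf, `dhcpd -t -cf /etc/dhcp/dhcpd.conf` checks the syntax before the service is restarted.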
VNC and X Window Connections To launch an X application, make an ssh tunnel and then launch the X app and it'll appear locally. * ssh -X -J admin@usno-serv-1t ...
This rule requires the installation and continuous execution of a host-based intrusion detection system. The preferred package is McAfee but apparently something ...
Back to the checklist topic UsnoRhel7Stig2Productio...
The STIG requires multifactor access for all privileged accounts; it mentions using two government-approved mechanisms (one is CAC) as examples. It's possible that...
The two servers do not mount any NFS volumes. The SWCs have two mounts to server 1. We'll have to see if this causes any heartburn when it's enabled on the SWCs.
None of the hosts are currently doing any packet forwarding. On the old cluster the firewall had to be modified to allow packet forwarding to support some of the ...
The tftp server package is not installed on the SWCs. On the two servers, the service file is located in /etc/systemd/system/tftp.service and this starts the serv...
The two servers need TFTP server functionality to support diskless bootup. The daemon is invoked with a root, /opt/services/tftpboot, which limits its scope. The ...
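The daemon enforces this itself (tftp-hpa's `--secure` option, for example, changes root to the directory at startup). Purely to illustrate what "limits its scope" means, here is a Python sketch of an equivalent path check; `resolve_under_root` is hypothetical and uses a path check rather than the chroot the daemon relies on.

```python
import os
import tempfile


def resolve_under_root(root, requested):
    """Map a client-requested file name to a path inside `root`.

    Raises PermissionError if the request would escape the root, e.g. via
    '..' components or a symlink pointing outside it.
    """
    root_real = os.path.realpath(root)
    # Treat absolute requests as relative to the root, as a chrooted daemon sees them.
    candidate = os.path.realpath(os.path.join(root_real, requested.lstrip("/")))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError(f"{requested!r} is outside {root_real}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(resolve_under_root(root, "/pxelinux.cfg/default"))
        try:
            resolve_under_root(root, "../../etc/passwd")
        except PermissionError as err:
            print("refused:", err)
```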
The servers provide SMTP service but act as relays so that alarm messages from within the cluster can propagate outside of the cluster to system administrators, e...
The two servers do allow remote messages over TCP. This allows them to capture syslog messages from the SWCs and other devices on the system. Thus they serve as l...
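On the sending side, any client that speaks syslog over TCP can feed these servers. A minimal Python sketch using the standard library's `SysLogHandler`; the server name, port 514, and newline framing are assumptions, not settings read from this cluster's rsyslog configuration, and `remote_syslog_logger` is hypothetical.

```python
import logging
import logging.handlers
import socket


def remote_syslog_logger(server, port=514, name="remote-syslog"):
    """Return a logger that ships records to `server` as syslog over TCP."""
    handler = logging.handlers.SysLogHandler(
        address=(server, port), socktype=socket.SOCK_STREAM
    )
    # Newline-terminate each message (a common TCP syslog framing) instead of
    # appending the NUL byte Python adds by default.
    handler.append_nul = False
    handler.setFormatter(logging.Formatter("%(message)s\n"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    # Hypothetical server name; facility defaults to LOG_USER.
    log = remote_syslog_logger("server-1")
    log.info("test message from an SWC")
```

An INFO record with the default LOG_USER facility goes out with priority `<14>`, which rsyslog uses to route it by its facility/severity rules.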
The config files for audit have changed since the STIG. They are now located in /etc/audit/rules.d though I think they're actually "compiled" into /etc/audit/audi...
Auditing is enabled on the systems and will be tweaked per STIG specs about coverage. The system must have high availability, so having a doomsday switch on audi...
There are a couple of other grub.cfg files located under /opt. These are served up to the diskless systems and are not part of the boot process for the server's t...
The hard drives on the servers have a separate partition for /var. I'm not sure that this rule applies to the diskless systems since they have no disks.
The aide program is installed on server {1,2} and the swc diskless image. Cron should run it once a day on the servers. An email message will go out to usno admin...
Mandatory multifactor authentication is likely to be problematic for the cluster both for administration and DiFX usage because both actions require easy login to...
Applying this rule will end up requiring console access to complete a reboot, which is not appropriate for either a cluster or a remotely administered system...
Server 1 only had config file mods, as expected. Server 2 has mods to /etc/NetworkManager/dispatcher.d/20-chrony; I think a system update did this? The SWCs have...
Wiki page to track August 26/27 CHTC site visit to NRAO Date: August 26-27, 2019 Locations: SO Auditorium, CV Auditorium (Monday) and ER 245 (Tuesday) Connection In...
Back to the checklist topic UsnoRhel7Stig2. *conte...
The commands provided are not appropriate for this system given its version of RHEL (7.6). Use systemctl status tftp -l to see that the tftp daemon is started in a...
Server 1 has tftpd installed and it's used for booting of the diskless hosts (swc-xxx). It is set up to only transfer files located below /opt/services/tftpboot. ...
I believe that postfix on server 1 is configured to only relay messages from hosts on 10.1.36.* but it's not using, nor does it even show, the parameter described...
This rule is tied tightly to NTP whereas RHEL is using chrony. This will take some research, probably. Also we'll have to find a direct, acceptable official clock ...
There is no expectation that users will receive mail on server 1. It serves as a way to forward mail off of the cluster (e.g., sending notifications back to the s...
Did cp /usr/share/doc/audit-2.8.4/rules/30-stig.rules stig.rules to put these rules into /etc/audit/rules.d. Then restarted the service using service auditd restart....
This requirement seems to be aimed at sending audit messages off the machine. However, server 1 doesn't really have anywhere to send them. This feature is more ap...
This one works pretty much as the directions describe. I suggest copying the linux image (vmlinuz...) and the ram disk image (initramfs...) and /boot/efi/EFI/redh...
A better first search is find / -xdev -perm -002 -type f -perm /111 -exec ls -ld {} \; | more since this will only return executable files that are world-writable.
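The same search can be sketched in Python, which may be handy when the results need further filtering. `world_writable_executables` is a hypothetical helper; it is meant to mirror the find expression (world-writable, regular file, any execute bit, staying on one filesystem) and does not follow symlinks.

```python
import os
import stat
import tempfile

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def world_writable_executables(top):
    """Yield regular files under `top` that are world-writable and executable.

    Mirrors `find top -xdev -perm -002 -type f -perm /111`: symlinks are not
    followed and directories on other filesystems are not descended into.
    """
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # -xdev: prune subdirectories that live on a different filesystem.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished during the walk
            if (stat.S_ISREG(st.st_mode)
                    and st.st_mode & stat.S_IWOTH
                    and st.st_mode & EXEC_BITS):
                yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, mode in (("bad.sh", 0o777), ("ok.sh", 0o755), ("data.txt", 0o666)):
            path = os.path.join(d, name)
            open(path, "w").close()
            os.chmod(path, mode)
        print(list(world_writable_executables(d)))
```

Run as root with `top="/"` to cover the root filesystem the way the find command does.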
The checking logic provided seems to prevent any file sharing except by group which is not really consistent with the collaborative nature of this system's usage....
usno-a7050 show interfaces Ethernet1 is up, line protocol is up (connected) Hardware is Ethernet, address is 001c.7318.2adb (bia 001c.7318.2adb) Description: ...
Points of Contact ALMA Science Tony Remijan aremijan@ ALMA Systems Mike Hatz mhatz@ Background This page establishes a set of benchmarks that can be used...
Correlator Operating System All the computing hardware will be running a version of RHEL 7.x. Because of security issues, the installed version will track th...
GBO / LBO / JAO / NRAO HPC (High Performance Computing) Support Wiki Introduction This wiki is to track open tasks and to collect detailed logs for HPC relate...
This is a wiki home for high performance computing at the NRAO. The area is still undergoing organizational changes but content in the Categories section is eith...
Nagios We install service software in /opt/services instead of the normal location. This separates it from the OS install and allows us to reinstall and/or upgr...
Connecting to the USNO Using VNC * Log in to usno-serv-1-ext and start VNC: ssh admin@usno-serv-1-ext vncserver * Start VNC on the local machine: vncviewer -via a...
Creating a new SSH user Access to the cluster from the outside is only permitted using ssh with a public key. This page describes how to add a new account as wel...
Thoughts on automating Lustre Client module builds. * Keep a copy of the current lustre package under /opt/services/lustre-client on usno-serv-1t. (Had to inst...
Back to the checklist topic SmileyTest. context ...
Back to the checklist topic CheckboxTest. *context...
HERA Librarian software Instructions on configuring the hardware can be found at https://staff.nrao.edu/wiki/bin/view/CIS/Documentation/Herastore01 Installation...
Summary of NAASC Lustre FID in direct Activation Activity This page provides a summary of the activity conducted at NAASC on Fri/Sat April 21/22, 2017.
LSI MegaRAID Install The MegaRAID software installs into /usr/local but we actually want it in /opt/services, so we make a symlink. * To install the software, be...
Node Boot Mechanism Configure BIOS * Disable Hyper-Threading: Hyper-Threading is a very cheap trick to simulate dual CPUs. All it does is create a second ent...
DHCP Install DHCP Retrieve DHCP from http://isc.org/products/DHCP/ The latest release as of 2012-06-26 is DHCP 4.2.4. Looks like they still can't write a good mak...
Installing DiFX Quick help To get usage instructions: difxbuild -h To get in-line documentation printed to your terminal: difxbuild -d Introduction This DiFX in...
Hi all, Since last Wednesday (2016-12-07) we at JAO have been getting reports of some jaopost client machines (around 10 in 4 days) freezing during Pipeline R...
When you log a troubleshooting request, please remember to include relevant logs: * MDS log as an attachment * OSS log(s) as an attachment * Client log(s...
Description NAASC MDS freezes due to being out of memory. More RAM was installed, taking it from 12 GB to 48 GB. On a subsequent reboot, the MDS cannot mount the M...