100 recent changes in HPC Web retrieved at 14:46 (Local)

WARNING! THIS TOPIC IS GENERATED BY System.ChecklistPlugin PLUGIN. DO NOT EDIT THIS TOPIC (except table data)! Back to the checklist topic UsnoDifxOsPatch2. *con...
USNO Maintenance Log 2 This log page allows significant maintenance/administration events to be logged for future use. The goal is to record information that migh...
KVM Console/Switches (jhj 1/15/21) Currently two of the KVMs retained from the old correlator are not on the net. These are the KVM in the second rack of SWCs and...
Environment Monitor The environment monitor is positioned to read the temperature and humidity via a sensor mounted above the rack containing the correlator nodes...
Host SSH Aliases Root's .bashrc defines a number of aliases that make it a bit quicker to run commands on a single host. Basically it aliases the host name wi...
Ping Diskless The custom python3 utility ping diskless was originally designed to ping the SWCs to watch them through the reboot cycle. It has evolved such that ...
Nagios Nagios is a watchdog daemon that can be configured to check on various properties of a computer system and issue warnings and/or alarms if one of the prope...
Client (SWC) Installation The initial client installation was performed using the hard disk drive that came with SWC 001. The host was booted via PXE boot and the...
BGFS Installation The BGFS hosts were installed by the vendor. They were then modified by us to meet the most pressing requirements (i.e., "critical" and "high" find...
Logging Rsyslog Server 1 runs rsyslogd via the systemd rsyslog.service; the main configuration file for rsyslogd is /etc/rsyslog.conf. DHCP Logging DHCPD messages...
Time Service Overview Time service is provided primarily by server 1 using the chronyd daemon. The configuration file for chronyd is /etc/chrony.conf. Service...
Contents Networking As of this writing the plan is to fold the new correlator into the existing correlator network layout. This will facilitate keeping the two i...
Main.JimJacobs 2018 12 11
BGFS File Servers Overview The new correlator is using BEEGFS (bgfs) in place of Lustre to provide a high speed, high capacity file system. The bgfs servers are ...
Compute Nodes Basic System Specs The SWC (SoftWare Correlator) hosts are Dell R640 systems. These have two Intel Xeon Gold 6132 CPUs which run at 2.60 GHz; each ...
USNO Correlator Documentation Wiki This Wiki section serves as the primary documentation for the NRAO project to provide a DiFX correlator to USNO. It is intentio...
Fringing Host (test cluster only) Overview In addition to the normal correlation work, some other analytical work is done. At the USNO, they have two fringing...
Infiniband Switches Overview The correlators use Infiniband (IB) network connections to pass data between the correlator nodes (SWCs) and also with the file serv...
RADIUS Service RADIUS is an authentication service often supported by simpler devices such as switches. The servers run a systemd service named radiusd that the ...
Admin Servers Introduction Each correlator has two administrative servers, server 1 and server 2. The servers are used to administer the correlator and take no d...
USNO Correlator II Overview Production Correlator The USNO uses a computing cluster built and maintained by NRAO for performing correlations using DiFX. In 2019...
Neo Correlator Patching The patch process for the Neo correlator (installed at USNO in November 2019) is outlined below. A set of "checkboxes" is provided to assi...
AIDE AIDE is a program that scans important files in the system and reports any changes in the files relative to the accepted baseline. The security STIG require...
Shutting off Nagios Alarms When a nagios problem can't be fixed right away it can get tiresome to have the alarm message coming out every hour or so. Almost the s...
17:38 root@bg mds 1 T ~ # yum install mlnx ofed all Loaded plugins: langpacks, product...
Pty Allocation Problem So far, this problem has mostly occurred on server 1 on both clusters, probably because it gets the most login traffic during maintenance. ...
Debugging SE Linux Both server 1..2 and the BGFS hosts are running SE Linux in targeted mode; the SWCs are not running SE Linux although since the images are host...
Correlator DNS Service Server Configuration Both server 1 and server 2 provide DNS service to the correlator. Server 2 should be considered a backup to server 1....
Mail Service Overview Since the correlator is effectively a compute device, very little email service is required. The only exception is that the system administ...
File Transfer From an NRAO Host The hosts on either correlator can be reached via the NRAO DNS; file transfer can be initiated from an NRAO host in a pretty stra...
Do on hosts This python script is designed to execute a command on a set of cluster hosts. It is invoked via the link do on hosts located in /opt/services/bin wh...
Host Status Tool The host status tool can be used to summarize important attributes for all hosts in the system. It is invoked by doing host status all on one of...
WARNING! THIS TOPIC IS GENERATED BY System.ChecklistPlugin PLUGIN. DO NOT EDIT THIS TOPIC (except table data)! Back to the checklist topic UsnoDifxOsPatch. *cont...
The file ~/.ssh/jjacobs s1t.config This is an instance of a user specific file to create the linkage from server 1p to server 1t. It currently hops to usno serv 1...
Mellanox (Infiniband) Switch Software information * Collected 5/15/20 by Jim Jacobs data m6036 1 standalone: master show version Product name: MLNX OS...
Main.JimJacobs 2020 03 27
Disk Layout Server Disk Partitioning The disk layout is shown below as displayed using lsblk. The two disks are partitioned somewhat in parallel although the fir...
DHCP Overview Server 1 provides DHCP service on the network admin network. The settings for the daemon are in /etc/dhcp/dhcpd.conf. The IP addresses on all of ne...
Main.JimJacobs 2020 01 23
VNC and X Window Connections To launch an X application, make an ssh tunnel and then launch the X app and it'll appear locally. * ssh X J admin@usno serv 1t ...
This rule requires the installation and continuous execution of a host based intrusion detection system. The preferred package is McAfee but apparently something ...
WARNING! THIS TOPIC IS GENERATED BY System.ChecklistPlugin PLUGIN. DO NOT EDIT THIS TOPIC (except table data)! Back to the checklist topic UsnoRhel7Stig2Productio...
Done, though future fiddling might throw it off.
Pending on firewall implementation.
The STIG requires multifactor access for all privileged accounts; it mentions using two government approved mechanisms (one is CAC) as examples. It's possible that...
This will occur as part of setting up the firewall.
The two servers do not mount any NFS volumes. The SWCs have two mounts to server 1. We'll have to see if this causes any heartburn when it's enabled on the SWCs.
None of the hosts are currently doing any packet forwarding. On the old cluster the firewall had to be modified to allow packet forwarding to support some of the ...
Xwindows is used for remote administration purposes on server 1 and server 2. It is not installed on the SWCs.
The tftp server package is not installed on the SWCs. On the two servers, the service file is located in /etc/systemd/system/tftp.service and this starts the serv...
The two servers need TFTP server functionality to support diskless bootup. The daemon is invoked with a root, /opt/services/tftpboot, which limits its scope. The ...
Set it to 500 on the servers and at the 1000 limit on the SWCs as I'm worried that there might be a lot of traffic on them.
We'll need to tunnel back to the NRAO to get a clock sync. The SWCs will get their time from server 1.
Apply this one last since it'll make setup a lot harder.
FIPS stuff is on hold right now.
This rule should be revisited as part of firewall setup.
The servers provide SMTP service but act as relays so that alarm messages from within the cluster can propagate outside of the cluster to system administrators, e...
The two servers do allow remote messages over TCP. This allows them to capture syslog messages from the SWCs and other devices on the system. Thus they serve as l...
The config files for audit have changed since the STIG. They are now located in /etc/audit/rules.d though I think they're actually "compiled" into /etc/audit/audi...
Auditing is enabled on the systems and will be tweaked per STIG specs about coverage. The system must have high availability, so having a doomsday switch on audi...
There are a couple of other grub.cfg files located under /opt. These are served up to the diskless systems and are not part of the boot process for the server's t...
This appears to be related to FIPS installation, which we're skipping for the time being.
As soon as we do this one we'll lose the ability to use Radius to the switches, etc. Let's hold off for now.
The hard drives on the servers have a separate partition for /var. I'm not sure that this rule applies to the diskless systems since they have no disks.
The aide program is installed on server {1,2} and the swc diskless image. Cron should run it once a day on the servers. An email message will go out to usno admin...
This rule is moot if McAfee is installed and active; otherwise SELinux needs to be enabled and then configured appropriately.
Mandatory multifactor authentication is likely to be problematic for the cluster both for administration and DiFX usage because both actions require easy login to...
Applying this rule will end up requiring console access to complete a reboot, which is not appropriate for either a cluster or a remotely administered system...
Made a mod using the polkit scheme to allow passwordless reboot on the SWCs, creating /etc/polkit-1/rules.d/51-wheel.rules.
This rule appears to need application; however, the instructions are such that I'm not sure exactly what should be done.
Server 1 only had config file mods as expected. Server 2 has mods to /etc/NetworkManager/dispatcher.d/20 chrony; I think a system update did this? The SWCs have...
Wiki page to track August 26/27 CHTC site visit to NRAO Date: August 26 27 2019 Locations: SO Auditorium, CV Auditorium(Monday) and ER 245 (Tuesday) Connection In...
WARNING! THIS TOPIC IS GENERATED BY System.ChecklistPlugin PLUGIN. DO NOT EDIT THIS TOPIC (except table data)! Back to the checklist topic UsnoRhel7Stig2. *conte...
System is not currently using the sssd functionality.
This is on hold until the firewall is configured.
Server 1 only NFS mounts usno serv 1. This mount should be removed after the system is ready to be put into production.
The X Window server is installed and is needed to allow remote administration of the system. This needs to be in the ISSO document.
The commands provided are not appropriate for this system given its version of RHEL 7.6. Use systemctl status tftp -l to see that the tftp daemon is started in a...
Server 1 has tftpd installed and it's used for booting of the diskless hosts (swc xxx). It is set up to only transfer files located below /opt/services/tftpboot. ...
I believe that postfix on server 1 is configured to only relay messages from hosts on 10.1.36.* but it's not using, nor does it even show, the parameter described...
Need to better understand this one.
Activate firewall once stability is achieved.
This rule is tied tightly to NTP whereas RHEL is using chrony. This will take some research, probably. Also we'll have to find a direct, acceptable official clock ...
Wait until stability.
Wait for stability.
Implement after stability reached.
Do this one last as it's annoying during development.
This one terminates network sessions after 10 minutes of inactivity. Leave this until things are stable.
This is about setting up the firewall. It can wait a little bit yet.
There is no expectation that users will receive mail on server 1. It serves as a way to forward mail off of the cluster (e.g., sending notifications back to the s...
Need to "document" that server 1 aggregates logs for the other hosts in the cluster.
Did cp /usr/share/doc/audit 2.8.4/rules/30 stig.rules stig.rules= to put these rules into /etc/audit/rules.d. Then restarted service using service auditd restart....
This can be done but we'll have to find a place to upload them to. Maybe an NRAO site?
See note for RHEL 07 030210.
This requirement seems to be aimed at sending audit messages off the machine. However, server 1 doesn't really have anywhere to send them. This feature is more ap...
The requirement for high availability will need to be documented and some sort of notification upon audit failure will have to be configured.
Removed the installed telnet server package.
This one works pretty much as the directions describe. I suggest copying the linux image (vmlinuz...) and the ram disk image (initramfs...) and /boot/efi/EFI/redh...
A better first search is find / -xdev -perm -002 -type f -perm /111 -exec ls -ld {} \; | more since this will only return executable files that are world writable.
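The search described in the note above can also be sketched in Python, which may be handy where the result needs further filtering. This is a minimal sketch, not something taken from the wiki: the function name world_writable_executables is hypothetical, and the logic assumes the intent is "regular files that are world writable and have at least one execute bit set, without crossing filesystem boundaries".

```python
import os
import stat


def _same_device(path, device):
    """Return True if path sits on the given device; unreadable paths count as False."""
    try:
        return os.lstat(path).st_dev == device
    except OSError:
        return False


def world_writable_executables(root):
    """Yield regular files under root that are world writable and executable.

    Hypothetical helper: a file qualifies when the "other" write bit (0o002)
    is set and any of the execute bits (0o111) is set.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Stay on the starting filesystem by pruning directories on other devices.
        dirnames[:] = [
            d for d in dirnames if _same_device(os.path.join(dirpath, d), root_dev)
        ]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            mode = stat.S_IMODE(st.st_mode)
            if mode & 0o002 and mode & 0o111:
                yield path
```

Calling list(world_writable_executables("/")) as root would walk the root filesystem only; pointing it at a smaller directory first is a cheap way to check that the output matches expectations.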
Looks like there is no entry to boot from removable media.
Number of topics: 100
Topic revision: r3 - 2009-10-19, CarolynWhite