As of this writing the plan is to fold the new correlator into the existing correlator network layout. This will facilitate keeping the two in operation side by side for a little while until the original correlator is decommissioned. A drawing of the new correlator network can be found here:
The hosts server-1, server-2, and swc-001 are intended to be accessible from outside the correlator subnets. As of this writing (2/14/20), the external NICs of these three hosts are connected to the old admin network (10.1.34.*). They are supposed to have their external NICs located on the USNO network, 198.162.24.*, as are the three corresponding hosts on the old correlator; however, there appears to be a problem getting this accomplished by the USNO network services group. So for the time being, these three hosts have to be accessed through one of the old correlator hosts.
Correlator Internal Network Structure
Basic operation and administration of the correlator take place on the various correlator subnets. The production correlator supports three subnets (an admin, a data, and a disk subnet), while the test correlator has only an admin and a disk subnet.
Admin Subnet (10.1.36.*)
The 10.1.36 subnet is the administration network and provides access to all of the cluster's networked devices; this includes the two main servers, the compute nodes (the SWCs), the file servers (Lustre and BGFS), the PDUs, the KVMs, and the network switches. In addition, some USNO-managed devices also reside on the 10.1.36.* network: isan (network storage) and acas (security scanning host). The 10.1.36 network uses 1G Ethernet. All network-capable hosts from both the original and neo systems will be connected to the 10.1.36.* subnet to allow for system administration. The list of systems residing on the admin network is shown in a table below, and the overall structure can be viewed in the network block diagram.
Data Subnet (10.1.37.*)
The data subnet uses 40G InfiniBand and serves as a high-speed interconnect allowing the SWCs to share data. The two admin servers do not have InfiniBand NICs and are not on this subnet. On the test correlator, this subnet is not used.
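For reference, an SWC's InfiniBand interface on the data subnet could be configured with an interface file along these lines. This is only a sketch: the device name, IP address, and file path follow common RHEL 7 conventions and are not taken from the actual system configuration.

```
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-ib0 for an SWC's
# data-subnet interface. The address shown is illustrative only.
DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=static
IPADDR=10.1.37.101
NETMASK=255.255.255.0
ONBOOT=yes
```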
Disk Subnet (10.1.38.*)
The production correlator also has a second InfiniBand subnet dedicated to moving data at high speed to and from the file servers. Currently, the old Lustre hosts are the main file servers for both the test and production clusters. On the test correlator, both disk access and inter-SWC communication occur on this subnet.
Naming and IP Addressing
The names of the various network-resident hosts and the IP addresses assigned to them are tabulated for easy reference:
The SWCs use the UEFI boot mechanism and boot off a diskless image served up by server-1. The diskless images reside in /opt/services/diskless_boot.
- An SWC uses DHCP to request an IP lease as well as boot service. The configuration file /etc/dhcp/dhcpd.conf reserves an IP number for each of the SWCs; it also holds the name of the system image that the SWC is to use, which has the form RHEL-7.9.x.yy. The SWC uses that name to complete the bootup via server-1's tftp server. The tftp server's files are located in /opt/services/tftpboot and consist of the files that would normally be in /boot and the other directories used for disked bootup (e.g., initial ram disk image, kernel, grub.cfg, etc.). The grub.cfg file specifies the location of the system root directory via an NFS mount. As part of the secure boot required by the DoD's RHEL STIG, a checksum file for the kernel is expected to be in the system image, co-located with another copy of the kernel file. The checksum file is named ._kernel file name_.hmac (e.g.,
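A reservation of the kind described above might look something like the following dhcpd.conf fragment. This is a sketch only: the MAC address, IP numbers, and boot-loader filename are hypothetical stand-ins, not values from the actual configuration file.

```
# Hypothetical /etc/dhcp/dhcpd.conf entry for one SWC.
# All addresses and names below are illustrative.
host swc-001 {
  hardware ethernet 00:11:22:33:44:55;    # SWC's admin-NIC MAC (hypothetical)
  fixed-address   10.1.36.101;            # reserved IP on the admin subnet
  next-server     10.1.36.1;              # tftp server (server-1, hypothetical IP)
  filename "RHEL-7.9.x.yy/grubx64.efi";   # UEFI boot loader under the tftp root
}
```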
- Eventually, the initial ram FS is replaced with the real system image. These images are stored in /opt/services/diskless_boot and use the same naming convention (i.e., RHEL-7.9.x.yy).
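To illustrate how grub.cfg might tie the pieces together, a menu entry pointing the kernel at an NFS-mounted system root could look like the sketch below. The server IP is hypothetical and the kernel/initrd paths are assumed; only the /opt/services/diskless_boot path and the RHEL-7.9.x.yy naming convention come from this document.

```
# Hypothetical grub.cfg menu entry for a diskless SWC.
# The NFS server IP and file paths are illustrative.
menuentry 'RHEL-7.9.x.yy diskless' {
  linuxefi  RHEL-7.9.x.yy/vmlinuz \
            root=nfs:10.1.36.1:/opt/services/diskless_boot/RHEL-7.9.x.yy ro
  initrdefi RHEL-7.9.x.yy/initrd.img
}
```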