As of this writing, the plan is to fold the new correlator into the existing correlator network layout. This will make it easier to keep the two in operation side by side until the original correlator is decommissioned. A drawing of the new correlator network can be found here:

External Access

The hosts server-1, server-2 and swc-001 can be accessed from outside of the correlator subnets.

Production Correlator External Names

usno-server-1p-ext -
usno-server-2p-ext -
usno-swc-001p-ext -

Test Correlator 'External' Names and IPs (from within NRAO network)

usno-server-1t-ext -
usno-server-2t-ext -
usno-swc-001t-ext -
usno-bg-mds-1t-ext -
usno-fringe-1-ext -

Correlator Internal Network Structure

Basic operation and administration of the correlator takes place on the various correlator subnets. The production correlator supports three subnets: an admin, a data, and a disk subnet. The test correlator has only an admin and a data subnet.
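The subnet layout described above and in the sections below can be summarized in a small script. This is only an illustrative sketch based on this page's subnet prefixes (10.1.36.*, 10.1.37.*, 10.1.38.*); the `classify_subnet` helper is hypothetical, not part of any correlator tooling:

```shell
#!/usr/bin/env bash
# Sketch: map a host IP onto the correlator subnets described on this page.
# The prefixes are taken from the section headings below; adjust if the
# layout changes.
classify_subnet() {
  case "$1" in
    10.1.36.*) echo "admin (1G Ethernet)" ;;
    10.1.37.*) echo "data (40G InfiniBand)" ;;
    10.1.38.*) echo "disk (InfiniBand, file servers)" ;;
    *)         echo "outside correlator subnets" ;;
  esac
}

classify_subnet 10.1.36.10   # -> admin (1G Ethernet)
classify_subnet 10.1.37.5    # -> data (40G InfiniBand)
classify_subnet 10.1.38.2    # -> disk (InfiniBand, file servers)
```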

Admin Subnet (10.1.36.*)

The 10.1.36.* subnet is the administration network and provides access to all of the cluster's networked devices: the two main servers, the compute nodes (the SWCs), the file servers (Lustre and BGFS), the PDUs, the KVMs, and the network switches. In addition, some USNO-managed devices also reside on this network: isan (network storage) and acas (security scanning host). The admin network uses 1G Ethernet. All network-capable hosts from both the original and neo systems will be connected to the admin subnet to allow for system administration. The systems residing on the admin network are shown in a table below, and the overall structure can be viewed in the network block diagram.

Data Subnet (10.1.37.*)

The data subnet uses 40G InfiniBand and serves as a high-speed connection allowing the SWCs to share data. The two admin servers do not have InfiniBand NICs and are not on this subnet. On the test correlator, this subnet is not used.

Disk Subnet (10.1.38.*)

The production correlator also has a second InfiniBand subnet dedicated to high-speed access to the file servers. Currently, the old Lustre hosts are the main file servers for both the test and production clusters. On the test correlator, both disk and inter-SWC communication occur on this subnet.

Naming and IP Addressing

The names of the various network-resident hosts and the IP addresses assigned to them are tabulated for easy reference:

Cluster Boot

The SWCs use the UEFI boot mechanism and boot from a diskless image served by server-1. The diskless images reside in /opt/services/diskless_boot.

  1. An SWC uses DHCP to request an address lease as well as boot service. The configuration file /etc/dhcp/dhcpd.conf reserves an IP address for each of the SWCs. It also gives the name of the system image that the SWC is to use; this has the form RHEL-7.9.x.yy. The SWC uses that name to complete the bootup via server-1's TFTP server.
  2. The TFTP server's files are located in /opt/services/tftpboot. They consist of the files that would normally be in /boot and the other directories used for a disked bootup (e.g., the initial RAM disk image, the kernel, grub.cfg, etc.). The grub.cfg file specifies the location of the system root directory via an NFS mount. As part of the secure boot required by the DoD's RHEL STIG, a checksum file for the kernel is expected to be in the system image, colocated with another copy of the kernel file. The checksum file is named .<kernel file name>.hmac (e.g., .vmlinuz-3.10.0-1160.6.1.el7.x86_64.hmac).
  3. Eventually, the initial RAM filesystem is replaced with the real system image. These images are stored in /opt/services/diskless_boot and use the same naming convention (i.e., RHEL-7.9.x.yy).
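As a rough sketch of step 1, a per-SWC reservation in /etc/dhcp/dhcpd.conf might look like the fragment below. Every MAC, IP, hostname, and path here is a placeholder for illustration only; the actual production values and file layout differ:

```
# Hypothetical dhcpd.conf host entry -- all values are placeholders.
host swc-001 {
  hardware ethernet 00:11:22:33:44:55;    # placeholder MAC address
  fixed-address 10.1.36.101;              # reserved admin-subnet IP (placeholder)
  option host-name "swc-001";
  next-server 10.1.36.1;                  # server-1's TFTP address (placeholder)
  filename "RHEL-7.9.x.yy/grubx64.efi";   # UEFI boot path under tftpboot (assumed layout)
}
```

The `next-server` and `filename` options are how standard ISC dhcpd points a UEFI network-boot client at a TFTP server and boot loader; the image-name directory shown here is an assumption about how the RHEL-7.9.x.yy name is wired into the boot path.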

-- JimJacobs - 2018-11-27
Topic revision: r7 - 2021-08-03, MarkWainright