Infiniband Switches
Overview
The correlators use Infiniband (IB) network connections to pass data between the correlator nodes (SWCs) and to and from the file servers. At the AOC there is a single Mellanox M6036 switch to handle all of the high-speed data traffic. At the DC site there are three M6036 switches. The DC correlator uses two high-speed networks, one for data (10.1.37) and one for disk access (10.1.38). The file servers and the first sixteen SWCs (swc-001..016) are connected on the disk network, while all of the SWCs are connected together on the data network. The first sixteen SWCs (swc-001..016) each have two Infiniband NICs.
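The DC network layout described above can be sketched as a small shell helper. This is only an illustration: the function name `swc_networks` is hypothetical, and the sketch assumes that SWC host names always follow the `swc-NNN` pattern shown above.

```shell
# Hypothetical helper: report which high-speed networks a DC SWC is attached to.
# Per the overview: every SWC is on the data network (10.1.37);
# swc-001..016 are additionally on the disk network (10.1.38).
swc_networks() {
    case "$1" in
        swc-*) num="${1#swc-}" ;;
        *) echo "not an SWC name: $1" >&2; return 1 ;;
    esac
    case "$num" in
        ''|*[!0-9]*) echo "not an SWC name: $1" >&2; return 1 ;;
    esac
    # Strip leading zeros so the number is not read as octal or left empty
    while [ "${num#0}" != "$num" ]; do num="${num#0}"; done
    [ -n "$num" ] || num=0
    if [ "$num" -ge 1 ] && [ "$num" -le 16 ]; then
        echo "data disk"
    else
        echo "data"
    fi
}
```

For example, `swc_networks swc-004` prints `data disk`, while `swc_networks swc-020` prints `data`.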
DC Hardware
All of the switches are connected to the admin net (10.1.36):
| DNS Name | IP | Sites | Software Info | Description |
| data-m6036-1 | 10.1.36.93 | All sites | Software Info | |
| data-m6036-2 | 10.1.36.94 | DC Only | Software Info | |
| data-m6036-3 | 10.1.36.95 | DC Only | Software Info | |
AOC Hardware
| DNS Name | IP | Sites | Software Info | Description |
| data-m6012-1 | 10.1.36.98 | All sites | Software Info | |
The name is not actually present in the DNS but is defined in /etc/hosts.
Access
Access the Mellanox switches by using switch-logon nraoUserName@switchName; this will ask for the RADIUS password.
The command is actually an alias for ssh with an option that selects a suitable key exchange algorithm (by default, the FIPS settings on the servers do not allow any of the algorithms that the switches will accept).
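As a rough sketch of what such an alias might look like, the function below wraps ssh with a `KexAlgorithms` override. The function name `switch_logon_sketch`, the `SWITCH_KEX` and `DRY_RUN` variables, and in particular the algorithm `diffie-hellman-group14-sha1` are assumptions made for illustration; the real definition of switch-logon and the algorithm it selects are not given here and may differ.

```shell
# Hypothetical stand-in for the real switch-logon alias.
# The key exchange algorithm is an ASSUMED value; substitute whichever
# algorithm the switches actually accept.
SWITCH_KEX="${SWITCH_KEX:-diffie-hellman-group14-sha1}"

switch_logon_sketch() {
    # $1 is nraoUserName@switchName
    if [ $# -ne 1 ]; then
        echo "usage: switch_logon_sketch nraoUserName@switchName" >&2
        return 2
    fi
    # The leading '+' appends the algorithm to ssh's default set.
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} ssh -o "KexAlgorithms=+${SWITCH_KEX}" "$1"
}
```

Setting `DRY_RUN=1` before calling the function shows the ssh command line without opening a connection, which is a convenient way to check the option before using it against a switch.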
Problems
The Mellanox NICs on the BGFS systems sometimes "vanish". This can happen when the kernel changes "too much" during patching. The Mellanox drivers must then be rebuilt (the instructions for this are on the [[UsnoDifxOsPatch2#BgfsAtAoc]["patching" page]]). This could also be a problem on the SWCs but has not been to date.
-- JimJacobs - 2020-01-23