Installing InfiniBand for Lustre


Installing Mellanox switch

  • The switches do not use DHCP by default, so you will need a serial console configured for 8N1 with hardware flow control on. Once connected, the login and password are both admin.
    The GB switch required 8N1 at 9600 baud with flow control OFF. This was using minicom rather than a dumb terminal.
  • Install OFED subnet manager license
    • Get Entitlement key and serial number and enter them at http://license.mellanox.com/ to get the license key
      • Mellanox is hit or miss on shipping the Entitlement key; you may have to call them to get it.
      • One of our units had the license already installed as well as printed on the back of the plastic tab under the ethernet port. We still had to enable Subnet Manager though.
      • GB unit also had license printed on the back of the tab. Unfortunately the quality and ridiculously small size of printing made it impossible to read accurately. Support were able to sort it out though.
    • Go to the switch web interface, log in and select Setup -> Licensing. Enter the key in the box and click Add Licences.
    • Enable the subnet manager license (otherwise pinging remote nodes won't work).


Configuring IB on systems

Assign IP addresses within a private network. We use 192.168.1.xxx for ib0 interfaces to provide IP over IB. We use a scheme where:
  • The MDS is 192.168.1.10
  • The OSSes are 192.168.1.11 - 192.168.1.99
  • and clients are 192.168.1.101 - 192.168.1.255
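The addressing scheme above can be captured in a small helper so that provisioning scripts hand out addresses consistently. This is only a sketch: the function name ib_ip_for and the role names mds, oss and client are hypothetical and not part of any tool used on these systems.

```shell
# Hypothetical helper: map a node role and a 1-based index to its ib0
# address under the scheme above (MDS .10, OSSes .11-.99, clients .101-.255).
ib_ip_for() {
    role=$1
    index=${2:-1}
    [ "$index" -ge 1 ] || return 1
    case "$role" in
        mds)
            host=10
            ;;
        oss)
            host=$((10 + index))
            [ "$host" -le 99 ] || return 1      # OSS range ends at .99
            ;;
        client)
            host=$((100 + index))
            [ "$host" -le 255 ] || return 1     # client range ends at .255
            ;;
        *)
            return 1
            ;;
    esac
    echo "192.168.1.$host"
}
```

For example, ib_ip_for oss 4 prints the address for the fourth OSS, and the function returns non-zero when the index falls outside the range reserved for that role.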

Create the ifcfg file
Edit /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes
IPADDR=<IP Address>
NETMASK=255.255.0.0
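When many nodes need the same file with only the address changed, the settings listed above can be written out by a script. The following is a minimal sketch; write_ifcfg_ib0 is a hypothetical helper name, and the output path is a parameter so the result can be inspected in a scratch file before anything under /etc/sysconfig is touched.

```shell
# Hypothetical helper: write an ifcfg-ib0 file with the settings above.
# $1 = IP address for ib0
# $2 = output file (defaults to the real network-scripts location)
write_ifcfg_ib0() {
    ipaddr=$1
    outfile=${2:-/etc/sysconfig/network-scripts/ifcfg-ib0}
    cat > "$outfile" <<EOF
DEVICE=ib0
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes
IPADDR=$ipaddr
NETMASK=255.255.0.0
EOF
}
```

A dry run such as write_ifcfg_ib0 192.168.1.11 /tmp/ifcfg-ib0 lets you review the file before copying it into place.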

Configure Modules
edit /etc/modprobe.d/lustre.conf
alias ib0 ib_ipoib
alias ib0 ib_umad
alias net-pf-27 ib_sdp
# your TCP routes will change according to your setup; this is a sample from naasc-oss-4
options lnet networks="o2ib0(ib0)" routes="tcp 10.7.17.[11-12]@o2ib0; tcp2 10.7.17.[11-12]@o2ib0" live_router_check_interval=60 dead_router_check_interval=60

Bring interface up with ifup ib0


Testing IB

Test that the subnet manager is working
Try running the commands: ibswitches, ibnodes or ibhosts. There may be some ibwarn type errors. These can be ignored.
ibhosts and ibnodes should return one line like the following for each connected client:

Ca : 0x0002c903000a8800 ports 1 "MT25408 ConnectX Mellanox Technologies"

If the above fails then the Subnet Manager isn't running.
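On a larger fabric it can help to check the ibhosts listing by script rather than by eye. The sketch below assumes the output format shown above (one "Ca : <GUID> ports ..." line per host); list_ca_guids and expect_hosts are hypothetical helper names, and both read the listing from standard input.

```shell
# Print the GUID of every channel adapter found in ibhosts-style output.
list_ca_guids() {
    awk '$1 == "Ca" && $2 == ":" { print $3 }'
}

# Succeed only if at least $1 channel adapters appear in the listing.
expect_hosts() {
    want=$1
    found=$(list_ca_guids | awk 'END { print NR }')
    [ "$found" -ge "$want" ]
}
```

Typical use would be ibhosts 2>/dev/null | expect_hosts 4, which returns non-zero if fewer than four hosts are visible; the stderr redirect hides the ibwarn messages mentioned above.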

Test InfiniBand and Subnet Manager via ibping

On the destination node, get the GUID via
ibstat | grep "Port GUID"
Should return something like:

Port GUID: 0x0002c903000cc98f

On the destination node, start ibping server
ibping -S
On the source node, run ibping against that GUID:
ibping -G <Dst GUID>, e.g. ibping -G 0x0002c903000cc98f
The output should look similar to that of ping.
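The GUID lookup in this procedure can be scripted so the value does not have to be copied by hand. This is a sketch that assumes ibstat prints lines of the form "Port GUID: 0x..." as shown above; first_port_guid is a hypothetical helper name, and it only looks at the first port reported.

```shell
# Print the first Port GUID found in ibstat-style output on stdin.
first_port_guid() {
    awk -F': *' '!seen && /Port GUID/ { print $2; seen = 1 }'
}
```

On the destination node, ibstat | first_port_guid prints the GUID; that value can then be passed to ibping -G on the source node.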

Test IP over IB (IPoIB) via ping
On the source and destination nodes, make sure the ib_ipoib kernel module is loaded. Then ping the IP address of the target IB interface, e.g. ping 192.168.1.11
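The module check and the ping can be combined into one step. The following is a rough sketch, not a tested site script: ipoib_loaded and check_ipoib are hypothetical helper names, and the count of three pings is an arbitrary choice.

```shell
# Succeed if ib_ipoib appears as a module name in lsmod-style output on stdin.
ipoib_loaded() {
    awk '$1 == "ib_ipoib" { found = 1 } END { exit !found }'
}

# Check the module on this node, then ping the target IB address.
# Usage: check_ipoib 192.168.1.11
check_ipoib() {
    lsmod | ipoib_loaded || { echo "ib_ipoib is not loaded" >&2; return 1; }
    ping -c 3 "$1"
}
```

Run check_ipoib on both the source and the destination node, since the module must be loaded at each end.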

-- JamesRobnett - 2011-07-12
Topic revision: r16 - 2016-03-16, JessicaOtey