LARD Project Notes, Part 1
- PART ONE INCLUDES:
- OSS install
- IB setup
- Raid sets
- Installation of lustre packages
1 The LARD setup
Component                 | Lustre Version | IP         | IB          | Lustre role
--------------------------|----------------|------------|-------------|------------
MDS (heinlein)            | 2.4.3          | 10.7.7.126 | 10.7.17.126 | Server
OSS1                      | 2.4.3          | 10.7.7.125 | 10.7.17.125 | Server
OSS2                      | 2.4.3          | 10.7.7.124 | 10.7.17.124 | Server
Data Mover 1 (konishiki)  | 1.8.9          | 10.7.7.123 | 10.7.17.123 | Client
Data Mover 2 (akebono)    | 1.8.9          | 10.7.7.122 | 10.7.17.122 | Client
- Our OSSes consist of separate head and storage nodes
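Note that each IB address in the table mirrors the corresponding Ethernet address with the third octet changed from 7 to 17. A quick way to derive one from the other (eth_to_ib is a hypothetical helper, not part of any playbook):

```shell
# eth_to_ib: map a 10.7.7.x management address to its 10.7.17.x IPoIB
# counterpart, following the LARD addressing convention in the table above.
eth_to_ib() {
    echo "$1" | sed 's/^10\.7\.7\./10.7.17./'
}
eth_to_ib 10.7.7.126   # -> 10.7.17.126
```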
2 IB setup
MAKE SURE ALL MACHINES HAVE INFINIBAND CARDS
2.1 Create/edit a config file at /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes
IPADDR=<IP Address>
NETMASK=255.255.255.0
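A quick way to confirm the file contains all the required keys (check_ifcfg is a hypothetical helper; on a real node, point it at /etc/sysconfig/network-scripts/ifcfg-ib0 instead of the sample file):

```shell
# check_ifcfg: report any required key missing from an ifcfg-style file.
check_ifcfg() {
    for key in DEVICE BOOTPROTO ONBOOT IPADDR NETMASK; do
        grep -q "^${key}=" "$1" || echo "missing: $key"
    done
}

# Demo against a sample file; no output means all keys are present.
cfg=$(mktemp)
printf 'DEVICE=ib0\nBOOTPROTO=static\nONBOOT=yes\nIPADDR=10.7.17.126\nNETMASK=255.255.255.0\n' > "$cfg"
check_ifcfg "$cfg"
rm -f "$cfg"
```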
2.2 Create/edit a module at /etc/modprobe.d/lustre.conf
alias ib0 ib_ipoib
alias ib0 ib_umad
alias net-pf-27 ib_sdp
# this setting reflects the fact that the LARD does not use LNET routers
options lnet networks="o2ib0(ib0)"
2.3 Bring the ib interface up
ifup ib0
2.4 Test the interface (NOT INCLUDED IN ANSIBLE PLAYBOOK)
2.4.1 Test that the subnet manager is working
- Try running the commands: ibswitches, ibnodes or ibhosts.
- There may be some ibwarn-type errors; these can be ignored.
- ibhosts and ibnodes should return lines like the following per connected client
Ca : 0x0002c903000a8800 ports 1 "MT25408 ConnectX Mellanox Technologies"
- If the above fails then the Subnet Manager isn't running.
2.4.2 Test InfiniBand and the Subnet Manager via ibping
- Pick a destination node and a source node that are connected via your IB route
- On the destination node, get the GUID via
ibstat | grep "Port GUID"
# Should return something like:
# Port GUID: 0x0002c903000cc98f
- Again on the destination node, start ibping server with
ibping -S
- Now on the source node, start ibping
ibping -G <Dst GUID> e.g. ibping -G 0x0002c903000cc98f
# Should see something similar to ping output
2.4.3 Test IP over IB (IPoIB) via ping
- On the source and destination nodes make sure the ib_ipoib kernel module is loaded.
lsmod | grep ipoib
- Simply ping the IP address of the target IB interface, e.g. ping 10.7.17.126
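The per-pair test above can be extended to sweep every IPoIB address at once. A minimal sketch (not part of the playbook; the address list is copied from the table in section 1):

```shell
# Ping each IPoIB interface once with a 2-second timeout and report
# reachability. Address list taken from the table in section 1.
ib_hosts="10.7.17.126 10.7.17.125 10.7.17.124 10.7.17.123 10.7.17.122"
for h in $ib_hosts; do
    if ping -c 1 -W 2 "$h" > /dev/null 2>&1; then
        echo "$h ok"
    else
        echo "$h UNREACHABLE"
    fi
done
```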
3 Attach storage to head nodes and build raid sets
3.1 Cabling (NOT INCLUDED IN ANSIBLE PLAYBOOK)
- The advice of K. Scott on cabling was spot-on:
I recommend connecting the first (left) port on controller 0 of the head node to the fourth port of the first SIM on the first shelf. Then connecting the first (left) port on controller 1 of the head node to the fourth port of the first SIM on the second shelf. The basic idea is one LSI controller per shelf and only one cable between controller and shelf.
- For now, I've cabled the 2 30-disk shelves to head node lard-oss-1
- We will order the 40 drives needed to give the other 2 shelves 30 drives
3.2 Build Raid Sets (ANSIBLIZED)
3.2.1 Install MegaRAID software
- Playbook uses the storcliraid role to install the software
- Playbook then programmatically builds out the sets and global spares we want
- Keep in mind that shell commands are not idempotent: they will always show up as 'changed', and you will need to inspect the output provided to determine whether the command succeeded
3.2.2 Watch background initialization (NOT INCLUDED IN ANSIBLE PLAYBOOK)
- You don't want to run benchmarks while the drives are still initializing, since background initialization consumes controller resources
- Handily, the 'show bgi' command provides an estimate of when the initialization will complete:
[root@lard-oss-2 ~]# storcli /c0/v0 show bgi
Controller = 0
Status = Success
Description = None
VD Operation Status :
===================
-------------------------------------------------------
VD Operation Progress% Status Estimated Time Left
-------------------------------------------------------
0 BGI 66 In progress 10 Hours 36 Minutes
-------------------------------------------------------
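When watching several VDs, it can help to pull just the progress column out of the report. A minimal awk sketch (bgi_progress is a hypothetical helper, and the column positions are an assumption based on the table layout above):

```shell
# bgi_progress: print the Progress% field for each VD row of
# 'storcli /cX/vY show bgi' output. Rows look like:
#   0 BGI 66 In progress 10 Hours 36 Minutes
bgi_progress() {
    awk '$2 == "BGI" { print $3 }'
}

# Example against the captured row above:
echo '0 BGI 66 In progress 10 Hours 36 Minutes' | bgi_progress   # -> 66
```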
4 Install Lustre Server Packages
# # # A NOTE ON DISCOVERING RPMs USING rpm -qa # # #
# rpm -qa <pattern> glob-matches the package name, so a bare 'lustre'
# matches only the package named exactly 'lustre'; piping the full list
# through grep also catches lustre-modules, the lustre-patched kernel, etc.
# ssh -qAX lard-oss-1 rpm -qa lustre | sort
lustre-2.4.3-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
VS
# ssh -qAX lard-oss-1 rpm -qa | grep lustre | sort
kernel-2.6.32-358.23.2.el6_lustre.x86_64
kernel-devel-2.6.32-358.23.2.el6_lustre.x86_64
kernel-headers-2.6.32-358.23.2.el6_lustre.x86_64
lustre-2.4.3-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
lustre-client-modules-1.8.9-wc1_2.6.32_573.18.1.el6.x86_64.x86_64
lustre-ldiskfs-4.1.0-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
lustre-modules-2.4.3-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
lustre-osd-ldiskfs-2.4.3-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
5 Install Lustre Client Packages
- If you are building a 1.8.9 client, the kernel and packages are distributed via kickstart
- If you are building any other client, the kickstarted 1.8.9 kernel and packages will have to be removed first.
Proceed to Part 2 (Benchmarks)
Skip to Part 3 (Building Filesystems)
--
JessicaOtey - 2016-03-16