LARD Project Notes, Part 3
- PART THREE INCLUDES:
- Lustre file system creation
1 Some notes on the order of events
- OS, NIC, and IB interface are set up (covered in parts 1 and 2)
- Lustre packages are installed; on a server, a Lustre-specific kernel is installed, too (covered in part 2)
- In both of the above cases, the GRUB configuration needs to be altered and the machine rebooted (that's the step between parts 2 and 3)
- Now with the packages and correct kernels, you can load the kernel modules and proceed from there (covered here in part 3)
- Now you can make stuff with lustre: first the MDS, then the OSSes, then any clients
2 Create MDS
2.1 Load your lustre modules
lsmod | grep lustre #shows whether any lustre modules are currently loaded
modprobe lustre #loads the lustre modules included in your packages
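The check-then-load step can be wrapped so it is safe to rerun. This is only a sketch: the helper name module_listed is made up for illustration, and it reads lsmod-style text from stdin so it can be tried without touching the kernel.

```shell
#!/bin/sh
# module_listed NAME: succeed if NAME appears in the first column of
# lsmod-style output read from stdin. (Hypothetical helper, not part of Lustre.)
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Intended use on a real node (needs root, so it is left commented out):
#   lsmod | module_listed lustre || modprobe lustre
```

Matching on the first column avoids a false hit on modules that merely list lustre in their "Used by" column.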
2.2 Create a mount point for the mdt
- You should have created a partition during install
- You may also have already mounted it at its decided path
[root@heinlein ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 63G 14G 47G 23% /
tmpfs 32G 52K 32G 1% /dev/shm
/dev/sda3 1.7T 196M 1.6T 1% /export/lard/mdt
- In the case of heinlein, the mirrored partition has already been mounted at /export/lard/mdt
2.3 Create the mdt
- Unmount the partition first!
- Since this is Lustre 2.4, you must specify the index of the mdt. Start counting at zero.
- Creating the mdt is when you decide on the name of THIS LUSTRE FILE SYSTEM (the --fsname option)
- While the --reformat option isn't necessary if this is your first go, it is included here because you will need it if you have to redo the format
[root@heinlein /]# umount /export/lard/mdt
[root@heinlein /]# mkfs.lustre --reformat --index=0 --fsname=lard --mdt --mgs /dev/sda3
Permanent disk data:
Target: lard:MDT0000
Index: 0
Lustre FS: lard
Mount type: ldiskfs
Flags: 0x65
(MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
device size = 1775615MB
formatting backing filesystem ldiskfs on /dev/sda3
target name lard:MDT0000
4k blocks 454557440
options -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lard:MDT0000 -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sda3 454557440
Writing CONFIGS/mountdata
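Since forgetting the unmount is the easy mistake here, a small guard can check first. This is a sketch under assumptions: device_mounted and format_mdt_cmd are hypothetical helper names, and the second helper only prints the mkfs.lustre line from above instead of running it.

```shell
#!/bin/sh
# device_mounted DEV [MOUNTS_FILE]: succeed if DEV is the source of any
# entry in MOUNTS_FILE (default /proc/mounts). Hypothetical helper.
device_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# format_mdt_cmd DEV [MOUNTS_FILE]: dry run. Print the format command only
# when DEV is not mounted; otherwise complain and return 1.
format_mdt_cmd() {
    if device_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is still mounted" >&2
        return 1
    fi
    echo "mkfs.lustre --reformat --index=0 --fsname=lard --mdt --mgs $1"
}

# Example (prints a command, or refuses if /dev/sda3 is mounted):
#   format_mdt_cmd /dev/sda3
```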
2.4 Mount the lustre filesystem
- General form: mount -t lustre $device $mount_point
mount -t lustre /dev/sda3 /export/lard/mdt
2.5 Check your work
2.5.1 df -h
[root@heinlein ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 63G 14G 47G 23% /
tmpfs 32G 52K 32G 1% /dev/shm
/dev/sda3 791G 197M 751G 1% /export/home/heinlein
cvfiler:/vol/vol0/local/redhat
70G 54G 17G 77% /home/rhlocal
/dev/md0 699G 478M 652G 1% /export/lard/mdt
2.5.2 lctl dl
[root@heinlein ~]# lctl dl
0 UP osd-ldiskfs lard-MDT0000-osd lard-MDT0000-osd_UUID 8
1 UP mgs MGS MGS 5
2 UP mgc MGC10.7.17.126@o2ib 6eed27af-0380-5ddf-a0bf-19012244cfd7 5
3 UP mds MDS MDS_uuid 3
4 UP lod lard-MDT0000-mdtlov lard-MDT0000-mdtlov_UUID 4
5 UP mdt lard-MDT0000 lard-MDT0000_UUID 3
6 UP mdd lard-MDD0000 lard-MDD0000_UUID 4
7 UP qmt lard-QMT0000 lard-QMT0000_UUID 4
8 UP lwp lard-MDT0000-lwp-MDT0000 lard-MDT0000-lwp-MDT0000_UUID 5
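Eyeballing the second column for UP works for nine devices; for longer lists a filter helps. A minimal sketch, assuming the status is always the second whitespace-separated field as in the output above (all_up is a made-up name):

```shell
#!/bin/sh
# all_up: read `lctl dl` output on stdin, print every line whose status
# column is not UP, and return non-zero if any such line was seen.
all_up() {
    awk 'NF && $2 != "UP" { print; bad = 1 } END { exit bad + 0 }'
}

# Intended use on the MDS:
#   lctl dl | all_up && echo "all devices UP"
```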
2.6 Edit /etc/fstab to add the mdt
- Find the UUID of the lustre filesystem
tune2fs -l /dev/sda3 | grep UUID
- Add the line:
UUID=<UUID from the command above> /export/lard/mdt lustre defaults 0 0
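To avoid copying the UUID by hand, the fstab entry can be built straight from the tune2fs output. A sketch, assuming the usual "Filesystem UUID:" line in tune2fs -l output; fstab_uuid_line is a hypothetical helper name.

```shell
#!/bin/sh
# fstab_uuid_line MOUNTPOINT: read `tune2fs -l` output on stdin and print
# the matching /etc/fstab entry for a lustre target.
fstab_uuid_line() {
    awk -v mp="$1" '/^Filesystem UUID:/ { print "UUID=" $3, mp, "lustre defaults 0 0" }'
}

# Intended use (review the line before appending it to /etc/fstab):
#   tune2fs -l /dev/sda3 | fstab_uuid_line /export/lard/mdt
```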
3 Create OSSes/OSTs
3.1 Check in with your devices
[root@lard-oss-2 Lustre]# lsscsi | grep sd*
[0:0:8:0] enclosu QUANTA JB9 SIM 0 1030 -
[0:2:0:0] disk AVAGO MR9380-8e 4.60 /dev/sda
[0:2:1:0] disk AVAGO MR9380-8e 4.60 /dev/sdb
[0:2:2:0] disk AVAGO MR9380-8e 4.60 /dev/sdc
[1:0:47:0] enclosu QUANTA JB9 SIM 0 1030 -
[1:2:0:0] disk AVAGO MR9380-8e 4.60 /dev/sdd
[1:2:1:0] disk AVAGO MR9380-8e 4.60 /dev/sde
[1:2:2:0] disk AVAGO MR9380-8e 4.60 /dev/sdf
[2:0:0:0] disk ATA INTEL SSDSC2BB08 G201 /dev/sdg
[2:0:1:0] disk ATA INTEL SSDSC2BB08 G201 /dev/sdh
- Lustre 2.x does not need partitions; each whole device can be formatted directly.
3.2 Make Lustre file system with mkfs (on raw devices)
- Note that the --reformat option is included because we've rebuilt these a bunch of times. Technically it is not needed for a fresh mkfs, but having it there doesn't hurt the first time, either.
# this is one line from mkfs_lustre.sh
# each device from above needs its own file system
mkfs.lustre --index=0 --reformat --fsname=lard --mkfsoptions="-E stride=32,stripe-width=224 -m0" --ost --mgsnode=10.7.17.126@o2ib0 /dev/sda
Permanent disk data:
Target: lard:OST0000
Index: 0
Lustre FS: lard
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.7.17.126@o2ib
device size = 53412576MB
formatting backing filesystem ldiskfs on /dev/sda
target name lard:OST0000
4k blocks 13673619456
options -m0 -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,64bit,flex_bg -G 256 -E stride=32,stripe-width=224,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lard:OST0000 -m0 -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,64bit,flex_bg -G 256 -E stride=32,stripe-width=224,lazy_journal_init -F /dev/sda 13673619456
Writing CONFIGS/mountdata
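- One oddity: when mkfs makes the file system, the target name has a colon (lard:OST0000), not a hyphen. This will CHANGE to a hyphen (lard-OST0000) WHEN THE FS IS MOUNTED. It looks like a bug, but it appears to be intentional: the colon seems to mark a target that has not yet registered with the MGS.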
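The per-device lines in mkfs_lustre.sh can be generated rather than typed. This is a dry-run sketch that only prints the commands. It ASSUMES devices get consecutive indices in the order given (the notes only show index 0 for /dev/sda), and ost_mkfs_cmds is a made-up name.

```shell
#!/bin/sh
# ost_mkfs_cmds START_INDEX DEV...: print one mkfs.lustre command per device,
# numbering the OSTs from START_INDEX. ASSUMPTION: indices are consecutive.
ost_mkfs_cmds() {
    index=$1; shift
    for dev in "$@"; do
        echo "mkfs.lustre --index=$index --reformat --fsname=lard --mkfsoptions=\"-E stride=32,stripe-width=224 -m0\" --ost --mgsnode=10.7.17.126@o2ib0 $dev"
        index=$((index + 1))
    done
}

ost_mkfs_cmds 0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

OST indices must be unique across the whole file system, so a second OSS would start from a different index. On the mkfs options: stride and stripe-width are counted in file system blocks, so with 4k blocks stride=32 is a 128 KiB chunk, and 224 / 32 = 7 suggests seven data disks per RAID set.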
3.3 Mounting the OSTs (first time and forever)
3.3.1 Create mount points
# this is mk_lustre_mountpoints.sh
# each ost needs a mount point
mkdir -p /export/lard/ost1-1
mkdir -p /export/lard/ost1-2
mkdir -p /export/lard/ost1-3
mkdir -p /export/lard/ost1-4
mkdir -p /export/lard/ost1-5
mkdir -p /export/lard/ost1-6
3.3.2 First time mounting
# this is mount_lustre_first_time.sh
# each ost is mounted at its mount point
mount -t lustre /dev/sda /export/lard/ost1-1
mount -t lustre /dev/sdb /export/lard/ost1-2
mount -t lustre /dev/sdc /export/lard/ost1-3
mount -t lustre /dev/sdd /export/lard/ost1-4
mount -t lustre /dev/sde /export/lard/ost1-5
mount -t lustre /dev/sdf /export/lard/ost1-6
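The two scripts above pair the Nth device with mount point ost1-N, so both can come from one loop. A dry-run sketch that only prints the commands; it ASSUMES the "1" in ost1-N is the OSS number, which the notes do not state, and ost_mount_cmds is a hypothetical name.

```shell
#!/bin/sh
# ost_mount_cmds OSS_NUMBER DEV...: print the mkdir and mount commands that
# pair each device, in order, with /export/lard/ost<OSS_NUMBER>-<N>.
ost_mount_cmds() {
    oss=$1; shift
    n=1
    for dev in "$@"; do
        mp="/export/lard/ost${oss}-${n}"
        echo "mkdir -p $mp"
        echo "mount -t lustre $dev $mp"
        n=$((n + 1))
    done
}

ost_mount_cmds 1 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```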
3.3.3 Get the OST labels and add them to /etc/fstab
# this is ost_fstab_labels.sh
# each ost has a label
tune2fs -l /dev/sda | grep name:
tune2fs -l /dev/sdb | grep name:
tune2fs -l /dev/sdc | grep name:
tune2fs -l /dev/sdd | grep name:
tune2fs -l /dev/sde | grep name:
tune2fs -l /dev/sdf | grep name:
# Lustre
LABEL=lard-OST0000 /export/lard/ost1-1 lustre defaults 0 0
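The label-to-fstab step can be scripted, with a check for the colon form of the label that an OST carries before its first mount. A sketch, assuming the "Filesystem volume name:" line in tune2fs -l output; fstab_label_line is a made-up helper name.

```shell
#!/bin/sh
# fstab_label_line MOUNTPOINT: read `tune2fs -l` output on stdin and print a
# LABEL= fstab entry. Fails if no label is found, or if the label still has
# the colon form it carries before the first mount.
fstab_label_line() {
    label=$(awk '/^Filesystem volume name:/ { print $4 }')
    case $label in
        "")  echo "no label found" >&2; return 1 ;;
        *:*) echo "label $label still has a colon; mount the OST once first" >&2; return 1 ;;
    esac
    echo "LABEL=$label $1 lustre defaults 0 0"
}

# Intended use (review the line before appending it to /etc/fstab):
#   tune2fs -l /dev/sda | fstab_label_line /export/lard/ost1-1
```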
3.3.4 Unmount and remount using fstab
# sample for 1 OST
[root@lard-oss-1 Lustre]# umount /export/lard/ost1-1
[root@lard-oss-1 Lustre]# mount -a
3.4 Tune Lustre OSTs
- Add to /etc/rc.local:
# Lustre
# one line per device
/sbin/blockdev --setra 16384 /dev/sda
/sbin/blockdev --setra 16384 /dev/sdb
/sbin/blockdev --setra 16384 /dev/sdc
/sbin/blockdev --setra 16384 /dev/sdd
/sbin/blockdev --setra 16384 /dev/sde
/sbin/blockdev --setra 16384 /dev/sdf
lctl set_param obdfilter.*.readcache_max_filesize=6M
- then run it:
sh /etc/rc.local
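For reference, blockdev --setra counts in 512-byte sectors, so the value used above can be converted to a size. A small sketch with hypothetical helper names; the second function only prints the rc.local lines and does not run them.

```shell
#!/bin/sh
# setra_mib SECTORS: convert a `blockdev --setra` value (512-byte sectors)
# to MiB.
setra_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

setra_mib 16384   # prints 8, i.e. 8 MiB of readahead per device

# setra_lines DEV...: dry run, print one rc.local line per device.
setra_lines() {
    for dev in "$@"; do
        echo "/sbin/blockdev --setra 16384 $dev"
    done
}

setra_lines /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```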
4 Create clients
4.1 Reboots required
- The client kernel is patchless (i.e., its name doesn't say lustre, but its version needs to match the one your lustre packages were built for)
- Also, a reboot is required AFTER lustre-client and lustre-client-modules installation!
4.2 Load lustre kernel modules
modprobe lustre #same command as on the servers (see 2.1)
4.3 Mount the lustre client
4.3.1 Make mountpoint
mkdir -p /lard
4.3.2 Mount -t
- mount -t lustre $mds-ip@$mds-interface:/$filesystem-name -o user_xattr,flock /$mountpoint
| variable         | meaning                                                      | our value             |
|------------------+--------------------------------------------------------------+-----------------------|
| $mds-ip          | IP or hostname of the MDS                                    | 10.7.17.126           |
| $mds-interface   | type of interface                                            | o2ib (for Infiniband) |
| $filesystem-name | name of the lustre filesystem given when the MDS was created | lard                  |
| $mountpoint      | actual mountpoint on the client                              | /lard                 |
#all together now!
mount -t lustre 10.7.17.126@o2ib:/lard -o user_xattr,flock /lard
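The placeholders above are for reading only: names like $mds-ip contain hyphens and are not valid shell variable names. A sketch of the same composition using positional parameters instead; client_mount_cmd is a made-up name and it only prints the command.

```shell
#!/bin/sh
# client_mount_cmd MDS_IP MDS_INTERFACE FSNAME MOUNTPOINT: dry run, print the
# client mount command assembled from its four parts.
client_mount_cmd() {
    echo "mount -t lustre $1@$2:/$3 -o user_xattr,flock $4"
}

client_mount_cmd 10.7.17.126 o2ib lard /lard
```

With our values this prints the same line as the "all together now" example.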
5 Questions
5.1 What role does the nrao-lustre script play on a client box?
- Grab the script: scp -p hex:/etc/init.d/nrao-lustre /etc/init.d/
- Register it: chkconfig --add nrao-lustre
- Run the script to start the client
5.2 Automounter
- I'll need an explanation on how this works...