LARD Project Notes, Part 3

  • PART THREE INCLUDES:
    • Lustre file system creation

1 Some notes on the order of events

  • OS, NIC, and ib interface (covered in parts 1 and 2)
  • Lustre packages are installed; if server, a lustre-specific kernel is installed, too (covered in part 2)
  • In both of the above cases, the GRUB configuration needs to be altered and the machine rebooted (that's the step between parts 2 and 3)
  • Now with the packages and correct kernels, you can load the kernel modules and proceed from there (covered here in part 3)
  • Now you can make stuff with lustre: first the MDS, then the OSSes, then any clients

2 Create MDS

Applies to   Ansiblized
MDS          NO

2.1 Load your lustre modules

lsmod | grep lustre   # shows whether any lustre modules are currently loaded
modprobe lustre       # loads the lustre modules installed by your packages
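
The two commands above can be combined so that modprobe only runs when the module is missing. This is a minimal sketch: the helper name lustre_loaded is made up for illustration, and it reads lsmod-style text on stdin so it can be tried without touching the kernel.

```shell
# lustre_loaded: hypothetical helper, not part of Lustre.
# Reads lsmod-style output on stdin; succeeds if a module named exactly
# "lustre" appears in the first column.
lustre_loaded() {
    awk '$1 == "lustre" { found = 1 } END { exit !found }'
}

# Intended use on a real node (as root):
#   lsmod | lustre_loaded || modprobe lustre

# Demonstration with canned input instead of a live lsmod:
if printf 'Module Size Used by\nlustre 900000 0\n' | lustre_loaded; then
    echo "lustre already loaded"
else
    echo "would run: modprobe lustre"
fi
```

Unlike a plain grep, the exact match on the first column does not trigger on other modules whose names merely contain "lustre".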

2.2 Create a mount point for the mdt

  • You should have created a partition during install
  • You may also have already mounted it at its decided path
[root@heinlein ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        63G   14G   47G  23% /
tmpfs            32G   52K   32G   1% /dev/shm
/dev/sda3       1.7T  196M  1.6T   1% /export/lard/mdt
  • In the case of heinlein, the mirrored partition has already been mounted at /export/lard/mdt

2.3 Create the mdt

  • Unmount the partition first!
  • Since this is Lustre 2.4, you must specify the index of the mdt. Start counting at zero.
  • The creation of the mdt is when you decide on the name of THIS LUSTRE FILE SYSTEM
  • While the --reformat option isn't necessary if this is your first go, it is included here because if you need to redo, it does need to be there
[root@heinlein /]# umount /export/lard/mdt
[root@heinlein /]# mkfs.lustre --reformat --index=0 --fsname=lard --mdt --mgs /dev/sda3

Permanent disk data:
Target:     lard:MDT0000
Index:      0
Lustre FS:  lard
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

device size = 1775615MB
formatting backing filesystem ldiskfs on /dev/sda3
   target name  lard:MDT0000
   4k blocks     454557440
   options        -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lard:MDT0000  -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sda3 454557440
Writing CONFIGS/mountdata
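
As a quick sanity check on the output above, the reported device size and the 4k block count should agree. The sketch below assumes "MB" in the mkfs.lustre output means 2^20 bytes, which the numbers bear out; the function name is illustrative only.

```shell
# blocks_to_mib: convert a count of 4 KiB blocks to MiB (2^20 bytes).
# 256 blocks of 4 KiB make up 1 MiB, so this is an integer division by 256.
blocks_to_mib() {
    echo $(( $1 / 256 ))
}

blocks_to_mib 454557440   # MDT block count from the output above; prints 1775615
```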

2.4 Mount the lustre filesystem

  • General form: mount -t lustre $device $mount_point
mount -t lustre /dev/sda3 /export/lard/mdt

2.5 Check your work

2.5.1 df -h

[root@heinlein ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              63G   14G   47G  23% /
tmpfs                  32G   52K   32G   1% /dev/shm
/dev/sda3             791G  197M  751G   1% /export/home/heinlein
cvfiler:/vol/vol0/local/redhat
                       70G   54G   17G  77% /home/rhlocal
/dev/md0              699G  478M  652G   1% /export/lard/mdt

2.5.2 lctl dl

[root@heinlein ~]# lctl dl
  0 UP osd-ldiskfs lard-MDT0000-osd lard-MDT0000-osd_UUID 8
  1 UP mgs MGS MGS 5
  2 UP mgc MGC10.7.17.126@o2ib 6eed27af-0380-5ddf-a0bf-19012244cfd7 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lard-MDT0000-mdtlov lard-MDT0000-mdtlov_UUID 4
  5 UP mdt lard-MDT0000 lard-MDT0000_UUID 3
  6 UP mdd lard-MDD0000 lard-MDD0000_UUID 4
  7 UP qmt lard-QMT0000 lard-QMT0000_UUID 4
  8 UP lwp lard-MDT0000-lwp-MDT0000 lard-MDT0000-lwp-MDT0000_UUID 5
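
With more devices it can be easier to let a filter flag anything that is not UP than to read the list by eye. A small sketch, assuming the column layout shown above (status in column 2, device name in column 4); not_up is a hypothetical helper name.

```shell
# not_up: hypothetical helper. Reads `lctl dl` output on stdin, prints the
# name (4th column) of every device whose status (2nd column) is not UP,
# and returns non-zero if it found any.
not_up() {
    awk '$2 != "UP" { print $4; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Intended use on the MDS:
#   lctl dl | not_up && echo "all devices UP"

# Demonstration with two lines in the format shown above:
printf '  0 UP osd-ldiskfs lard-MDT0000-osd lard-MDT0000-osd_UUID 8\n  1 UP mgs MGS MGS 5\n' \
    | not_up && echo "all devices UP"
```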

2.6 Edit /etc/fstab to add the mdt

  • Find the UUID of the lustre filesystem: tune2fs -l /dev/sda3 | grep UUID
  • Add this line to /etc/fstab, substituting the UUID reported by the command above: UUID=<uuid> /export/lard/mdt lustre defaults 0 0
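
The two steps above can be sketched as a single filter that turns tune2fs output into the fstab line. The helper name is hypothetical and the UUID in the demonstration is made up; on a real MDS, review the printed line before appending it to /etc/fstab.

```shell
# fstab_line_from_uuid: hypothetical helper. Reads `tune2fs -l` output on
# stdin, takes the value of the "Filesystem UUID:" line, and prints an
# fstab entry for the mount point given as $1.
fstab_line_from_uuid() {
    awk -v mnt="$1" '/^Filesystem UUID:/ { printf "UUID=%s %s lustre defaults 0 0\n", $3, mnt }'
}

# Intended use on the MDS:
#   tune2fs -l /dev/sda3 | fstab_line_from_uuid /export/lard/mdt

# Demonstration with a made-up UUID:
printf 'Filesystem UUID:          00000000-1111-2222-3333-444444444444\n' \
    | fstab_line_from_uuid /export/lard/mdt
```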

3 Create OSSes/OSTs

Applies to   Ansiblized
OSSes        YES (see here)

3.1 Check in with your devices

[root@lard-oss-2 Lustre]# lsscsi | grep sd*
[0:0:8:0]    enclosu QUANTA   JB9 SIM 0        1030  -       
[0:2:0:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sda 
[0:2:1:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sdb 
[0:2:2:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sdc 
[1:0:47:0]   enclosu QUANTA   JB9 SIM 0        1030  -       
[1:2:0:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sdd 
[1:2:1:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sde 
[1:2:2:0]    disk    AVAGO    MR9380-8e        4.60  /dev/sdf 
[2:0:0:0]    disk    ATA      INTEL SSDSC2BB08 G201  /dev/sdg 
[2:0:1:0]    disk    ATA      INTEL SSDSC2BB08 G201  /dev/sdh 

  • There is no need in Lustre 2.x to use partitions!

3.2 Make Lustre file system with mkfs (on raw devices)

  • Note that the --reformat option is included because we've rebuilt these a bunch of times. Technically, on a fresh mkfs that option is not needed, but having it in there doesn't hurt the first time, either.
# this is one line from mkfs_lustre.sh
# each device from above needs its own file system
mkfs.lustre --index=0 --reformat --fsname=lard --mkfsoptions="-E stride=32,stripe-width=224 -m0" --ost --mgsnode=10.7.17.126@o2ib0 /dev/sda

   Permanent disk data:
Target:     lard:OST0000
Index:      0
Lustre FS:  lard
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.7.17.126@o2ib

device size = 53412576MB
formatting backing filesystem ldiskfs on /dev/sda
   target name  lard:OST0000
   4k blocks     13673619456
   options        -m0 -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,64bit,flex_bg -G 256 -E stride=32,stripe-width=224,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lard:OST0000  -m0 -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,64bit,flex_bg -G 256 -E stride=32,stripe-width=224,lazy_journal_init -F /dev/sda 13673619456
Writing CONFIGS/mountdata
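
The stride and stripe-width values passed through --mkfsoptions are counted in filesystem blocks (4096 bytes here, per "-b 4096" in mkfs_cmd): stride is the RAID chunk per disk, and stripe-width is normally stride times the number of data disks. The arithmetic below shows what the values imply. The RAID layout itself is not stated in these notes, so treat the data-disk count as what the options suggest rather than a description of the hardware.

```shell
# Values copied from the mkfs.lustre command line and mkfs_cmd output.
block_size=4096      # bytes, from "-b 4096"
stride=32            # blocks per RAID chunk
stripe_width=224     # blocks per full stripe

chunk_kib=$(( stride * block_size / 1024 ))
data_disks=$(( stripe_width / stride ))
full_stripe_kib=$(( stripe_width * block_size / 1024 ))

echo "chunk size:   ${chunk_kib} KiB"        # 128 KiB
echo "data disks:   ${data_disks}"           # 7
echo "full stripe:  ${full_stripe_kib} KiB"  # 896 KiB
```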

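  • One funny thing: when mkfs.lustre makes the file system, the target name has a colon rather than a hyphen (lard:OST0000). This will CHANGE WHEN THE FS IS MOUNTED (it becomes lard-OST0000). This appears to be intentional rather than a bug: the colon marks a target that has not yet registered with the MGS.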
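
Since each device needs its own file system, the per-device lines of a script like mkfs_lustre.sh can be generated rather than typed. The sketch below only prints the commands. It assumes the OST index simply increments from one device to the next (the notes show only index 0 for /dev/sda), and the function name is hypothetical. OST indexes must be unique within the file system, so a second OSS would pass a different starting index.

```shell
# print_ost_mkfs: hypothetical generator. Prints (does not run) one
# mkfs.lustre command per device, numbering the OSTs from $1 upward.
# Usage: print_ost_mkfs START_INDEX DEVICE...
print_ost_mkfs() {
    index=$1
    shift
    for dev in "$@"; do
        printf 'mkfs.lustre --index=%d --reformat --fsname=lard --mkfsoptions="-E stride=32,stripe-width=224 -m0" --ost --mgsnode=10.7.17.126@o2ib0 %s\n' "$index" "$dev"
        index=$(( index + 1 ))
    done
}

print_ost_mkfs 0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Reviewing the printed lines before running them is worthwhile here, because --reformat destroys whatever is on the device.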

3.3 Mounting the OSTs (first time and forever)

3.3.1 Create mount points

# this is mk_lustre_mountpoints.sh
# each ost needs a mount point
mkdir -p /export/lard/ost1-1
mkdir -p /export/lard/ost1-2
mkdir -p /export/lard/ost1-3
mkdir -p /export/lard/ost1-4
mkdir -p /export/lard/ost1-5
mkdir -p /export/lard/ost1-6

3.3.2 First time mounting

# this is mount_lustre_first_time.sh
# each ost is mounted at its mount point
mount -t lustre /dev/sda /export/lard/ost1-1
mount -t lustre /dev/sdb /export/lard/ost1-2
mount -t lustre /dev/sdc /export/lard/ost1-3
mount -t lustre /dev/sdd /export/lard/ost1-4
mount -t lustre /dev/sde /export/lard/ost1-5
mount -t lustre /dev/sdf /export/lard/ost1-6
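
The mount point and first-mount scripts above follow the same device-to-ost1-N pattern, so both can be produced by one loop. This sketch prints the commands instead of running them, and it interleaves mkdir and mount per device rather than keeping two separate scripts; the function name is hypothetical.

```shell
# print_ost_mounts: hypothetical generator. Prints (does not run) the mkdir
# and mount commands for each device, using the ost1-N naming from the
# scripts above. Usage: print_ost_mounts DEVICE...
print_ost_mounts() {
    n=1
    for dev in "$@"; do
        mnt="/export/lard/ost1-$n"
        printf 'mkdir -p %s\n' "$mnt"
        printf 'mount -t lustre %s %s\n' "$dev" "$mnt"
        n=$(( n + 1 ))
    done
}

print_ost_mounts /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```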

3.3.3 Get the OST labels and add them to /etc/fstab

# this is ost_fstab_labels.sh
# each ost has a label
tune2fs -l /dev/sda | grep name:
tune2fs -l /dev/sdb | grep name:
tune2fs -l /dev/sdc | grep name:
tune2fs -l /dev/sdd | grep name:
tune2fs -l /dev/sde | grep name:
tune2fs -l /dev/sdf | grep name:

  • Example fstab entry
#   Lustre
LABEL=lard-OST0000   /export/lard/ost1-1   lustre  defaults        0 0
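
A filter along the following lines can turn the tune2fs output into an fstab entry in the format of the example above. Because the label only takes its hyphenated form once the OST has been mounted, the sketch refuses labels that still contain a colon. The helper name is hypothetical.

```shell
# fstab_line_from_label: hypothetical helper. Reads `tune2fs -l` output on
# stdin, takes the value of the "Filesystem volume name:" line, and prints
# an fstab entry for the mount point given as $1. Fails if the label still
# has the pre-mount colon form (e.g. lard:OST0000).
fstab_line_from_label() {
    awk -v mnt="$1" '
        /^Filesystem volume name:/ {
            if (index($4, ":")) {
                print "label " $4 " still has a colon; mount the OST once first" > "/dev/stderr"
                exit 1
            }
            printf "LABEL=%s   %s   lustre  defaults        0 0\n", $4, mnt
        }'
}

# Intended use on an OSS:
#   tune2fs -l /dev/sda | fstab_line_from_label /export/lard/ost1-1

# Demonstration with a canned tune2fs line:
printf 'Filesystem volume name:   lard-OST0000\n' \
    | fstab_line_from_label /export/lard/ost1-1
```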

3.3.4 Unmount and remount using fstab

# sample for 1 OST
[root@lard-oss-1 Lustre]# umount /export/lard/ost1-1
[root@lard-oss-1 Lustre]# mount -a

3.4 Tune Lustre OSTs

Add the following to /etc/rc.local:
# Lustre
# one line per device
/sbin/blockdev --setra 16384 /dev/sda
/sbin/blockdev --setra 16384 /dev/sdb
/sbin/blockdev --setra 16384 /dev/sdc
/sbin/blockdev --setra 16384 /dev/sdd
/sbin/blockdev --setra 16384 /dev/sde
/sbin/blockdev --setra 16384 /dev/sdf

lctl set_param obdfilter.*.readcache_max_filesize=6M

  • then run it: sh /etc/rc.local
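
For reference, blockdev --setra takes its value in 512-byte sectors, so the 16384 used above works out to 8 MiB of readahead per device. The sketch below shows that arithmetic and prints the per-device rc.local lines without running them. The device list is the one used throughout this section.

```shell
# blockdev --setra takes a count of 512-byte sectors.
setra_sectors=16384
readahead_mib=$(( setra_sectors * 512 / 1024 / 1024 ))
echo "readahead per device: ${readahead_mib} MiB"   # 8 MiB

# Print (do not run) the rc.local lines for each OST device.
for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
    echo "/sbin/blockdev --setra ${setra_sectors} ${dev}"
done
```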

4 Configure client boxes

Applies to   Ansiblized
Datamovers   NO

4.1 Reboots required

  • The client kernel is patchless (i.e., its name doesn't say lustre), but it needs to match the kernel version your lustre client packages were built for
  • Also, a reboot is required AFTER lustre-client and lustre-client-modules installation!

4.2 Load lustre kernel modules

  • modprobe lustre

4.3 Mount the lustre client

4.3.1 Make mountpoint

  • mkdir -p /mount/point

4.3.2 Mount -t

  • mount -t lustre $mds-ip@$mds-interface:/$filesystem-name -o user_xattr,flock /$mountpoint
variable           meaning                                                        our value
$mountpoint        actual mountpoint on the client                                /lard
$mds-ip            IP or hostname of the MDS                                      10.7.17.126
$filesystem-name   name of the lustre filesystem given when the MDS was created   lard
$mds-interface     type of interface                                              o2ib (for Infiniband)
#all together now!
mount -t lustre 10.7.17.126@o2ib:/lard -o user_xattr,flock /lard
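
If the mount command is going into a script, note that the placeholder names above are not valid shell variable names, because hyphens are not allowed. The sketch below uses underscore equivalents (my naming, not from the notes) and only prints the assembled command.

```shell
# Values from the table above; variable names use underscores because
# shell variable names cannot contain hyphens.
mds_ip=10.7.17.126
mds_interface=o2ib
filesystem_name=lard
mountpoint=/lard

client_mount_cmd="mount -t lustre ${mds_ip}@${mds_interface}:/${filesystem_name} -o user_xattr,flock ${mountpoint}"
echo "$client_mount_cmd"
```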

5 Questions

5.1 What role does the nrao-lustre script play on a client box?

  • Grab the script: scp -p hex:/etc/init.d/nrao-lustre /etc/init.d/
  • Run chkconfig --add nrao-lustre
  • Run the script to start the client

5.2 Automounter

  • I'll need an explanation on how this works...
Topic revision: r16 - 2016-04-29, JessicaOtey