1 Key points

  • Lustre clients require lustre modules, while MDSes and OSSes only require lnet modules
  • MDSes and OSSes that are specified in /etc/fstab will AUTOMATICALLY trigger lnet modules to be loaded upon mount
  • Fscks (regular, not the 'lustre' kind) and writeconfs need to be performed before the upgrade of any lustre packages

2 Writeconf guidance

2.1 How to do a writeconf

  • Volumes must be unmounted before a writeconf operation.
  • On each OST:
tunefs.lustre --ost --writeconf <device>  (e.g., /dev/sda1) 
  • On each MDT:
tunefs.lustre --mdt --writeconf --erase-params --mountfsoptions=errors=remount-ro,user_xattr <device> (e.g., /dev/md127)
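
The two commands above can be wrapped in a small dry-run helper so that the exact command line can be reviewed before anything is run on a server. This is only a minimal sketch: the function name writeconf_cmd and its ost/mdt keywords are hypothetical, and it prints the command (using only the flags shown above) rather than executing it.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the writeconf command for a device.
# Usage: writeconf_cmd ost|mdt <device>
writeconf_cmd () {
   local role="$1" device="$2"
   case "$role" in
      ost)
         echo "tunefs.lustre --ost --writeconf $device"
         ;;
      mdt)
         echo "tunefs.lustre --mdt --writeconf --erase-params --mountfsoptions=errors=remount-ro,user_xattr $device"
         ;;
      *)
         echo "unknown role: $role (expected ost or mdt)" >&2
         return 1
         ;;
   esac
}

# Example: review the commands for two devices (device names are placeholders).
writeconf_cmd ost /dev/sda1
writeconf_cmd mdt /dev/md127
```

Once the printed lines look right they can be pasted into a shell on the server that owns the (unmounted) device.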

2.2 Writeconf consequences

  • A writeconf erases all previously set configuration parameters, so they must be reapplied afterwards
  • The most efficient way to reinstate them is to keep the commands in /etc/rc.local, where they are reapplied at every boot

# /etc/rc.local from naaschpc MDT (asimov)
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

#pools
lctl pool_new naaschpc.all
lctl pool_add naaschpc.all OST[1e-5f]

# round-robin defaults (on 2.5 and 2.8)
lctl set_param lod.naaschpc-MDT0000-mdtlov.qos_threshold_rr=17
lctl set_param lod.naaschpc-MDT0000-mdtlov.qos_prio_free=91

#needed or df will hang on clients
lctl conf_param naaschpc.llite.lazystatfs=1

  • Directory assignments to pools will also need to be reestablished
client# lfs find -type d /path/to/lustre/filesystem | while IFS= read -r thing; do lfs setstripe --pool all "$thing"; done
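
Because reassigning pools touches every directory, it can be worth previewing the commands first. The following is a minimal sketch under stated assumptions: the function name assign_pool and the DRY_RUN variable are hypothetical, and the example paths are placeholders. It reads one directory per line and, by default, only prints the lfs setstripe command it would run.

```shell
#!/bin/bash
# Hypothetical helper: read directory paths on stdin (one per line) and
# assign each to a pool. With DRY_RUN=1 (the default here) the lfs command
# is only printed, so the list can be reviewed first.
assign_pool () {
   local pool="$1" dir
   # IFS= and -r keep leading spaces and backslashes in names intact
   while IFS= read -r dir; do
      if [ "${DRY_RUN:-1}" = 1 ]; then
         printf 'lfs setstripe --pool %s "%s"\n' "$pool" "$dir"
      else
         lfs setstripe --pool "$pool" "$dir"
      fi
   done
}

# Example with a literal list; on a client the input would come from
#   lfs find -type d /path/to/lustre/filesystem
printf '%s\n' '/lustre/proj a' '/lustre/projb' | assign_pool all
```

Setting DRY_RUN=0 switches the helper from printing to running the commands.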

3 Fsck guidance

3.1 Prepare screens

  • Fsck is best done in screen. Jessica has created a useful alias and function to make screen more manageable:

alias scls="screen -ls"

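reconnect (){
   # attach to the Nth session in the screen -ls list (session lines look like "<pid>.<name>")
   screen -x $(screen -ls | grep -E '^[[:space:]]+[0-9]+\.' | awk -v n="$1" 'NR==n{print $1}')
}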

  • A bash one-liner can create a detached screen session for each lustre-formatted device (with the device announced in each screen)
  • This one-liner assumes that the lustre file system name only appears in the labels of lustre-formatted devices
  • The following examples were used for our two production lustre file systems, naaschpc and cvlustre

# naasc osses
for i in $(ls -l /dev/disk/by-label | grep naaschpc | awk '{sub(/^..\/..\//, "", $11)} {print $11}'); do screen -d -m -S "$(hostname)-$i" bash -c "echo this is a screen for $i; bash"; done

# cv osses
for i in $(ls -l /dev/disk/by-label | grep cvlustre | awk '{sub(/^..\/..\//, "", $11)} {print $11}'); do screen -d -m -S "$(hostname)-$i" bash -c "echo this is a screen for $i; bash"; done
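
The one-liners above depend on the column layout of ls -l. A possible alternative, shown here only as a sketch, resolves each label symlink with readlink instead. The function name devices_for_fs and its optional second argument (a label directory, included so the function can be tried against a scratch directory) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the kernel device name (e.g. sdb1) behind every
# label that contains the given file system name.
# Usage: devices_for_fs <fsname> [label directory]
devices_for_fs () {
   local fsname="$1" labeldir="${2:-/dev/disk/by-label}" link
   for link in "$labeldir"/*"$fsname"*; do
      [ -L "$link" ] || continue       # skip when nothing matches
      basename "$(readlink "$link")"   # ../../sdb1 -> sdb1
   done
}

# Example: list the devices labelled for naaschpc on this host.
devices_for_fs naaschpc
```

Its output can stand in for the ls, grep and awk pipeline inside the for loop, with the same assumption that the file system name appears only in the labels of lustre-formatted devices.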

  • View a list of screens with scls
  • You can use reconnect # where # is the position in the screen -ls list to reconnect to a given screen

3.2 Run fsck via a script

  • Here is the script we used

#!/bin/bash

# fsck.sh
# fsck a specified (unmounted) partition
# jotey
# 22 Feb 2017

# # # USAGE FROM TEMPLATE # # #
USAGE_MESSAGE="device letter (e.g., sdb1, sdc)"
ARGMIN=1
ARGMAX=1

usage () {
      # print the usage message to stderr and exit with a failure status
      printf "usage: %s %s\n" "$0" "$USAGE_MESSAGE" >&2
      exit 1
}

#usage test for $ARGMIN
if [ $# -lt $ARGMIN ]; then
   usage
fi

#usage test for $ARGMAX
if [ $# -gt $ARGMAX ]; then
   usage
fi

# # # FUNCTIONS # # #

die (){
   echo "$*" 1>&2
   exit 1
}

# # # MAIN PROGRAM # # #
#accept device name (e.g., sdb1) from command line
spec=$1

# check to see if this device is mounted; fail if so
mount | grep -- "$spec" && die "This device is mounted! Unmount first!"

#make a new mountpoint; mount the device there; unmount it
#(this replays the journal; a failure here is not fatal)
mkdir -p "/mnt/device${spec}" && mount "/dev/${spec}" "/mnt/device${spec}" && umount "/mnt/device${spec}"

#check the file system read-only first; repair only if that check is not clean
e2fsck -fn "/dev/${spec}" || e2fsck -fp "/dev/${spec}"

  • USAGE EXAMPLE: sh fsck.sh sdb1
  • Mounting the device as a regular (non-lustre) filesystem may fail, but the script will continue
  • The purpose of this mount/unmount cycle is, according to James, to trigger a journal replay so that the file system is up to date. "In 99.xx% of the cases it's a no-op, in a tiny fraction it could save a bit of time or anguish."
  • The script will exit if /dev/sdb1 (in this case) is mounted (e2fsck would abort anyway)
  • The script will run a dry-run check (-fn) and then, if that does not exit cleanly (status 0), it will run the -fp check
  • If you happen to be watching the -fn check and you see it finds issues, you should feel free to abort the script and run -fp manually right away
  • If the -fp check finds things it deems too important to automatically correct, it will abort itself and direct you to run the check manually.
  • You are free to pass the -y flag to answer yes to all questions, but that flag comes with no warranty...
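
When reading the result of either check, it helps to know that the e2fsck man page documents the exit status as a sum of condition codes (1, 2, 4, 8, 16, 32, 128). The following is a minimal sketch of a decoder; the function name describe_e2fsck_status and the wording of the messages are our own, not part of fsck.sh.

```shell
#!/bin/bash
# Hypothetical helper: translate an e2fsck exit status into words.
# The status is a sum of bit values, so several conditions can be set at once.
describe_e2fsck_status () {
   local status="$1" msgs="" pair bit text
   if [ "$status" -eq 0 ]; then
      echo "no errors"
      return 0
   fi
   for pair in "1:errors corrected" \
               "2:errors corrected and reboot needed" \
               "4:errors left uncorrected" \
               "8:operational error" \
               "16:usage or syntax error" \
               "32:canceled by user" \
               "128:shared library error"; do
      bit=${pair%%:*}     # number before the first colon
      text=${pair#*:}     # message after the first colon
      if [ $(( $status & $bit )) -ne 0 ]; then
         msgs="${msgs:+$msgs; }$text"
      fi
   done
   echo "${msgs:-unrecognized status $status}"
}

# Example: after a check such as "e2fsck -fn /dev/sdb1" (placeholder device),
# pass $? to the helper. A literal value is used here for illustration.
describe_e2fsck_status 4
```

Because the script chains the two checks with ||, any nonzero status from the -fn run triggers the -fp run; the decoder only explains what that status meant.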
Topic revision: r2 - 2017-03-01, JessicaOtey