Lustre Filesystem Creation
Documentation for creating MDS and OSS filesystems. Ensure the following steps have been completed before proceeding:
- Operating systems installed on MDS and OSSes
- Network, including InfiniBand, is installed and tested
- Destructive raw performance tests have been completed
- Lustre RPMs have been installed and modules loaded (modprobe lustre)
The following operations should be performed on the MDS
- The MDT (metadata target) stores all metadata for the Lustre array.
- Since it's a single point of failure, consider making it a software RAID 1.
mdadm --create /dev/md0 -l 1 -n 2 -x 0 /dev/sdb /dev/sdc
- Consider selecting a name other than the default of 'lustre' for the filesystem, especially if you plan on having multiple filesystems. The name has an 8 character limit.
- Make and Mount Volumes
- Make MDT Lustre filesystem:
- mkfs.lustre --fsname lustre --mdt --mgs /dev/md0
- --fsname name is the filesystem name
- --mdt creates the metadata target
- --mgs makes the MDS act as the management server (MGS) as well
- Create the mount point and mount the filesystem (servers mount everything under /export/lustre)
- mkdir -p /export/lustre/mdt
- mount -t lustre /dev/md0 /export/lustre/mdt
- Add mount to /etc/fstab
- echo "/dev/md0 /export/lustre/mdt lustre defaults 0 0" >> /etc/fstab
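The 8-character limit on the filesystem name noted above can be checked before running mkfs.lustre. A minimal sketch: valid_fsname is a hypothetical helper, not part of Lustre, and it checks only the length limit stated here, not any other naming rules.

```shell
# Hypothetical helper: succeeds if the name is non-empty and at most 8 characters.
valid_fsname() {
    name=$1
    [ -n "$name" ] && [ "${#name}" -le 8 ]
}

fsname=scratch   # illustrative name, an assumption and not from the original page
if valid_fsname "$fsname"; then
    echo "fsname '$fsname' is within the 8 character limit"
else
    echo "fsname '$fsname' is empty or longer than 8 characters" >&2
fi
```

The checked name can then be passed to mkfs.lustre as the --fsname value.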
Useful mdadm (software RAID) commands
- To fail a disk
mdadm --fail /dev/md0 /dev/sdb
- To add a disk, for example after failing one
mdadm --add /dev/md0 /dev/sdb
- To monitor the RAID rebuild process and current status
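No command is given for the monitoring step. A common approach, offered here as an assumption rather than the author's original command, is to read /proc/mdstat, where the kernel reports array state and rebuild progress. The helper names below are hypothetical, and the optional path argument exists only so the sketch can be tried against a saved copy of the file.

```shell
# Print the md status file (defaults to /proc/mdstat).
show_md_status() {
    mdstat="${1:-/proc/mdstat}"
    if [ -r "$mdstat" ]; then
        cat "$mdstat"
    else
        echo "no md status file at $mdstat" >&2
        return 1
    fi
}

# Print only the rebuild progress lines; returns non-zero if no rebuild is running.
show_md_rebuild() {
    grep -E 'recovery|resync' "${1:-/proc/mdstat}"
}

# Per-array detail (requires root), left commented out in this sketch:
# mdadm --detail /dev/md0
```

Running `watch -n 5 cat /proc/mdstat` gives a view of the rebuild that refreshes every 5 seconds.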
The following operations should be performed for each OST on each OSS. OSTs are registered in the order they are first mounted, so create all the filesystems and mount them one at a time on OSS-1, then all the filesystems on OSS-2, etc.
I'd suggest doing them in order from lowest SCSI target to highest, so that sdb maps to OST0001, sdc to OST0002, etc.
- Create Filesystems
- For each OST on each OSS, execute something like the following
mkfs.lustre --fsname name --mkfsoptions="-E stride=16,stripe-width=64 -m 0" --ost --mgsnode=MDSIP@o2ib0 /dev/device
name is the name of the filesystem, preferably not 'lustre'.
stride = RAID chunk size (64k) / filesystem block size (4k) = 16
stripe-width = stride (16) * number of data disks (4) = 64
MDSIP is the IP address of the MDS server's Infiniband interface
device is each raid array on the OSS: sdb, sdc, etc.
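The stride and stripe-width arithmetic above can be computed in the shell rather than by hand, which helps when the RAID geometry differs between arrays. A small sketch using the values from this page (64k chunk, 4k block, 4 data disks); the variable names are illustrative.

```shell
chunk_kb=64     # RAID chunk size in KiB
block_kb=4      # filesystem block size in KiB
data_disks=4    # number of data (non-parity) disks in the array

stride=$((chunk_kb / block_kb))
stripe_width=$((stride * data_disks))

# Value to pass to mkfs.lustre via --mkfsoptions
mkfs_opts="-E stride=${stride},stripe-width=${stripe_width} -m 0"
echo "$mkfs_opts"
```

The resulting string can be passed as --mkfsoptions="$mkfs_opts" in the mkfs.lustre command shown above.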
- Create mount points
- Make directories like /export/lustre/ostX-Y, where X is the number of the OSS server and Y is the enumerated OST on that server. For instance, /export/lustre/ost1-1 is the first OST on the 1st server, /export/lustre/ost2-3 is the 3rd OST on the 2nd server.
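The ostX-Y naming scheme lends itself to a short loop. A minimal sketch: make_ost_dirs is a hypothetical helper, and it takes the base directory as an argument so it can be tried somewhere other than /export/lustre first.

```shell
# Hypothetical helper: create <base>/ost<oss>-1 .. <base>/ost<oss>-<count>.
make_ost_dirs() {
    base=$1     # parent directory, e.g. /export/lustre
    oss=$2      # number of this OSS server (X)
    count=$3    # number of OSTs on this server (Y runs from 1 to count)
    y=1
    while [ "$y" -le "$count" ]; do
        mkdir -p "$base/ost${oss}-${y}"
        y=$((y + 1))
    done
}

# Example for the first OSS with four OSTs (run as root on the OSS):
# make_ost_dirs /export/lustre 1 4
```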
- Add mount points to /etc/fstab
- For example, on the first OSS:
/dev/sdb /export/lustre/ost1-1 lustre defaults 0 0
/dev/sdc /export/lustre/ost1-2 lustre defaults 0 0
/dev/sdd /export/lustre/ost1-3 lustre defaults 0 0
/dev/sde /export/lustre/ost1-4 lustre defaults 0 0
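Entries like these follow a fixed pattern, so they can be generated instead of typed. A sketch under the assumption that devices are listed in the order they should become OSTs; ost_fstab_lines is a hypothetical helper that only prints lines and does not modify /etc/fstab.

```shell
# Hypothetical helper: print one fstab line per device for OSS number $1.
ost_fstab_lines() {
    oss=$1
    shift
    y=1
    for dev in "$@"; do
        printf '/dev/%s /export/lustre/ost%s-%s lustre defaults 0 0\n' "$dev" "$oss" "$y"
        y=$((y + 1))
    done
}

# Lines for the first OSS with four OSTs
ost_fstab_lines 1 sdb sdc sdd sde
```

After reviewing the output, it can be appended to /etc/fstab. Each OST can then be mounted by its mount point, one at a time and in order (for example, mount /export/lustre/ost1-1), to preserve the registration order described earlier.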