Lustre Overview
Lustre is a parallel distributed filesystem used in most large-scale computing facilities. It allows NRAO/NM desktops, public machines and clusters to share a large file space, removing the need to repeatedly copy data between systems for processing. It is designed primarily for performance, which is achieved by aggregating the throughput of a large number of individual disks. As a side effect, the resulting storage volume is typically large compared to desktop storage. In NRAO/NM's case the current Lustre filesystem provides 90TB of storage and can sustain ~4GB/s reads or writes. For similarly designed systems, each OSS contributes 30TB of storage and ~1.4GB/s of I/O.
The described Lustre configuration is designed to produce maximum throughput and storage volume at minimal cost. The cost per node is only 60% greater than the raw cost of the disks. It is not a suitable design for high availability across a large number of nodes or for large volumes of small I/Os. The configuration attempts to balance several constraints: disk spindle speed (125MB/s per disk), RAID card throughput (~500MB/s per card), chassis volume (24 disks), uniform distribution of data across 2^n data disks so that 1MB I/Os stripe evenly, and network throughput via InfiniBand (>10Gbit/s).
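The 2^n data disk constraint can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the actual configuration: it assumes "1MB" means 1 MiB (1048576 bytes), and the function names are hypothetical. It checks whether a single 1MB I/O divides into equal whole-byte chunks across a given number of data disks.

```python
MIB = 1024 * 1024  # assumption: "1MB" in the text means 1 MiB


def per_disk_chunk(io_bytes, data_disks):
    """Return the bytes each data disk receives, or None if the split is uneven."""
    chunk, remainder = divmod(io_bytes, data_disks)
    return chunk if remainder == 0 else None


def is_power_of_two(n):
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    for data_disks in (3, 4, 5, 8):
        chunk = per_disk_chunk(MIB, data_disks)
        status = f"{chunk} bytes per disk" if chunk is not None else "uneven split"
        print(f"{data_disks} data disks (2^n: {is_power_of_two(data_disks)}): {status}")
```

With 4 data disks each disk receives 256KiB of every 1MB I/O, whereas 3 or 5 data disks cannot split the I/O evenly.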
The final design consists of 2 OSSes, each of which hosts four OSTs configured as 4+2 RAID 6 arrays (24 disks per OSS). Each RAID group reaches the spindle and RAID card limits of around 500MB/s raw, or 375 to 400MB/s formatted. The 4 OSTs provide ~1.4GB/s of total I/O, which is transmitted to clients via 40Gbit/s QDR InfiniBand.
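The per-OSS numbers can be cross-checked with a short calculation. This is a rough sketch using only the figures quoted on this page; the constant and function names are hypothetical, it assumes one ~500MB/s RAID card limit applies per RAID group, and it treats the InfiniBand link at its nominal 40Gbit/s rate, ignoring encoding and protocol overhead.

```python
DISK_MB_S = 125        # spindle limit per disk
RAID_CARD_MB_S = 500   # RAID card limit (assumed to apply per RAID group)
DATA_DISKS = 4         # data disks in a 4+2 RAID 6 group
PARITY_DISKS = 2       # parity disks in a 4+2 RAID 6 group
OSTS_PER_OSS = 4
LINK_GBIT_S = 40       # nominal QDR InfiniBand rate


def ost_raw_mb_s():
    """Raw throughput of one OST: the lower of the spindle and RAID card limits."""
    return min(DATA_DISKS * DISK_MB_S, RAID_CARD_MB_S)


def disks_per_oss():
    """Total disks in one OSS chassis."""
    return OSTS_PER_OSS * (DATA_DISKS + PARITY_DISKS)


def oss_formatted_range_mb_s(low=375, high=400):
    """Simple sum of the quoted formatted per-OST range across all OSTs."""
    return OSTS_PER_OSS * low, OSTS_PER_OSS * high


def link_nominal_mb_s():
    """Nominal link rate in MB/s (1 Gbit/s taken as 1000 Mbit/s, 8 bits per byte)."""
    return LINK_GBIT_S * 1000 / 8


if __name__ == "__main__":
    print("raw MB/s per OST:", ost_raw_mb_s())
    print("disks per OSS:", disks_per_oss())
    print("summed formatted MB/s per OSS:", oss_formatted_range_mb_s())
    print("nominal link MB/s:", link_nominal_mb_s())
```

The spindle limit (4 x 125MB/s) and the RAID card limit coincide at 500MB/s, and the disk count fills the 24-disk chassis. The simple sum of the formatted rates comes out slightly above the ~1.4GB/s quoted for a whole OSS; this sketch does not model whatever accounts for that difference. The nominal link rate is well above either figure.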
A schematic drawing showing the physical layout of the OSSes and their network connectivity to the MDS, post-processing and archive can be found here:
lustre-schematic.pdf
Full documentation for Lustre can be found at
http://wiki.lustre.org/index.php/Use:Use
Definitions
OSS: Object Storage Server, consists of 1 or more OSTs, stores actual block data.
OST: Object Storage Target, physical disks, consists of 1 or more disks in a RAID configuration
MDS: Metadata Server, consists of MDT and MGS, stores file metadata (owner, date stamps, permissions, etc.)
MDT: Metadata Target, physical disk which contains metadata
MGS: Management Server, configuration and communications server for OSS/MDS/client traffic
-- JamesRobnett - 2011-07-15