#### From cvpost005: running a modified version of krowe's lnet_test script (modified so you can plug in different NIDs as command-line args)

No errors; I think this is enough to rule out lnet nonsense...

```
[root@cvpost005 ~]# sh lnet-self-test.sh 10.7.17.132@o2ib 10.7.17.134@o2ib 10.7.17.135@o2ib
10.7.17.132@o2ib cvpost005
10.7.17.134@o2ib cvpost007
10.7.17.135@o2ib cvpost008
SESSION: read_write FEATURES: 0 TIMEOUT: 300 FORCE: No
10.7.17.132@o2ib are added to session
10.7.17.134@o2ib are added to session
10.7.17.135@o2ib are added to session
Test was added successfully
Test was added successfully
Test was added successfully
Test was added successfully
thisbatch is running now
[LNet Rates of servers]
[R] Avg: 21002 RPC/s Min: 21002 RPC/s Max: 21002 RPC/s
[W] Avg: 23781 RPC/s Min: 23781 RPC/s Max: 23781 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3016.25 MB/s Min: 3016.25 MB/s Max: 3016.25 MB/s
[W] Avg: 2780.35 MB/s Min: 2780.35 MB/s Max: 2780.35 MB/s
[LNet Rates of servers]
[R] Avg: 21157 RPC/s Min: 21157 RPC/s Max: 21157 RPC/s
[W] Avg: 23958 RPC/s Min: 23958 RPC/s Max: 23958 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3035.47 MB/s Min: 3035.47 MB/s Max: 3035.47 MB/s
[W] Avg: 2803.87 MB/s Min: 2803.87 MB/s Max: 2803.87 MB/s
[LNet Rates of servers]
[R] Avg: 20996 RPC/s Min: 20996 RPC/s Max: 20996 RPC/s
[W] Avg: 23771 RPC/s Min: 23771 RPC/s Max: 23771 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3013.24 MB/s Min: 3013.24 MB/s Max: 3013.24 MB/s
[W] Avg: 2778.04 MB/s Min: 2778.04 MB/s Max: 2778.04 MB/s
[LNet Rates of servers]
[R] Avg: 20980 RPC/s Min: 20980 RPC/s Max: 20980 RPC/s
[W] Avg: 23752 RPC/s Min: 23752 RPC/s Max: 23752 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3014.84 MB/s Min: 3014.84 MB/s Max: 3014.84 MB/s
[W] Avg: 2775.64 MB/s Min: 2775.64 MB/s Max: 2775.64 MB/s
[LNet Rates of servers]
[R] Avg: 20960 RPC/s Min: 20960 RPC/s Max: 20960 RPC/s
[W] Avg: 23732 RPC/s Min: 23732 RPC/s Max: 23732 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3011.14 MB/s Min: 3011.14 MB/s Max: 3011.14 MB/s
[W] Avg: 2774.24 MB/s Min: 2774.24 MB/s Max: 2774.24 MB/s
[LNet Rates of servers]
[R] Avg: 21168 RPC/s Min: 21168 RPC/s Max: 21168 RPC/s
[W] Avg: 23975 RPC/s Min: 23975 RPC/s Max: 23975 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 3033.27 MB/s Min: 3033.27 MB/s Max: 3033.27 MB/s
[W] Avg: 2809.47 MB/s Min: 2809.47 MB/s Max: 2809.47 MB/s
servers: Total 0 error nodes in servers
readers: Total 0 error nodes in readers
writers: Total 0 error nodes in writers
session is ended
[root@cvpost005 ~]# sh lnet-self-test.sh 10.7.17.132@o2ib 10.7.17.8@o2ib 10.7.17.16@o2ib
10.7.17.132@o2ib cvpost005
10.7.17.8@o2ib asimov
10.7.17.16@o2ib naasc-oss-1
SESSION: read_write FEATURES: 0 TIMEOUT: 300 FORCE: No
10.7.17.132@o2ib are added to session
10.7.17.8@o2ib are added to session
10.7.17.16@o2ib are added to session
Test was added successfully
Test was added successfully
Test was added successfully
Test was added successfully
thisbatch is running now
[LNet Rates of servers]
[R] Avg: 18402 RPC/s Min: 18402 RPC/s Max: 18402 RPC/s
[W] Avg: 21218 RPC/s Min: 21218 RPC/s Max: 21218 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2607.55 MB/s Min: 2607.55 MB/s Max: 2607.55 MB/s
[W] Avg: 2828.83 MB/s Min: 2828.83 MB/s Max: 2828.83 MB/s
[LNet Rates of servers]
[R] Avg: 18346 RPC/s Min: 18346 RPC/s Max: 18346 RPC/s
[W] Avg: 21164 RPC/s Min: 21164 RPC/s Max: 21164 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2601.00 MB/s Min: 2601.00 MB/s Max: 2601.00 MB/s
[W] Avg: 2822.60 MB/s Min: 2822.60 MB/s Max: 2822.60 MB/s
[LNet Rates of servers]
[R] Avg: 18377 RPC/s Min: 18377 RPC/s Max: 18377 RPC/s
[W] Avg: 21218 RPC/s Min: 21218 RPC/s Max: 21218 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2609.61 MB/s Min: 2609.61 MB/s Max: 2609.61 MB/s
[W] Avg: 2839.01 MB/s Min: 2839.01 MB/s Max: 2839.01 MB/s
[LNet Rates of servers]
[R] Avg: 18450 RPC/s Min: 18450 RPC/s Max: 18450 RPC/s
[W] Avg: 21286 RPC/s Min: 21286 RPC/s Max: 21286 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2602.72 MB/s Min: 2602.72 MB/s Max: 2602.72 MB/s
[W] Avg: 2842.42 MB/s Min: 2842.42 MB/s Max: 2842.42 MB/s
[LNet Rates of servers]
[R] Avg: 18463 RPC/s Min: 18463 RPC/s Max: 18463 RPC/s
[W] Avg: 21305 RPC/s Min: 21305 RPC/s Max: 21305 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2612.32 MB/s Min: 2612.32 MB/s Max: 2612.32 MB/s
[W] Avg: 2840.92 MB/s Min: 2840.92 MB/s Max: 2840.92 MB/s
[LNet Rates of servers]
[R] Avg: 18336 RPC/s Min: 18336 RPC/s Max: 18336 RPC/s
[W] Avg: 21164 RPC/s Min: 21164 RPC/s Max: 21164 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 2599.50 MB/s Min: 2599.50 MB/s Max: 2599.50 MB/s
[W] Avg: 2831.70 MB/s Min: 2831.70 MB/s Max: 2831.70 MB/s
servers: Total 0 error nodes in servers
readers: Total 0 error nodes in readers
writers: Total 0 error nodes in writers
session is ended
```
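For reference, here is a minimal sketch of what such a script can look like, modeled on the standard lnet_selftest example from the Lustre manual. The session, batch, and group names (read_write, thisbatch, servers/readers/writers) are taken from the output above; the specific brw tests and sizes are guesses (the output only shows that four tests were added), and the real script evidently also prints a NID-to-hostname mapping first.

```sh
#!/bin/sh
# Usage: lnet-self-test.sh SERVER_NID READER_NID WRITER_NID
# The lnet_selftest kernel module must be loaded on every node involved.
export LST_SESSION=$$

lst new_session read_write
lst add_group servers "$1"
lst add_group readers "$2"
lst add_group writers "$3"

lst add_batch thisbatch
# Four tests, matching the four "Test was added successfully" lines above;
# the actual mix of read/write tests and sizes in the real script may differ.
lst add_test --batch thisbatch --from readers --to servers brw read size=1M
lst add_test --batch thisbatch --from readers --to servers brw read size=4K
lst add_test --batch thisbatch --from writers --to servers brw write size=1M
lst add_test --batch thisbatch --from writers --to servers brw write size=4K

lst run thisbatch
# Sample server-side LNet rates and bandwidth for ~30 seconds,
# then tear the session down.
lst stat servers & sleep 30; kill $!
lst end_session
```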
#### tune2fs -O dirdata /dev/md127 is issued
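For context: dirdata is the ldiskfs feature that lets extra data (Lustre stores the FID) be carried in each directory entry, and tune2fs will only set it on a cleanly unmounted filesystem, hence the unmount battle described below. A quick way to confirm the feature actually took, using standard e2fsprogs tooling:

```sh
# 'dirdata' should now appear in the "Filesystem features:" line
# of the superblock dump
dumpe2fs -h /dev/md127 | grep -i features
```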
I had to reboot asimov to get it to actually unmount the MDT, which did not fill me with joy. Thus, when I first ran the tune2fs command, it prompted me to replay the journal (not surprisingly):

```
[root@asimov ~]# e2fsck -fp /dev/md127
naaschpc-MDT0000: recovering journal
naaschpc-MDT0000: 34650573/61046784 files (0.1% non-contiguous), 14833722/61035136 blocks
[root@asimov ~]# tune2fs -O dirdata /dev/md127
tune2fs 1.42.13.wc4 (28-Nov-2015)
```

At this point, we were able to start the oi_scrub. Here's a snapshot of the output (you simply 'watch' the statistics file):

```
Every 15.0s: cat /proc/fs/lustre/osd-ldiskfs/naaschpc-MDT0000/oi_scrub    Fri Apr 21 19:41:49 2017

name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: scanning
flags: inconsistent
param:
time_since_last_completed: 24151988 seconds
time_since_latest_start: 2080 seconds
time_since_last_checkpoint: 40 seconds
latest_start_position: 12
last_checkpoint_position: 23801804
first_failure_position: 15
checked: 13793917
updated: 0
failed: 1
prior_updated: 0
noscrub: 64
igif: 3620871
success_count: 1
run_time: 2080 seconds
average_speed: 6631 objects/sec
real-time_speed: 6749 objects/sec
current_position: 24128150
lf_scanned: 1
lf_reparied: 0
lf_failed: 0
```

We also tried running an oi_scrub on an OST, and it finishes in just a few seconds. So that may not be a thing--or if it is a thing, maybe it's better to do it after the MDT finishes? Who knows....
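For completeness, a sketch of driving the scrub by hand, assuming a Lustre 2.x lctl. The watch invocation is the one that produced the snapshot above; lctl lfsck_start/lfsck_stop are the standard way to start and stop an OI scrub explicitly (it also kicks off automatically when OI inconsistencies are detected, which is presumably what happened here).

```sh
# Explicitly start an OI scrub on the MDT (it usually starts on its own
# once OI inconsistencies are noticed after enabling dirdata):
lctl lfsck_start -M naaschpc-MDT0000 -t scrub

# Monitor progress -- this is the watch that produced the snapshot above:
watch -n 15 cat /proc/fs/lustre/osd-ldiskfs/naaschpc-MDT0000/oi_scrub

# Stop it if it needs to be paused:
lctl lfsck_stop -M naaschpc-MDT0000
```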