-- SandraCastro - 2016-01-07

7 Jan 2016

Deliverables/milestones/timeline PHASE-1
  1. Document explaining the storage managers in CASA: a) how the MS is laid out on disk; b) which sub-tables use which storage managers.
  2. VI/VB2 document on how sorting works.
  3. Tile caching

  • Test Cases
    • interesting tasks to look at:
      • mstransform (split2 mode): pure read/write IO, should allow identifying inefficiencies in infrastructure code
      • flagdata: currently CPU-bound but parallelizable; to see whether IO-bound behaviour can be achieved with enough cores, and to check the influence of parallel IO on efficiency
      • gaincal/applycal: have shown odd behaviour in the past (fsyncs, redundant reads); not VI/VB2 based (?)
      • visstat2: almost pure read IO for simple statistics (sum/mean); may allow checking how much CPU work (median/stdev) is required to mask IO inefficiencies
  • use the Table class directly via small C++ executables, to verify how it behaves in minimal reproducible test cases (see the Python sketch after this list)
  • datasets: For_ants, ALMA, EVLA datasets
    • different selections (scan,spw,antenna,field,baseline,timerange)
    • Create scripts with calls to these tasks so the tests can be run on different platforms? Could ask Martin to run them locally on Lustre while we run them on the SSDs and spinning disks here.
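A minimal sketch of such a test case in Python, using the python-casacore table bindings rather than a C++ executable (the MS path, column name, and chunk size are placeholders):

    # Minimal read test case against the casacore Table class, via
    # python-casacore. Reads a column chunk by chunk so that strace /
    # io-profiler can observe the underlying read() pattern in isolation.
    from casacore.tables import table

    MS = 'X54.ms'      # placeholder dataset path
    COLUMN = 'DATA'    # column served by a tiled storage manager
    CHUNK = 1000       # rows per getcol() call

    t = table(MS, readonly=True)
    try:
        for start in range(0, t.nrows(), CHUNK):
            nrow = min(CHUNK, t.nrows() - start)
            # getcol() goes straight through the storage manager,
            # with no VI/VB2 sorting or iteration layered on top
            data = t.getcol(COLUMN, startrow=start, nrow=nrow)
    finally:
        t.close()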
Which analysis tools to attach?

local: strace, perf trace, io-profiler, sysstat, (blktrace, seekwatcher to verify whether disk fragmentation messes with results)

lustre: ?

Available test hardware at ESO:

  • ~250 MB/s RAID5 spinning-disk ext4 NAS on RHEL 6.3
  • ~150 MB/s RAID0 spinning-disk btrfs on Linux 4.2 (almost full, so performance is not very reliable, but checking the effect of transparent compression might be interesting)
  • ~700 MB/s OCZ VeloDrive SSD, ext4, on Linux 4.4
  • ~400 MB/s laptop SSDs, ext4

split test on corrected m100 data

Put regression/alma-m100-analysis/input in the cwd and run alma-m100-analysis-regression.py.

split('m100/X54.ms', 'split.ms') (12 GB; throughput on sagan5 ~50 MB/s, should be around 75-100 MB/s)
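A sketch of how the throughput number can be reproduced (run inside casapy; datacolumn='corrected' is an assumption based on the corrected m100 input used here):

    # Time the split and derive an effective write throughput.
    import os, time

    def dir_size_bytes(path):
        # total on-disk size of the output MS directory tree
        total = 0
        for root, dirs, files in os.walk(path):
            for f in files:
                total += os.path.getsize(os.path.join(root, f))
        return total

    t0 = time.time()
    split(vis='m100/X54.ms', outputvis='split.ms', datacolumn='corrected')
    elapsed = time.time() - t0
    mb = dir_size_bytes('split.ms') / 1e6
    print('%.0f MB written in %.0f s -> %.1f MB/s' % (mb, elapsed, mb / elapsed))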

traced with io-profiler (decent strace log parser)
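For reference, a minimal sketch of what such a parser does (this is not io-profiler itself): aggregate read()/write() sizes from a log captured with, e.g., strace -f -T -e trace=read,write -o split.strace casa -c script.py:

    # Histogram read()/write() sizes from an strace log.
    import re, sys
    from collections import Counter

    LINE = re.compile(r'(read|write)\((\d+),.*\)\s*=\s*(\d+)')

    counts = Counter()   # (syscall, size) -> number of calls
    totals = Counter()   # syscall -> total bytes
    with open(sys.argv[1]) as log:
        for line in log:
            m = LINE.search(line)
            if m:
                call, nbytes = m.group(1), int(m.group(3))
                counts[(call, nbytes)] += 1
                totals[call] += nbytes
    for (call, size), n in counts.most_common(10):
        print('%6d x %-5s of %8d bytes' % (n, call, size))
    for call, total in totals.items():
        print('total %-5s: %.1f MB' % (call, total / 1e6))

This is enough to surface patterns like the 75629 3 kB FEED reads noted below.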

tb.open(file); tb.getdminfo()
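Expanded into a small loop (inside casapy; 'split.ms' is a placeholder path), this prints the storage manager and tile/bucket settings behind each finding below:

    # Dump storage manager info for each data manager of an MS,
    # using the casapy 'tb' table tool.
    tb.open('split.ms')
    dminfo = tb.getdminfo()
    tb.close()

    for key in sorted(dminfo):
        dm = dminfo[key]
        spec = dm.get('SPEC', {})
        print('%s %s -> %s' % (dm['TYPE'], dm['NAME'], dm['COLUMNS']))
        # tiled storage managers report tile shapes; IncrementalStMan
        # and StandardStMan report bucket sizes
        for field in ('DEFAULTTILESHAPE', 'BUCKETSIZE', 'ActualCacheSize'):
            if field in spec:
                print('    %s = %s' % (field, spec[field]))

The findings below combine this dminfo output with the io-profiler numbers: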
  • read: 24_TSM2 (corrected-data tiles, TiledShapeStMan; 'CubeShape': [2, 3840, 200200], 'DEFAULTTILESHAPE': [2, 385, 171]): 1 MB reads, 16200 reads in total; 1.2 GB file vs 1.6 GB read
  • FEED table (table.f0): 75629 3 kB reads on a 50 kB file; removing the read() calls by increasing the bucket size (see the sketch after this list) has no effect on performance with a local disk; check whether it is relevant on Lustre
  • 21_TSM1 (FLAG tiles, TiledShapeStMan; 'CubeShape': [2, 3840, 200200], 'DEFAULTTILESHAPE': [2, 60, 1092]): 38000 16 kB reads; 200 MB file vs 600 MB read (why such small reads?)
  • 21 (TIME, IncrementalStMan; 'SPEC': {'ActualCacheSize': 1, 'BUCKETSIZE': 32768, 'PERSCACHESIZE': 1}): 24000 32 kB reads; 350 kB file vs 800 MB read; TIME_CENTROID behaves similarly
  • write: f12_TSM2 (DATA tiles; 'DEFAULTTILESHAPE': [1, 4, 32768], 'CubeShape': [2, 3840, 200200]): 1 MB writes, no duplicates (OK)
  • lots of redundant subtable writes (over 1k writes on ~100 kB files) for SYSCAL, WEATHER, CALDEVICE
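One way to test the FEED bucket-size idea (a sketch using python-casacore; the 128 kB value and the paths are assumptions for illustration): copy the subtable with a modified data manager spec and re-run the trace against the copy.

    # Rewrite the FEED subtable with a larger bucket size, to see
    # whether the 3 kB read() storm disappears.
    from casacore.tables import table

    feed = table('split.ms/FEED', readonly=True)
    dminfo = feed.getdminfo()
    for dm in dminfo.values():
        spec = dm.get('SPEC', {})
        if 'BUCKETSIZE' in spec:
            spec['BUCKETSIZE'] = 131072   # assumed test value: 128 kB
    # copy() accepts a dminfo override for the new table
    feed.copy('FEED_bigbucket', deep=True, valuecopy=True, dminfo=dminfo)
    feed.close()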
TiledStMan improvement: https://github.com/casacore/casacore/pull/334
Martin's comments on X54 split results (19 Apr 2016)

Before I knew that Julian had created corrected m100 data before using X54, I went ahead and ran split on X54.ms without correction, with the data residing on a Lustre filesystem. Overall throughput was a bit better than 160 MiB/sec. A few things that I noticed (maybe you already know all this):
  • Tile shapes in X54.ms are much more diverse than in split.ms. Tile sizes in split.ms are very close to 1 MiB (or 1/2 or 1/4 that value) quite consistently, whereas X54 has some very odd shapes. In particular, a few of the hypercubes in DATA and FLAG are nowhere close to 1 MiB or a simple fraction thereof.
  • Bucket sizes for tiled storage are about what I would expect, except for FLAG. Bucket sizes there are 1/8 the tile "size", suggesting that the flags are stored as bit fields. In X54.ms, this results in some bucket sizes that are very close to 16 kiB. This also suggests that bucket size determines the size of the I/O requests. I'm not sure how the bucket size is set, but it results in situations like the FLAG hypercubes in split.ms, which appear to have been carefully tiled so that the tile size "should" be close to 1 MiB, but the bucket sizes (and the reads?) are not.
  • Not sure how much any of the above matters; when I run split('split.ms', 'splinter.ms'), the run time is not much changed. It might be interesting to see whether the small 16k reads in FLAG become 128k reads.
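The bit-field explanation can be checked with a little arithmetic on the shapes quoted above (assumed element sizes: 8 bytes per complex visibility, 1 bit per flag):

    # Tile size in bytes = number of elements * bits per element / 8.
    def tile_bytes(shape, bits_per_element):
        n = 1
        for dim in shape:
            n *= dim
        return n * bits_per_element // 8

    print(tile_bytes([1, 4, 32768], 64))  # DATA in split.ms: 1048576 = 1 MiB
    print(tile_bytes([2, 385, 171], 64))  # corrected data: 1053360 ~ 1 MiB
    print(tile_bytes([2, 60, 1092], 1))   # FLAG as bit fields: 16380 ~ 16 kiB

So a FLAG tile holds as many elements as a ~1 MiB DATA tile, but stored as bits it occupies only ~16 kB on disk, which matches the observed 16 kiB buckets and reads.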