Tests using the maximum possible number of subMSs (all existing SPW/scan combinations)

Since CASA data reduction scripts very commonly rely on SPW/scan selection, the parallelism is broken whenever all the selected SPWs/scans are contained in a single sub-MS.

Therefore, the only generic solution that guarantees that each task call is parallelized consists of creating one sub-MS for each existing scan/SPW combination.

For this particular case (ALMA cycle 2 data) we have a total of 448 scan/SPW combinations, and the purpose of this test is to show the side problems of this setup.

These tests were conducted on cbt-el6-3, using local disks (in order to compare results reliably) and 2 servers.

Logs are available here:
  1. Maximum number of open files (running MMS as a monolithic MS)
    • Since each column of the main table is physically stored in a separate file, the total number of files that must be open when accessing an MMS is nSubMS*nColumns, where nColumns is typically around 31 (depending on the number of DATA columns and hypercubes). For this particular case it is 31, so the total number of open files reaches 31*448 = 13888. This is not a problem in itself, but the value exceeds the open-file limit commonly set by system administrators (1024). There is an open ticket to track this problem: CAS-4860
  2. wvrgcal seg. fault (running MMS as a monolithic MS)
    • We see a reproducible segmentation fault in the wvrgcal libraries (LibAIR2::ArrayGains::pathRMSAnt). We have created CAS-7200 and informed Drik about it. This issue might be particularly complicated to fix since it comes from software maintained by a third party.
  3. Flagdata listmode performance degradation (running MMS in parallel)
    • Applying online flags with flagdata is already problematic in the sequential version because of the large amount of time MSSelection needs to parse all the online flags (see CAS-7105). In the parallel case the performance degrades even further when there are many more sub-MSs than servers (nSubMS >> nServers), because the initialization time is paid nSubMS/nServers times. As a result, even for small lists (thousands of commands) the performance is much worse than in the sequential version. In this particular case we have 11190 commands: MSSelection takes 190s to parse them and flagdata takes 92s to process them, whereas with 448 sub-MSs the parallel run time (with 2 servers) is 3410s. We have created a separate issue to address this initialization problem in the parallelization context (CAS-7201)
  4. Applycal performance degradation (running MMS in parallel)
    • As with flagdata when parsing the online flags, applycal has non-trivial initialization times, even with the CORRECTED_DATA column already initialized, because it has to arrange the calibration tables and maps. In particular, for the second applycal the initialization time is 46s and the processing time is 54s.
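
The open-file arithmetic from item 1 can be checked against the limit of the current process with a short script. This is only a minimal sketch: the function names are hypothetical and not part of CASA, it uses the Unix-only `resource` module of the Python standard library, and it assumes that every column file of every sub-MS is open at the same time, as described above.

```python
import resource  # Unix-only standard-library module


def required_open_files(n_sub_ms, n_columns):
    """Files open when an MMS is accessed as a monolithic MS.

    Assumes one file per main-table column per sub-MS (nSubMS*nColumns).
    """
    return n_sub_ms * n_columns


def fits_in_limit(n_files, soft_limit):
    """Return True if n_files stays within the given open-file soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit configured
    return n_files <= soft_limit


if __name__ == "__main__":
    # Values taken from this test: 448 sub-MSs with 31 columns each.
    needed = required_open_files(448, 31)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("files needed: %d, soft limit: %s, hard limit: %s" % (needed, soft, hard))
    if not fits_in_limit(needed, soft):
        print("open-file soft limit is too low for this MMS")
```

On a system with the common soft limit of 1024, the 13888 files needed here do not fit, which matches the problem tracked in CAS-4860.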

-- JustoGonzalez - 2015-01-13
Topic revision: r1 - 2015-01-13, JustoGonzalez