-- SandraCastro - 2015-04-08

CASA HPC/Parallelization meeting agenda/minutes


Thursday [09 April 2015], [ESO Centaurus, C.2.01], [15:00 UT]


How to connect

  • Dial in: +49 89 6834
  • Video connection: Use this one (46105@134.171.42.27).

Attendees:

ESO: Julian, Justo, Sandra

Socorro: Lindsey, James, Jeff, Tak, Sanjay, Kumar

CV: Mark, Andy, Akeem, Ville

Agenda

  1. News:
    • George committed a fix for the tile shape and caching problems in gaincal (r32904 and r32921).
    • It looks like we have a fix for the limit on the number of open files (CAS-4860). Thanks to Julian for testing the MultiFile implementation from Ger. Sandra also re-ran the pipeline scripts that failed in the past, and they all pass. However, the importasdm and concat regression tests fail with MultiFile, as they seem to assume some details of the MS layout.
    • Julian released the GIL for plotms again in r32940 (CAS-7383). We should watch the regression runs.
  2. Testing:
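
As background to the open-files item (CAS-4860), the per-process limit can be inspected from Python before opening many files. This is only a minimal sketch using the standard `resource` module on Unix; it is not the MultiFile fix itself, and the helper names and the `reserve` margin are illustrative assumptions.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(n_files, reserve=64):
    """Check whether n_files plus a safety margin fits under the soft limit.

    `reserve` is an assumed margin for descriptors already in use
    (logs, sockets, shared libraries); tune it for the actual process.
    """
    soft, _hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return n_files + reserve <= soft


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("room for 1000 more files:", fits_within_limit(1000))
```

A check like this could be run at the start of a test script to see whether a failure is likely to come from the descriptor limit rather than from the data.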

Minutes

  • We should expect a fix for the tile caching problem for MMS soon.
  • Jeff reported that NAOJ is having problems related to the limit on the number of open files. They will contact Julian about it.
  • Sanjay reported that there is new code (not committed yet) for full polarization testing.
  • Sandra reported that Brian committed fixes to the pipeline development trunk (r32935), including using the flagdata task instead of the agent flagger tool in the hif.heuristics.findrefant class and making sure that all subtable queries are closed. He added some debugging statements and verified that tb.showcache() is blank after the import stage and that no MSes are left open. Regarding CAS-7411, he also switched the VLA pipeline over to split2 and committed that to the development trunk. James will run a test using these changes.
  • James ran the ALMA pipeline on the 3 EBs, each one separately, and found no problems. He tried 32 subMSs and other numbers, but 8 seems to be the optimal number.
  • Sandra reported her findings from running the 3 EBs separately on different nodes, simultaneously. Each run used MPI with 14 cores. Running them separately rather than together is more efficient, and this heuristic should be applied by the pipeline. We will work with Stewart on this in June.
  • Justo raised an idea to run plotms on parallel servers by adding a hidden parameter to the task to control this. Again, NRAO misunderstood the discussion: Justo presented an idea for discussion, not a ready-to-go implementation plan. In any case, Jeff says that parallelizing plotms is to be done later. Sandra added that we can live without this for the tests because we switch off the web logs, but the pipelines cannot do that. Plotms run time is one of the dominant costs in both the ALMA and EVLA pipelines.
  • Lindsey asked about the purpose of the parallel parameter in tclean. Jeff/Sanjay clarified that it is a parameter for testing only. It will be removed later.
  • Jeff reported that the cube parallelization in imaging is waiting for the creation of a 4.4 branch.
  • Julian reported that the MPI environment can be set using the -x option of mpirun (mpicasa). He also said there is no need to set any LD_LIBRARY_PATH. This generated a discussion on what the "casa" and "casapy" scripts in the binary packages and source code are. Mark will get an email from Julian explaining what he knows and will follow up with Darrell on this to clarify it for everybody later. We will also write a document explaining how to run parallel CASA in all our current environments (from binaries, from source, etc.). The document on how to use the cluster is up to James to write.
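
To illustrate Julian's point about the -x option, here is a minimal sketch that assembles an mpicasa command line in Python. It assumes that mpicasa forwards its options to Open MPI's mpirun, where `-x NAME` or `-x NAME=value` exports an environment variable to the launched processes. The helper name, the exported variable, and the script name are hypothetical and only serve as an example.

```python
def build_mpicasa_command(n_procs, env_vars, casa_args):
    """Assemble an mpicasa command line as a list of arguments.

    env_vars: names (or NAME=value pairs) to export with Open MPI's -x option.
    casa_args: the casa executable and its own options, passed through unchanged.
    """
    cmd = ["mpicasa", "-n", str(n_procs)]
    for var in env_vars:
        # One -x flag per exported variable (assumed Open MPI mpirun semantics).
        cmd += ["-x", var]
    cmd += list(casa_args)
    return cmd


if __name__ == "__main__":
    # 14 processes, as in the per-EB runs reported above; other values are made up.
    cmd = build_mpicasa_command(
        14,
        ["OMP_NUM_THREADS=1"],
        ["casa", "--nogui", "-c", "run_pipeline.py"],
    )
    print(" ".join(cmd))
```

Building the command as a list makes it easy to pass to `subprocess.run` without shell quoting problems, and keeps the exported variables in one place instead of relying on LD_LIBRARY_PATH or other shell settings.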

Action Item List

  • Mark to clarify the differences between the casa and casapy scripts and their uses.

Topic revision: r5 - 2015-04-10, SandraCastro