CASA Parallelization Meeting Minutes

Thursday August 14th, DSOC 317, 8:00AM MT

  • Polycom Video: 192.33.117.12##8108
  • Voice: 434-817-6523

Attendees:

  • Socorro: Rob, Kumar, Urvashi, Sanjay.
  • Charlottesville: N/A
  • Garching: Sandra.

  • Apologies: Justo on leave.

Discussion

  • Review of Action Items from last meeting (see table below).
    • Kumar updated issue list.
    • Rob fixed MPI doc links.

  • Preferred MPI library for development environment.
    • Likely have to build CASA for one library or another.
      • CMake wrappers seem to be MPI library dependent (e.g., MPICH vs. OpenMPI).
      • Support for other MPI libraries/implementations would be limited unless the CMake wrappers/flags are modified to be more generic. (This would be a maintenance headache.)
    • May be some conflicting requirements given the MPI implementations on existing clusters.
      • RHEL6 ships with MPICH by default.
      • OpenMPI ships with OSX by default. (OSX is less important, but may need to be considered.)
      • Many other libraries out there (e.g., MVAPICH).
    • Options
      • Pick a single library and package it with CASA. (Preferred)
      • Could provide multiple binaries compiled against different MPI libraries.
      • Could have power users compile CASA if they insist on using their own library. (Would have to provide the build tools too.)
      • Could start out with single library support for 1st release and consider adding additional library support in the future.
      • ACTION: Rob to help bring to a resolution. Work with James, Justo, Kumar and other interested parties. If we proceed with compiling against a single MPI library, we need to select the preferred one.

  • More detailed review of issues and work remaining for 1st parallelized release.
    • Current Status: Summary Page
    • Additional details from today's discussion:
      • Imager
        • Python level class implements parallelization with SimpleCluster
          • Given Justo's implementation, SimpleCluster will auto-detect MPI support and use mpi4py if available. Should be transparent to the imaging team.
        • Still need heuristics for cube parallelization.
          • KG has worked on the partitioning heuristics with Ger and has a solution. Will be committed to SVN in the near term.
        • Can eventually move some of the python level class to the C++ level.
          • UR expects that this won't be too difficult given work that has already been implemented.
        • Data transfer across nodes is limited as implemented: only a list of parameters can be passed. May want to pass a BLOB at a later stage.
      • Imager I/O Partitioning
        • Wide-band imaging uses MS split in time.
          • Use with an MMS may impose a requirement on the VI/VB2 to pass necessary information. Follow up at a subsequent meeting.
        • Long term, the imager does not care how the MS is split into sub-MSs. It will effectively treat the MMS as monolithic and repartition to suit its needs.
          • Could have a performance hit if the Imager partition is very similar to the existing MMS partition since data calls could become serialized.
        • Can generally allow MSTransform or associated tasks to partition the MS as best suits flagging, calibration and other tasks and imager will respond accordingly.
        • Edge case: if a sub-MS is fully flagged, the old imager would hang. Should respond more gracefully now, but must be tested.
        • May not need to consider additional axes to partition the MS in MSTransform for the imager's sake. The needs of other tasks should guide that discussion.
      • MS Transform
        • MMS structure in MSTransform is Sandra's highest priority. Logging will follow after.
        • Heuristics (as described in Sandra's earlier report) are the majority of remaining work.
        • Adding notices to users when partition will lead to treatment of MMS as a monolithic MS.
        • Continue with heuristics next week, followed by testing and documentation later on.
        • ACTION: Sandra will evaluate if cvel2 could be complete for the 4.3 release.
      • Logging
        • Use of single logger file is a concern for the imager team (in the context of testing) since ordering is not guaranteed.
        • Will revisit this discussion for 1st release to ensure we can use available testing / diagnosis tools with the logs.
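
The Imager items above mention two mechanisms: falling back transparently when MPI support is absent, and partitioning a cube across workers. The block below is only a minimal sketch of those two ideas. It is not SimpleCluster's actual logic, and the even channel split is a stand-in, not the heuristics KG and Ger developed. The function names `select_backend` and `partition_channels` are hypothetical.

```python
import importlib.util


def mpi_available():
    """Return True if the mpi4py package can be located in this environment."""
    return importlib.util.find_spec("mpi4py") is not None


def select_backend():
    """Pick 'mpi' when mpi4py is importable, otherwise fall back to 'serial'.

    Hypothetical helper: it only illustrates transparent auto-detection.
    """
    return "mpi" if mpi_available() else "serial"


def partition_channels(n_channels, n_workers):
    """Split channels 0..n_channels-1 into contiguous, near-equal ranges.

    Returns a list of (start, stop) half-open ranges, one per worker that
    receives at least one channel. Assumed even split, not a tuned heuristic.
    """
    if n_channels < 0 or n_workers < 1:
        raise ValueError("need n_channels >= 0 and n_workers >= 1")
    base, extra = divmod(n_channels, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional channel each.
        size = base + (1 if worker < extra else 0)
        if size:
            ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print("backend:", select_backend())
    print("ranges:", partition_channels(10, 3))
```

Callers never need to know which backend was chosen, which is the sense in which the detection is transparent to the imaging code.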

  • Process discussion on path forward
    • Can likely capture a task list for 4.3 / 1st parallel release based on current status and clear gaps.
      • Should target 4.3 for the 1st release, given that it looks achievable. (Main concern is B&T integration.)
    • Can develop a testing plan to evaluate the 1st release, and then do detailed planning for 2nd iteration / cycle.
    • However, there is a tendency to re-litigate decisions, and ambiguity arises when needs/requirements clash.
      • Need to do requirements capture, at a high level, to help guide these discussions and decisions.
    • Conclusion:
      • Proceed with capturing the task list for 4.3 / 1st release. (ACTION for Rob to start this.)
      • In parallel, start requirements capture and elaboration of other valuable project artifacts: (ACTION for Rob to start this.)
        1. Stakeholder Requirements Capture (High level requirements and constraints)
        2. Concepts
          • Concept of Operation
          • Concept of Deployment
        3. Architectural Design
        4. System Specification (Derived Requirements - Sub-system/module/task level)
        5. Implementation Plan (Iterative & Incremental)

  • Anything else you would like to discuss (AOB)
    • N/A

Deferred Items for a future meeting

08/07/14:
  • OSX support for MPI and OSX testing of MPI framework.
  • Justo and Sandra to visit NM this fall?
    • Work with James, Kumar, B&T hire, etc, on 1st release issues.
    • Sandra noted that trip would have to be in the November/December time-frame.

08/14/14:
  • Logger
    • Single log file & concerns regarding ordering of entries.
  • Imaging
    • Discuss requirement imposed on VI/VB2 to pass required information for MMS processing.
  • MSTransform
    • Revisit status of heuristics implementation.
    • Discuss time separation axis for MMS creation.
    • Revisit inclusion of cvel2 in 4.3 release.
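
On the deferred logger item: one possible way to recover a deterministic ordering for testing is to have each process write timestamped entries and merge them afterwards. The sketch below assumes each per-process stream is already sorted by timestamp and that clocks are comparable across nodes. The `(timestamp, rank, message)` entry layout and the name `merge_logs` are hypothetical, and this is not how the CASA logger works.

```python
import heapq


def merge_logs(*streams):
    """Merge per-process log streams into one list ordered by timestamp.

    Each stream is an iterable of (timestamp, rank, message) tuples that is
    already sorted by timestamp (assumed, not checked).
    """
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


if __name__ == "__main__":
    rank0 = [(1.0, 0, "start"), (3.0, 0, "done")]
    rank1 = [(2.0, 1, "start"), (4.0, 1, "done")]
    for timestamp, rank, message in merge_logs(rank0, rank1):
        print(f"{timestamp:.1f} [rank {rank}] {message}")
```

Merging after the fact keeps the writers independent, at the cost of relying on synchronized clocks.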

Action Item List

| Item # | Date Opened | Description | Leads | Status | Status Notes |
| 01 | 8/07/14 | Update Issue Chart to reflect current status. | James | In Progress | 8/14/14: Contributions from Kumar incorporated. Justo provided separate notes on tasks outstanding. 8/7/14: Contributions from others on the team also welcome. |
| 02 | 8/07/14 | Review available documentation (on wiki), in particular the MPI document. | Lindsey | Open | |
| 03 | 8/07/14 | Update reference doc links to point to MPI doc in SVN. | Rob | Closed | 8/7/14: Complete. |
| 04 | 8/14/14 | Talk to James, Kumar, Justo and others and bring some resolution to preferred MPI library/implementation issue. | Rob | In Progress | 8/14/14: Have feedback from Justo, James, Kumar and Martin Pokorny. Will document and distribute. |
| 05 | 8/14/14 | Add new wiki pages for requirements capture, task list, and other project artifacts. Update based on recent meetings, then circulate for iteration by others. | Rob | Open | |
| 06 | 8/14/14 | Evaluate feasibility of completing cvel2 for 4.3 release. | Sandra | Open | |

Topic attachments

| Attachment | Size | Date | Who | Comment |
| Next-Steps_-_Justo_-_8-13-14.pdf | 28 K | 2014-08-13 - 11:29 | RobSelina | Next steps for 1st release (provided by Justo) |

Topic revision: r7 - 2014-08-14, RobSelina