CASA Parallelization Meeting Minutes

Thursday August 07th, DSOC 317, 8:00AM MT

  • Polycom Video: 192.33.117.12##8108
  • Voice: 434-817-6523

Attendees:

  • Socorro: Rob, James, Lindsey, Kumar.
  • Charlottesville: N/A
  • Garching: Justo

  • Sandra, Jim out of the office today. Jeff in Chile.

Discussion

  • Meeting Scope
    • James: Old meeting was largely technical, no process discussion or progress tracking.
    • This meeting agenda is process oriented. James suggests this is valuable and needed, but a technical forum may be valuable and necessary too.
    • Some concern with getting both topics within a 1hr time slot.
    • Some loss of team memory in previous meetings due to delays around the release cycle (1 month before & 1 month after feature freeze).
    • Urgency for restarting this discussion is to ensure that effort allocated in the 4.4 cycle is appropriate to complete HPC activities.

  • Who is already working on, or is needed on, this project? Who has an interest (stakeholders)?
    • Suggest the following people attend meetings: Rob, Justo, Sandra, Michel (technical), Jim, Kumar, Urvashi, Sanjay, Darrell, Stewart Williams (technical), Lindsey, James Robnett.
    • Keep informed: Brian Glendenning, K Scott, Jeff, Mark.
    • Attendance can vary depending on meeting focus (technical vs. process/management).

  • Meeting Time & Frequency
    • Weekly, Thursday @ 8AM MT
      • Allows for NRAO-NM, NRAO-CV and ESO team members to attend.
      • Notionally, alternate weekly between process vs technical discussions.
        • May not need a technical meeting for a while. Set agenda as needed.
      • Rob can discuss HPC topics with NAOJ at the Thursday PM NAOJ Skype call.
    • James suggests working meetings to develop use cases and architecture at a later date will be helpful.
    • For now, stick with weekly meeting cadence.

  • Goal(s) for Next Iteration
    • A parallelization framework in both release and test builds of CASA that can be reliably used by developers, testers and users.
    • Focus on end-to-end scripted use cases (not interactive use).

  • Current Status & Known Work Remaining
    • ACTION: James and Kumar to update the Issue Chart on the wiki.
    • Environment Management & Infrastructure
      • Simple Cluster
        • Kumar: Simple cluster is usable right now, via the Python interface. Is reasonably stable. Pclean has a few issues, some multi-MS concerns.
        • Has improved over the last few years. Two major efforts: the first was testing and bug resolution, end-to-end regressions, but there are issues in imaging due to memory limitations. The second iteration was incorporating the MPI framework and elimination of the IPython parallel capabilities. Interfaces with paralleltaskhelper.
        • Notes in the cookbook on the old framework.
      • MPI framework
        • Meant to deprecate simple cluster. Parallel_task_helper is used if MPI is installed; if not, it falls back on simple cluster at the moment.
          • Will want to revisit this. May want to make MPI a requirement and distribute with CASA. Unanimous agreement. (Recorded in decision log).
        • Justo has a doc describing the MPI framework. Available in the SVN tree. (And on HPC main page)
        • MPI interfaces with torque to determine resources (in NMASC case).
      • Multi-MS processing
        • Deserves its own user-facing document. This is on Sandra's list.
        • All tasks should be MMS aware at this point.
    • Heuristics to automate generation of engines and/or threads based on available resources.
      • 1st order, non-optimal, implementation may be in place. Some research required.
      • Improvement on these heuristics will be a major effort in future cycles.
    • Filler
      • Parallel filler only parallelizes by SPW at the moment.
      • Can currently use lazy filler, followed by partition to create MMS. Is equivalent to a parallel filler, and uses MPI framework this way. This is a working solution for first release.
    • Flagging
      • Parallelized, no major work expected.
    • MSTransform
      • Parallelized, but some work remains on cvel and other tasks.
      • Sandra has additional notes on mstransform related work to be done. (See attachments)
      • Old problems with cvel making a new MS (rather than keeping the MMS structure) have been resolved.
    • Calibration
      • Applycal is not parallelized, but is parallelizable.
      • However, it will work with the MMS (all MMS aware).
      • Tasks that generate calibration tables are not parallelized.
      • Setjy needs some work. Was looping over SPW. Work in progress.
        • Dominates the VLA pipeline.
    • Imager
      • Old pclean (ipython) works in some cases only. Deprecated.
      • Parallelization in new framework. TBD on delivery as part of refactor.
    • Pipeline
      • No parallelization to date, but it can use a lot of the CASA parallelization transparently.
      • Job executor class could be improved as a 2nd order improvement.
    • User Interface(s)
      • Plotting is a significant portion of pipeline processing time.
      • Should at least consider processing multiple plots in parallel, though each plot is likely to be a serial process.
    • Build System
    • Testing
      • Lots to do!
      • Targeted tests of MMS and mstransform, simple_cluster use in Imager, OpenMP in flagging/gridding, etc.
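
Illustration of the MPI fallback noted under "MPI framework" above: the minutes record that the MPI path is used when MPI is installed and that simple cluster is the fallback otherwise. A minimal sketch of that kind of check, using only the Python standard library, might look like the following. The function name, the backend labels, and the choice of mpi4py as the package to probe are all assumptions made for illustration; none of them is taken from the CASA source.

```python
import importlib.util


def choose_parallel_backend(find_spec=importlib.util.find_spec):
    """Return 'mpi' when an MPI binding is importable, else 'simple_cluster'.

    Hypothetical helper: the backend labels and the mpi4py probe are
    assumptions for this sketch, not CASA's actual detection logic.
    """
    try:
        has_mpi = find_spec("mpi4py") is not None
    except (ImportError, ValueError):
        # Treat any lookup problem as "MPI not available".
        has_mpi = False
    return "mpi" if has_mpi else "simple_cluster"


if __name__ == "__main__":
    print(choose_parallel_backend())
```

If MPI becomes a hard requirement distributed with CASA, as agreed in the meeting, a check of this kind would presumably turn into an error path rather than a silent fallback.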

  • Anything else you would like to discuss (AOB)
    • Wiki page
      • Documents will be kept here for easy reference.
      • For mature documents in SVN, links to be provided.
      • ACTION: Rob to link to SVN copy of MPI doc.

Deferred Items for a future meeting

  • Topics for Next Meeting:
    • Review critical issues for 4.3 cycle.
    • Process discussion on path forward (see notes below)
    • More detailed review of issues and work remaining for 1st parallelized release.
    • Justo and Sandra to visit NM this fall?
      • Work with James, Kumar, B&T hire, etc, on 1st release issues.
      • Sandra noted that trip would have to be in the November/December time-frame.

  • Path Forward
    • Option 1:
      1. Stakeholder Requirements Capture (High level requirements and constraints)
      2. Concepts
        • Concept of Operation
        • Concept of Deployment
      3. Architectural Design
      4. System Specification (Derived Requirements - Sub-system/module/task level)
      5. Implementation Plan (Iterative & Incremental)
      6. Task List (for 4.3 and 4.4 cycle)
    • Option 2:
      1. Concept of Operation
      2. Concept of Deployment
      3. Task List (for 4.3 and 4.4 cycle)
      4. Testing post release
      5. Detailed planning for next iteration/cycle
        • Requirements Capture
        • Architectural Design
        • System Specification
        • Implementation Plan
        • Task List (Cycle 4.5+)
    • Option 3 (?)

  • Plotting needs some consideration (batch plotting, not interactive). This is a pipeline-centric discussion.
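
To make the batch plotting idea concrete: the discussion suggested producing multiple plots in parallel while each individual plot remains a serial job. A rough sketch of that pattern with the Python standard library is shown here. The names render_plot and render_all are hypothetical, and render_plot is only a stand-in for a real non-interactive plot job. A thread pool is used to keep the sketch portable; for CPU-bound rendering a process pool or the MPI framework would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def render_plot(name):
    # Hypothetical stand-in for one serial, non-interactive plot job.
    # It only returns the output file name it would have written.
    return name + ".png"


def render_all(names, render=render_plot, max_workers=4):
    # Each plot is rendered serially; independent plots run concurrently.
    # pool.map returns results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render, names))


if __name__ == "__main__":
    print(render_all(["amp_vs_time", "phase_vs_freq"]))
```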

Action Item List

| Item # | Date Opened | Description | Leads | Status | Status Notes |
|--------|-------------|-------------|-------|--------|--------------|
| 01 | 8/07/14 | Update Issue Chart to reflect current status. | Kumar, James | Open | Contributions from others on the team also welcome. |
| 02 | 8/07/14 | Review available documentation (on wiki), in particular the MPI document. | Lindsey | Open | |
| 03 | 8/07/14 | Update reference doc links to point to MPI doc in SVN | Rob | Complete | |

Topic attachments

  • CASA_Parallelization_Meeting_-_Notes_from_Sandra.pdf (81 K, 2014-08-07 12:58, RobSelina): Notes from Sandra who could not attend in person.

Topic revision: r6 - 2014-08-07, RobSelina