Thursday Morning Meeting 7/27/2017

  • DIAL-IN NUMBERS & PASSCODES:
  • IP: 192.33.117.12##8110
  • Phone: (434) 817-6524

Attendance

  • Socorro:
  • CV:
  • Garching:
  • SCO:

News / Meetings / Visitors

Build, Release

  • Process for 5.1
  • Tkinter started crashing systematically with the latest XCode/OSX10.12 combination. Testing tcl/tk release candidate 8.6.7 on a new 10.12 build machine.
  • Tests started taking a lot longer after 5.1.0-38. Before ~500 minutes, now ~640 minutes on EL6.
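
The size of the slowdown noted above can be checked with a minimal sketch. It uses only the two approximate timings quoted in these notes; the function name is illustrative and not part of any CASA or Bamboo tooling.

```python
def percent_increase(before, after):
    """Return the relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (after - before) / before


# Approximate EL6 test-suite run times quoted in the notes, in minutes.
before_min = 500
after_min = 640

increase = percent_increase(before_min, after_min)
print(f"Test suite run time grew by roughly {increase:.0f}%")
```

Since both inputs are rounded ("~500" and "~640"), the result is only indicative.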

Verification Testing

  • Andy is on vacation July 26 - Aug 1
  • Firstlook casaguides are now part of Accepted Test List
    • One Error on Self Calibration (RHEL6 Only)

Validation Testing

  • Current validation tally: 166 Under Validation, 83 Ready to Validate. 13 tickets went RtV this week so far. 22 tickets went through Validation to Resolved this week so far.
  • Current Testing Efforts for 5.1 deliverables
    • ALMA Cy5 pipeline
      • many tickets, many people. Remy, Crystal, Todd, BrianM, Amanda, etc.
    • crossover between CASA and PL:
      • CAS-9089 Automatic Generation of Antenna Position Corrections for ALMA -- whose court is this ball in? (marked critical, but I'm not sure this is true)
    • autoboxing
      • blocker to 5.1: CAS-10415 Automasking is significantly slow for cube tclean
        • please see ticket for the issue and mitigation strategy for 5.1. Tak posted a new build last night and Crystal started a new test this morning
    • statwt
      • 26 tickets under validation, 1 ready to validate. BrianM and Steve have done some exercising of the task; bugs found and fixed/being fixed. Still no tester committed to deep dive testing. (Actually only one bug has been reported and that has been fixed - DM)
      • as per yesterday's stakeholder meeting, Claire is needed to weigh in on validation of statwt in 5.1 vs 5.2.
    • parallelization
      • initial validation testing by Sandra's team is revealing bugs already being fixed. Full science validation to be completed post 5.1 release when resources are available.
    • miscellaneous
    • plone (lower priority given short development cycle)
      • SD tasks have been published in 5.1, should show up soon in the 5.1 snapshot

Architecture

HPC

  • Link to Summary of parallel pipeline tests for 5.1: https://open-confluence.nrao.edu/pages/viewpage.action?spaceKey=CASA&title=ALMA+Pipeline+Cycle+5+Testing
  • ALMA Pipeline Parallel tests for 5.1. Summary of testing some datasets so far:
    • Dataset T002 - MOUS: uid://A001/X2f6/X265, NGC_6334_a_07_TE; FDM
      • total run time of pipeline is ~3x faster in parallel
      • tclean (total) is ~8x faster in parallel (using 15 parallel cores)
      • immoments is 23x slower in parallel -> CAS-10428
    • Dataset T004 - MOUS: uid://A001/Xbd4641/X23, AS_209_b_06_TM1
      • total run time of pipeline is ~6x faster in parallel. It goes from 4 days in serial to 19 hours in parallel.
      • tclean (total) is ~14x faster in parallel (using 15 parallel cores)
      • immoments is ~9x slower in parallel -> CAS-10428
    • Issues found so far and/or existing issues affecting the parallel runs:
      • CAS-10434: Error "SelectData has to be run before defineImage" in parallel tclean for an ALMA pipeline test dataset
      • CAS-10431: ALMA pipeline fails in imageprecheck.py with "No spectral window with ID '25' found"
      • CAS-10428: Various image processing tasks (immoments, imhead, exportfits) run slower in parallel
      • CAS-10413: ALMA pipeline error when creating WVR phase vs baseline plot
      • CAS-10392: tclean respects memory limits but pipeline code is more memory hungry
      • CAS-10398: Exportfits takes very long to run on a concatenated image
      • CAS-9357: fluxscale leaves tables in cache
  • We need more datasets to continue the tests. Have asked Remy and Crystal, but received no answer so far
  • We need the credentials to see the weblogs we are posting to https://safe.nrao.edu/alma//PipelineTestResults/ParallelImaging. Have asked Remy, but no answer so far.
  • ALMA small datasets for Bamboo automatic tests. Remy suggested 2 datasets:
    • 2015.1.01084.S uid://A001/X2fa/X1fc, 6.1GB, 1 ASDM. It takes 4h30min in parallel on 4 servers, with weblogs.
    • 2015.1.00665.S uid://A001/X2d8/X2c5, 275MB, 2 ASDMs. It takes 2h49m in parallel on 2 servers with weblogs; 1h28m40s without weblogs. This could be a candidate to include in Bamboo.
    • What is the maximum allowed/desirable run time for this test in Bamboo?
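
The speedup and weblog-overhead figures above can be cross-checked with a short sketch. The helper names are illustrative, and the inputs are the rounded values from these notes ("4 days" is assumed to mean exactly 96 hours), so the results are approximate and need not match the ~6x quoted from the actual test logs.

```python
import re


def to_seconds(text):
    """Convert a duration such as '2h49m', '1h28m40s' or '4h30min' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m(?:in)?)?(?:(\d+)s)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def speedup(serial_s, parallel_s):
    """Ratio of serial to parallel wall-clock time."""
    return serial_s / parallel_s


# Dataset T004: "4 days in serial" (assumed to be 96 h) vs "19 hours in parallel".
t004 = speedup(4 * 24 * 3600, 19 * 3600)
print(f"T004 pipeline speedup from rounded times: ~{t004:.1f}x")

# 2015.1.00665.S on 2 servers: run time with and without weblogs.
with_weblogs = to_seconds("2h49m")
without_weblogs = to_seconds("1h28m40s")
overhead = with_weblogs - without_weblogs
print(f"Weblog overhead: {overhead // 60} min {overhead % 60} s "
      f"({100.0 * overhead / with_weblogs:.0f}% of the run with weblogs)")
```

A check like this may help when deciding the maximum run time for the Bamboo test, since weblog generation accounts for a large share of the smaller dataset's run.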

Development

  • Issues specific to CASA 5.1
    • Autoboxing : Current top issue is the speed of one of the steps. Plan is to evaluate if it is a bug or not, and if not, for the pipeline to turn the feature off for casa 5.1 and move this to 5.2 to be handled as an algorithm development topic. Tak - update ?
    • Kumar is back : CAS-10373 (FIELD table efficiency), CAS-10280 (selectData efficiency), CAS-10317 (mosaic slowdown).
    • tclean divergence upon restart (CAS-10318) : UR debugged and found it was a pipeline error - they closed the ticket.
    • awproject validation + debugging (ongoing by SB).
    • New bug tickets resulting from HPC group's testing of the parallel pipeline.
    • Polarization cal : GM is away, critical tickets into validation stage.
    • Statwt : Any development required in response to validation ? Still waiting ?
    • Two (critical) dev targets not yet started : N-sigma thresholds in tclean (UR) + ephemeris table support in imager (KG).
  • Other ongoing development :
    • Async I/O development : Where are we w.r.to doing a first test with visstat to see if it makes a difference ?
    • Filler cleanup : Update from Bob ?
    • plotms : Dealing with bug fixes. plotcal to move to 5.2.
  • CASA 5.2 planning
    • Got initial input from Juergen (from stakeholders). Waiting for VLASS list. It appears that most desired features are in calibration and imaging, plus parallelization.
    • Planning for some coherent internal focus on parallelization, mem/use and performance (details later - need to discuss with a couple more folks)
    • Please can all developers send me info about what work you already have lined up (or wish to do) for 5.2 ( new development / cleanup tasks and expected leftovers from 5.1 tickets currently under validation ) ? Thanks !
  • JIRA: FYI, when you "Hold" a ticket it is now REOPENED instead of UNSCHEDULED
  • Plone:

Pipeline

  • Major Cycle 5 pipeline development has been "completed" including
    • refactoring targets: importdata / exportdata / restore data, deterministic flagging - all pipelines; imaging - all pipelines
    • flagging target : 2 new corrected / model data based flagging tasks ; new rflag based heuristics for VLASS
    • calibration targets: polarization calibration - VLASS; improved low SNR based spw mapping heuristics for ALMA
    • imaging targets: autoboxing and HPC support - ALMA; new imaging workflow and heuristics - VLASS
  • Several display targets: plotting and web log improvements, warning and error handling improvements, remain to be completed
  • Still some concern over the impact of the ALMA Cycle 5 lifecycle changes on pipeline
  • Intensive PWG testing and developer bug fixing
  • Vacations will impact ticket cleanup / testing
  • More ALMA test data expected next week

AOB

  • Morgan on vacation for two weeks, Anand to chair meetings?

Developer Reports

Thursday Meeting
  • Sanjay Bhatnagar
  • Sandra Castro
  • Lindsey Davis
    • Fixed ALMA spectral scan bandpass observation handling
    • Improved representative source displays
    • Improved spw mapping QA scoring
    • Numerous meetings and discussions
  • Bjorn Emonts
  • Pam Ford
    • Responded to validation reports for various 5.0 and 5.1 tickets. Guiding them through the workflow with master merges, checking test results, and pull requests.
    • CAS-9053: further work on atmospheric and Tsky overlays for MeasurementSets.
  • Enrique Garcia
    • CAS-10013: Integrate datacolumn parameter from StatWtTVI in mstransform. The integration should be almost finished. However, there is an issue with the internal checking of the TVI for the existence of the data column. It should go to the previous TVI layer, but this is not yet supported by the VI/VB2 framework. Additionally, the results of mstransform and statwt2 do not match (further investigation needed).
    • Started investigating whether the memory limits specified in .casarc/env are honored by casa tasks (in particular plotms).
  • Bob Garwood
    • CAS-10028 casa ASDM filler fails when an ephemeris source has a slash in its name. The fix is trivial, the discussions continue - seems likely to lead to new work on better integration of ephemeris information into the MS.
    • Worked on the ALMA ticket to synchronize the shared ALMA/CASA code with CASA changes, also eliminates all of the compiler warnings. This is nearly done. This is immediately upstream of CAS-10278, to synchronize the code on the CASA side.
    • CAS-10289 - importasdm: imported MS has duplicate rows. Closed because the duplicate data is in the ASDM. The discussion continues in a PRTSPR ticket. This is apparently a feature of WVR data (that's the only case identified so far). I think the discussion is leaning towards this not (significantly) impacting downstream use, but it's not clear yet. It's also not clear whether the filler will somehow need to not fill and/or flag this type of duplicate data.
    • Working with Rachel Rosen to add new tables to the ASDM. We've (re)discovered that there's a hard limit to the size of the model in the current interface of the model code to IDL (corba). We may have a short-term workaround. It's clear that some significant work will be needed to re-architect this interface. This workaround, if it works, will delay the implementation of these ASDM changes until the October ACS release (ALMA), so these changes will not appear in CASA until 5.2.
    • Discussions on the impacts of the changes to the ASDM Flag table to CASA. Uncovered some unused code related to an apparent dead-end in the ASDM, which is ultimately where we think we can buy time to work around the limit we've hit in adding things to the data model.
  • Kumar Golap
  • Jeff Kern
  • David Mehringer
  • George Moellenbrock
  • Dirk Petry
  • Martin Pokorny
    • CAS-10362: segfault in my recent vi/vb2 changes, done
    • CAS-9332: visstat2 promotion, in verification
    • CAS-10420: copyable VisibilityIteratorImpl2, awaiting changes to casacore
  • Federico M Pouzols
    • Note new issue in parallel tclean affecting pipeline CAS-10434.
    • Note slowness of im* tasks in parallel (with concatenated images), immoments, imhead, exportfits: CAS-10428
    • Regression test ALMA Titan ephemeris fixed and doc CAS-10433
    • MPICasa in cycle5 pipeline scripts, running with PPR. Crude proof of concept seems to work in CV cluster CAS-10416
  • Urvashi Rao
    • CARTA+Imager : Resurrected the imager side of our old interactivity prototype, pushed changes to branch so that Darrell can look at the connectivity to the viewer.
    • CAS-10318 : divergence upon tclean restart : blocker ticket from ALMA pipeline : Debugged to find it was a pipeline error in which parameters changed between the iter0 and iter1 runs.
    • Numerous discussions and planning ( new ASDM tables w.r.to the CASA filler, new syntax in Flag.xml, autoboxing , parallel pipeline tests, 5.2 HPC targets and coordination between Sandra et al, James, etc. )
    • Began work on CAS-9506 (N-sigma thresholds) and debugging CAS-10434 (parallel cube bug found by HPC group)
  • Darrell Schiebel
  • Ville Suoranta
    • CAS-10313
    • A variety pack of Bamboo/Git tasks
  • Takahiro Tsutsumi
    • Looking into automask (binary dilation) slowness
    • Perley-Butler 2017 (created a CASA table)
Friday NAOJ Meeting
  • Kanako Sugimoto
  • Wataru Kawasaki
  • Masaya Kuniyoshi
  • Takeshi Nakazato
  • Renaud Miel

Notes (by Dave)

Attendees: Kumar, Dave, Tak, Morgan, Juergen, Pam, Lindsey, Urvashi, Sandra, Ville, Akeem, Anand, Bjorn, Jen, Enrique, Federico, Dirk, Martin, Bob

News

  • Interview for CASA lead on Friday, Decision next Wednesday, then negotiations.
  • Data reduction workshop in Socorro in Oct
  • 2017 Users Report is out (meeting in June). Two paragraphs on CASA. Pleased CASA is improving; annoyed that different versions have to be used for ALMA data (different cycles require different versions of CASA). Disappointed no improvements have been made to the viewer (but aware CARTA is coming). Stressed that documentation is important.

Pipeline

  • "Completed" functional development
  • Lots of people will be on vacation around testing time
  • New flagging tasks going in.
  • Polarization calibration for VLASS
  • autoboxing will be there in some form, but not clear what form that is. If performance is an issue for large cubes, autoboxing will be turned off as mitigation (Tak has put in a (hopeful) fix)
  • Tangent: discussion of CARTA and its current problems dealing with large cubes

HPC

[Unfortunately, the ESO audio connection was not great, so Sandra's normally clear voice sounded like she was speaking from underwater.]
  • HPC runs the pipeline a different way than the Pipeline group does
  • Small images case, overhead dominates tclean performance
  • In one trial, tclean 14 times faster in parallel
  • Urvashi: different cases can probably be understood by differences in image sizes and shapes
  • More data sets are needed. Asking Remy and Crystal for those.
  • Conversation between Lindsey and Sandra on pipeline ticket
  • Ville: Darrell repurposing servers for HPC.
  • Urvashi will follow up looking at web logs to see if things make sense.
  • Pre-release tarballs were used for tests.

Validation Testing

  • Pipeline swamping most valuable testers
  • Federico: Need to find who is responsible for service discussed in CAS-9089. Jen: waiting on an ALMA decision on this, so on backburner for CASA
  • statwt: Brian and Steve have been doing some testing. Need info from stakeholders on how much deep dive testing needs to be done. Around 20 tickets are in validation (and most have been for weeks). Steve is having an issue that perhaps combine="field,scan" may solve.
  • About 10 single dish tasks pushed through for plone docs
  • Much of plone docs will be pushed to 5.2 because of resource issues

5.1 Release Prep

  • Feature freeze Aug 1.
  • autoboxing to be included in some form
  • Kumar fixed 2 of 3 things on his list, working on mosaicing slowdown now.
  • Blocker tclean bug, turned out to be pipeline bug
  • New bug tickets from HPC group
  • statwt under testing, needs testers (and has needed them for months)
  • dev targets not yet begun for 5.1
    • multi-ephemeris tables, going up in priority. Kumar to look
    • n-sigma threshold as stopping criteria in tclean (Urvashi looking at it)
    • Bob working on ALMA filler. Hit a hard limit when adding tables to the ASDM. Short-term workaround: remove some tables that aren't being used
    • Pam. plotms dealing with bug fixes. Maybe need some testers for that. plotcal moves to 5.2.
    • Enrique. mstransform support for statwt2.
    • Martin: visstat2, all done.
    • Release notes: conversion from XML to Plone is needed
    • Need release for VLASS by end of August. Nominal release date is August 15 (some expletives heard).

5.2 Planning

  • initial input from Juergen. Mostly calibration related, a couple of imaging issues also.
  • Urvashi pushing performance and memory use limits
  • Resource estimator for imaging
  • Discussion on what if anything will be done for VI/VB2 development.
  • Feb 1 for feature freeze.
  • Urvashi requests that developers email her with what they think they could work on for 5.2, including an estimate of how much time they may need in 5.2 to address tickets they are responsible for that are currently under validation.

-- MorganGriffith - 2017-07-26

Topic revision: 2017-07-27, JaredCrossley