Thursday Morning Meeting: 03 August 2017

  • DIAL-IN NUMBERS & PASSCODES:
  • IP: 192.33.117.12##8110
  • Phone: (434) 817-6524

Attendance

  • Socorro:
  • CV:
  • Garching:
  • SCO:

News / Meetings / Visitors

Build, Release

  • 5.1 feature freeze / release branch / packaging schedule.

Verification Testing

  • ALMA M100 with tclean in parallel mode is failing
    • Failing for about 2 weeks - can Federico take a look at it?
    • tolerance issue, but only on RH7
  • Working with Bjorn on HPC pipeline testing
    • Open question: what parameters in tclean do you think will benefit most from parallel processing?
  • CASA guide work
  • Documentation on adding/removing tests from the framework
  • Test runtime graphs are being run manually, as needed
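
As an illustration of how a tolerance issue can fail on one platform only, here is a minimal sketch of a relative-tolerance check. The flux values, the tolerance, and the helper name `within_tolerance` are hypothetical and are not taken from the M100 test itself.

```python
import math


def within_tolerance(measured, reference, rel_tol=1e-5):
    """Return True if measured agrees with reference to within rel_tol."""
    return math.isclose(measured, reference, rel_tol=rel_tol)


# Hypothetical numbers: the same test producing slightly different
# results on two platforms because of floating-point differences.
reference_flux = 1.234567
rh6_flux = 1.234568   # relative difference well below 1e-5
rh7_flux = 1.234600   # relative difference of a few times 1e-5

rh6_ok = within_tolerance(rh6_flux, reference_flux)
rh7_ok = within_tolerance(rh7_flux, reference_flux)
rh7_ok_relaxed = within_tolerance(rh7_flux, reference_flux, rel_tol=1e-4)
```

If relaxing the tolerance makes the RH7 result pass while the values stay scientifically equivalent, the failure is likely a tolerance problem; a large or growing difference would point to something deeper.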

Validation Testing

  • Current validation tally
    • 157 Under Validation, 77 Ready to Validate.
    • 11 tickets went RtV this week so far. 13 tickets went through Validation to Resolved this week so far.
    • July 21-27: 34 tickets assigned for validation. 37 tickets resolved.
    • if you have other essential 5.1 tickets for inclusion that are under validation or coming soon to validation please make sure I know
  • Current Testing Efforts for 5.1 deliverables
    • ALMA Cy5 pipeline, crossover between CASA and PL
      • (many tickets, many people. ongoing.)
      • autoboxing (CAS-10415, CAS-10457) testing is going in earnest. update from Tak? this needs to work before we release.
      • update on mosaic slowdown fix? CAS-10317. also a blocker to the 5.1 release.
    • statwt
      • 26 tickets under validation, 1 ready to validate. BrianM and Steve have done some testing, Claire will do some more.
    • parallelization
      • no update from my end
    • miscellaneous
      • please see https://safe.nrao.edu/wiki/bin/view/Software/CASAUserTesting for the up-to-date list of tickets and testing status
      • Anish Roshi is going to be looking at the flagdata averaging ticket (CAS-6215), at long last
      • still no word on CAS-7861 (CASA dies on VLA ephemeris datasets) from Bryan Butler
      • CAS-10028 casa ASDM filler fails when an ephemeris source has a slash in its name -- testing returns issue with "NameError: global name 'msmd' is not defined", which I think is a different issue
      • CAS-10438 viewer crashes casa if gui is closed -- needs to be assigned
      • CAS-10329 rename plot ms tool from 'plotms' to 'pmtool' -- needs to be assigned
      • CAS-10447 ia.fromshape(csys=...) fails if direction coordinate only has one axis -- needs to be assigned
    • plone (lower priority given short development cycle)
      • expecting a few more pages to be included and tickets resolved for 5.1, but not many more

Architecture

HPC

  • Parallel pipeline tests, current status: https://open-confluence.nrao.edu/display/CASA/ALMA+Pipeline+Cycle+5+Testing
  • 6 datasets with serial + parallel run results ready for Sci validation, see main table of the status page. Weblogs transferred here as they become available: https://safe.nrao.edu/alma/PipelineTestResults/ParallelImaging/
  • Keeping track of issues and bugs in this section (Related JIRA tickets)
  • Using zuul01-03 machines, now with 64GB
  • CASA prerelease/bugfix keeps moving forward all the time
  • So far:
    • 7 of 15 test datasets have problems when run in parallel.
    • 4 bugs in pipeline or CASA are blocking test runs - none of them seem "core" issues although might need time to narrow down. One just fixed in pipeline (CAS-10451).
    • 7 CASA bug tickets currently open for failures and performance issues.
  • Performance:
    • For now we have 2 good showcases where parallel tclean delivers a speedup factor nearly linear with number of cores:
      • T004 ~14 times faster with 15 "MPI servers" - pipeline total time from 4d19h down to 19h.
      • T001 waiting for serial run to finish, but we already know from ~5d down to 19h.
      • For other datasets speedup is more modest (8x for T02), or much more modest.
      • Additional pipeline code that is not using CASA tasks and is not parallelized is consuming significant time (easily >=30% of total pipeline time). Example: for T004 in parallel 19h of which 8h37m are not CASA tasks (pipeline stuff including weblogs).
  • BTW. Pipeline regression failing: https://open-bamboo.nrao.edu/artifact/CASA-MAT/ATEL6/build-196/Html-Logs/el6/5.1.0-46/log.html#s1-s24-t1. Should we update the reference flux values?
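
The T004 performance figures above can be worked through as a short calculation. This is a sketch under stated assumptions: the ~14x figure is taken to refer to the parallel tclean stages, while 4d19h and 19h are taken as total pipeline times; the ceiling estimate further assumes the non-CASA portion takes the same 8h37m in a serial run. The function names are illustrative only.

```python
from datetime import timedelta


def speedup(serial: timedelta, parallel: timedelta) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return serial / parallel


def parallel_efficiency(speedup_factor: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 means perfectly linear)."""
    return speedup_factor / workers


# T004 figures quoted in the minutes
serial_total = timedelta(days=4, hours=19)    # 115 hours
parallel_total = timedelta(hours=19)
non_casa_time = timedelta(hours=8, minutes=37)

# Whole-pipeline speedup: 115 h / 19 h, about 6.05
total_speedup = speedup(serial_total, parallel_total)

# Share of the parallel run spent outside CASA tasks, about 45%
non_casa_fraction = non_casa_time / parallel_total

# Efficiency of the quoted ~14x speedup on 15 MPI servers
tclean_efficiency = parallel_efficiency(14, 15)

# Assumption: the non-CASA part is not parallelized and costs the same
# 8h37m in serial, so it caps the achievable whole-pipeline speedup.
speedup_ceiling = serial_total / non_casa_time

print(f"total speedup:      {total_speedup:.2f}x")
print(f"non-CASA fraction:  {non_casa_fraction:.0%}")
print(f"tclean efficiency:  {tclean_efficiency:.0%}")
print(f"speedup ceiling:    {speedup_ceiling:.1f}x")
```

The gap between the ~14x stage speedup and the roughly 6x whole-pipeline speedup is consistent with the note that non-parallelized pipeline code takes a significant share of the total time.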

Development

  • Autoboxing : Continuing minor changes as 'bug fixes'. Refused one last minute algorithmic change request.
  • Kumar ? CAS-10373 (FIELD table efficiency), CAS-10280 (selectData efficiency), CAS-10317 (mosaic slowdown), multi-ephemeris support
  • UR : N-sigma thresholds : moved to 5.2. VLASS pipeline has a backup plan.
  • Bugs from HPC tests (for CASA) :
    • CAS-10451/10459 : Bug traced to imageconcat when supplied with strings with extra quotation marks. Resolution : Lindsey changed the pipeline to not send in extra double quotes + contacted Ger-van-Diepen who agreed to fix it from the ImageConcat side to make it robust to such things. Enrique iterating with Ger on what all json syntax may be relevant in this fix.
    • CAS-10434 : Parallel cube on multiple MMSs showing problems at data selection stage (on the nodes). TT (and UR) is (are) investigating.
    • CAS-10453 : Failure of applycal step in parallel : Lindsey is waiting for the serial run to run to completion before a proper comparison can be done.
    • Bjorn/Andy started to look at serial vs parallel outputs. They see differences in parallel cube runs and are investigating to see if the differences are within the expected range or if there is a clear problem.
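
For CAS-10451/10459, the pipeline-side workaround amounts to not passing strings wrapped in extra double quotes. A minimal sketch of that kind of cleanup follows; the helper names and the example keys are hypothetical, and this is not the actual pipeline or casacore change.

```python
def strip_extra_quotes(value: str) -> str:
    """Remove matching pairs of surrounding double quotes, e.g. '"name"' -> 'name'."""
    while len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value


def clean_string_values(info: dict) -> dict:
    """Return a copy of info with extra quotes stripped from string values only."""
    return {
        key: strip_extra_quotes(val) if isinstance(val, str) else val
        for key, val in info.items()
    }


# Hypothetical metadata record with one over-quoted name
record = {"field": '"M100"', "spw": "17", "nchan": 128}
cleaned = clean_string_values(record)
```

Cleaning on the caller side avoids the failure, while making the consumer robust to such input (the fix agreed with Ger) protects other callers as well.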

  • Anything else ?
    • Calibration : multi-band delay : not going in for 5.1, but will go in soon after. Not needed critically.
    • Statwt : Everything in validation stage. Plan is to merge to master before Aug 15.

  • 5.2 development planning : I am talking with each of the other developers. Those of you whom I have not yet spoken with, please think about what you already have on your plate for 5.2. I'm also waiting for some more requirements from the stakeholders. I hope to put something together next week and go over it here in next Thursday's meeting before presenting it back to the stakeholders.

Pipeline

  • Cycle 5 delivery
    • last pipeline working group / developer meeting before Cycle 5 software delivery will be August 18
    • pipeline will be branched August 21
    • on going testing, parameter tweaking, bug fixing, weblog improvements
    • end to end testing begins in earnest at JAO next week; preliminary results for pipeline look OK
    • considerable uncertainty about impact of observatory scripts
  • ongoing HPC testing (see HPC section)
  • started planning discussions for Cycle pipeline infrastructure development targets with pipeline developers, Cycle 6 science targets need prioritization

AOB

Developer Reports

Thursday Meeting
  • Sanjay Bhatnagar
  • Sandra Castro
  • Lindsey Davis
    • improved ALMA interferometry pipeline low SNR spw mapping heuristics for cases where some windows have high SNR and some low SNR
    • added diagnostic phase offsets table and plots to the ALMA interferometry time gaincal stage results and web log page
    • fixed miscinfo problem (dealing with names in double quotes) in the pipeline tclean task
    • CASA lead interviews
    • many meetings, email discussions, ...
  • Bjorn Emonts
  • Pam Ford
    • CAS-9053: development complete on atmospheric overlays; working on GUI bug
  • Enrique Garcia
    • CAS-9945: improve the approximations for effective bandwidth and effective resolution in split/mstransform
    • Created container for memory resource profiling in CASA.
  • Bob Garwood
    • asdmSummary tickets all now closed: CAS-9448, CAS-9883, CAS-10367
    • CAS-7861 : CASA dies on VLA ephemeris data. The fix for the recently discovered bug is in the master branch. The fix for the original issue in this ticket has never been validated, but those changes predate bamboo and have been in past releases. This ticket still needs to be validated but the lack of verification will not delay 5.1.
    • CAS-10028 : casa ASDM filler fails when an ephemeris source has a slash in its name. Under Validation.
    • CAS-10278 : Synchronize share ALMA/CASA code and eliminate compiler warnings. Under Verification.
    • Continued discussions on problems adding to the ASDM (and IDL/CORBA compiler issue).
  • Kumar Golap
  • David Mehringer
  • George Moellenbrock
    • Back from 2 week vacation; caught up on email, etc.
    • Pushed CAS-8553 and CAS-8540 (both VLBI-related) through to pull requests for 5.1
    • Decided to punt CAS-10138 (multi-band cross-hand delay) to 5.2
  • Dirk Petry
  • Martin Pokorny
    • CAS-9332, visstat renaming, almost done -- needs documentation issue
    • Work on MSIter changes in casacore
    • MSv3 meeting: cal tables discussion and organizing face-to-face meeting in September
  • Federico M Pouzols
  • Urvashi Rao
    • Lots of discussions and emails about autoboxing priorities, basic validation of HPC test results, testing (of statwt2), integration of SD gridder into the refactored imager, CARTA+Imager, freeze/release schedule for 5.1, a vi/vb2 feature
    • Debugging for CAS-10451/10459 : found problem with ImageConcat. Followed up with both Lindsey (to avoid the problem via the pipeline) and Ger-van-Diepen (to fix the problem in casacore::ImageConcat)
    • Talked with some of the other developers about 5.2 targets (more to go). Tried to find some more details about stakeholder requests.
  • Darrell Schiebel
  • Ville Suoranta
  • Takahiro Tsutsumi
    • worked on automask speed up fix (CAS-10415) and related additional feature request (CAS-10457)
    • Also CAS-10462 (automask to handle absorption features)
    • Still waiting for verification for 5.1: CAS-9186
    • post-5.1: started to look into CAS-9538 (Perley-Butler 2017 for setjy).. before pulled back to automask
    • Briefly looked at parallel cube tclean issue, CAS-10434
Friday NAOJ Meeting
  • Kanako Sugimoto
  • Wataru Kawasaki
  • Masaya Kuniyoshi
  • Takeshi Nakazato
  • Renaud Miel

Notes (by Enrique)

Attendees: Kumar, Dave, Tak, Juergen, Pam, Lindsey, Urvashi, George, Ville, Darrell, Akeem, Anand, Andy, Bjorn, Jen, Enrique, Federico, Martin, Bob

News

  • The CASA software subsystem scientist is in Socorro organizing local meetings with people there.
  • The committee has made a decision for the new CASA lead.
  • Juergen reports that at the CUC (CASA Users Committee) meeting, likely to be held in September-November, 5 or 6 articles will be presented covering topics like VI/VB2 and HPC.

Build and release

  • We are now in feature freeze. The plan is to release the packages on the 21st. A few people will be away but that shouldn't be a problem, since at least Ville will take care of it.
  • After packaging, the next step is to do some manual testing of the packages on some of the supported platforms. Juergen will try RH6 and OS X. Jen can try to upgrade her laptop to OS X 10.12 to do some further testing. Some coordination will be needed to avoid overlapping efforts.
  • ALMA has a hard deadline of 1st September, but it should be enough to have a release candidate for them.
  • There was a significant peak in Bamboo activity at the start of the week. The question was raised whether any constraints need to be taken into account close to the freeze dates. Ville mentions that issuing more than 6 pull requests in the same day can probably start to be a problem.

Verification and testing

  • ALMA M100: Federico will have a look to see if it is a tolerance problem or something deeper in tclean. It fails only on RH7.
  • HPC testing:
    • Answering a question from Andy, Urvashi mentions that the most relevant parameter in tclean for HPC is the number of channels, which will determine the size of the cube.
    • Lindsey reminds everyone that the pipeline logs the exact command for tclean, in case it is useful for the testing.
  • 37 tickets were resolved last week. If any developer thinks that a ticket has no validation activity, please tell Jen.
  • Autoboxing is being tested by Crystal. Tak will make an extra change to refine the default parameters.
  • Mosaic slowdown is being investigated by Kumar.
  • Claire is validating StatWt. In total there are 26 tickets to validate. It has been decided that even if validation goes wrong for some of them, the full development will be merged as a single unit.
  • Regarding documentation, Juergen asks whether there is anything specific for 5.1. Jen answers: not really.

HPC

  • Parallel pipeline testing:
    • A number of issues have been found, but most of them seem to be shallow issues rather than deep issues in CASA. Those in CASA are assigned the highest priority.
    • By the end of August, 20 datasets will have been tested, with relatively good coverage of instrumental configurations.
    • It could be possible to get Cycle 5 data before the release.
    • Darrell will set up 4 more systems to help the testing efforts.

Development

  • Kumar is working on how to improve the field table performance. This is relevant for VLASS.
  • Multi-ephemeris support will likely be dropped for the time being. In case it is added, a new parameter should be added to tclean to support it.
  • Development for 5.2: Urvashi is asking all developers for their 5.2 release targets.
    • She will go back to the stakeholders with the collected feedback.
    • If you have not been contacted by Urvashi yet, think about the relevant 5.2 targets.

Pipeline

  • Most of the work now is on tweaking parameters, rather than major developments.
  • It is not clear what the impact of the observatory scripts will be.
  • For Cycle 6 many things have been requested. It is important that a good prioritization takes place.
    • The next major thing is the Cal library, where good collaboration between CASA team and stakeholders is needed.

AOB

  • Deprecated tasks:
    • Work in progress to place deprecation warnings in a number of tasks.
    • cvel: It is not clear what the plan is for this. According to Federico, some people think that cvel1 should not be dropped. Juergen argues that cvel1 has an issue with regridding and cvel2 does everything cvel1 ever did. Federico replies that there are a few tickets from users regarding differences, but they are either non-reproducible or the user went silent. Additionally, cvel2 cannot cope with combination of spw's. Urvashi recommends waiting.
    • oldsplit, oldhanningsmooth: Juergen thinks they should be removed. The new versions have been around a long time. Sandra should confirm this.
    • tclean2: it was an experiment and should be removed.
    • tclean vs clean: it is agreed that both should be kept for a while. It must be taken into account that all the regression tests still use clean.
    • Pam suggests that a clear path to deprecating tasks in CASA should be standardized.
  • CARTA: Apparently a release was expected in July but it is unlikely that a release will happen soon.

-- PamFord - 2017-08-02
Topic revision: r14 - 2017-08-03, EnriqueGarcia