VLASS readiness assessment yesterday, CASA and pipeline on track
Build, Release
Pull requests:
Pull requests should be created only after verification/validation testing is complete (i.e. the ticket is Resolved).
A pull request automatically launches Branch Test Suite 3. Before making a pull request, please check the commits to see when master was last merged into the branch. If the branch is very out of date, Branch Test Suite 3 will probably fail.
In general, a Bitbucket branch is deleted after its pull request is merged, unless the developer asks to keep it. The branch remains in the developer's local repo until deleted (git branch -d <branch>).
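One way to check how far a branch has fallen behind master before opening a pull request is sketched below. It is only an illustration: the helper names and the threshold of 50 commits are hypothetical, and it assumes the local master has been updated first (e.g. with git fetch / git pull). It relies on `git rev-list --count A..B`, which counts commits reachable from B but not from A.

```python
import subprocess


def commits_behind_cmd(branch, base="master"):
    """Build the git command that counts commits on `base` missing from `branch`."""
    return ["git", "rev-list", "--count", f"{branch}..{base}"]


def commits_behind(branch, base="master", repo="."):
    """Run git in `repo` and return how many commits `base` has that `branch` lacks."""
    result = subprocess.run(
        commits_behind_cmd(branch, base),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def looks_stale(n_behind, threshold=50):
    """Hypothetical rule of thumb: flag branches more than `threshold` commits behind."""
    return n_behind > threshold
```

For a branch named after its ticket, `looks_stale(commits_behind("CAS-NNNNN"))` returning True would suggest merging master into the branch before making the pull request (CAS-NNNNN is a placeholder, not a real ticket).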
Planning for 5.2/5.3/5.4 or 5.1.1/5.2/5.3 : Discussions are still ongoing as to whether we'll need anything non-standard or not on the Dec 2017 timescale. If needed, we'll call it either 5.1.1 or 5.2 depending on what goes into it. Either way, it will be managed as a separate branch so that all other development (unrelated to this) can proceed normally on master.
Release Logistics and Semantics ( Notes from Darrell )
patch -- a patch is a change which fixes bugs which do not have a systematic effect. A CASA stakeholder who updates from CASA 5.1 to CASA 5.1.1 should be able to assume that the results they obtain will not differ significantly and the user experience will have no significant changes. The patch only fixes bugs which are "spot fixes" in the view of those who understand the relevant code within CASA. These fixes are expected to not affect all users of CASA. Users can assume that updating to a newer patch version does not require revalidation.
release -- a release contains new features or bug fixes which significantly affect the behavior of CASA in some way for some class of CASA users. When users update to a new release, they should assume that they must validate their use of CASA.
As CASA becomes a production system, it is important that this distinction be clear. The "patch" fix for ALMA cannot be allowed to introduce a "bug" for VLASS and vice versa. A "patch" puts the onus of consistency validation upon the CASA group whereas a "release" puts the onus of consistency validation upon the users of CASA. However, this distinction is not about shifting blame, but rather it is about a consistent understanding of the implications of a particular distribution of CASA.
Using these definitions, the imaging changes which seem likely to be required for parallel execution of the pipeline lie outside of what can be considered a "patch". To accommodate parallel development, we will work on CASA 5.2 and 5.3 concurrently:
The CASA 5.2 branch will contain the December release candidate. This branch will be created soon after CASA 5.1 is released. Developers who are working on features for the 5.2 HPC release (in December) will merge their changes to both the master branch and the CASA 5.2 branch.
The master branch will include the CASA 5.2 changes but will also include any changes for CASA 5.3 which will be released in the spring/summer.
The CASA 5.1 branch can be used for any "spot fixes", i.e. patch changes, which could be provided as CASA 5.1.1. However, we do not anticipate any patches.
The situation is still somewhat fluid, so it could still be that we decide not to have an extra release of CASA in December.
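The patch/release distinction defined above can be written as a small check on version strings. This is a minimal sketch, assuming that a change in the third version component only (as in 5.1 to 5.1.1) is a patch and anything else is a release; the function names are illustrative and not part of CASA.

```python
def parse_version(text):
    """Turn '5.1' or '5.1.1' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def update_kind(old, new):
    """Classify an update as 'patch' or 'release' under the stated assumption."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be later than old version")
    # Same major.minor, later third component: a spot-fix patch.
    if new_v[:2] == old_v[:2]:
        return "patch"
    return "release"


def needs_revalidation(old, new):
    """Users revalidate for a release; for a patch the CASA group carries the onus."""
    return update_kind(old, new) == "release"
```

Under this assumption, `needs_revalidation("5.1", "5.1.1")` is False and `needs_revalidation("5.1", "5.2")` is True.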
Verification Testing
Farewell Akeem!
Thanks for all his hard work and dedication to testing these last 3 years
His last day is Friday
CAS-10481 : rms inconsistencies between serial/parallel runs of the ALMA pipeline. Need to re-run and re-evaluate after all the recent data-selection / csys creation bug fixes. Bjorn/Andy - any further insights ?
Should be rerun using the tarball created by CAS-10434
Working with Bjorn on scripting serial vs. parallel benchmarks in CASA
Discussion on where to put the data, what machine can be used to run the scripts, and a plan to make that happen
Test team is MIA next week - Andy in Chile (ALMA IRM meetings and training), Akeem is gone, and Puimek is on vacation
Validation Testing
Current validation tally
123 Under Validation, 77 Ready to Validate.
17 tickets went RtV this week so far. 42 tickets went through Validation to Resolved this week so far.
Aug 4-10: 24 tickets assigned for validation. 30 tickets resolved.
Jen is unavailable 8/18-8/25 (organizing a meeting, internet access likely to be unreliable). Go outside and (safely) watch the eclipse!
Current Testing Efforts for 5.1 deliverables
autoboxing. I believe everything has been merged already.
mosaic issues. I believe everything has been merged already; the slowdown is real, due to fixing a 4.7.2 bug in channel chunking (CAS-10317).
statwt2
5.2 testing ticket is CAS-10530
parallelization
HPC team has switched over to running serial tests to check serial results given several new parallel fixes
early results (Federico on CAS-10538) indicate serial run completes without errors; how do the images look?
Note/warning: results in the weblogs and tables were generated with a wide range of prereleases (5.1.0-34...5.1.0-58 and a number of specific bugfix branches), all from the last month of development, with many fixes and fix-fixes in between. At the moment, trying to get some fresh reruns of selected datasets for validation (Bjorn and Andy, CAS-10481)
cvpost065-068 machines being upgraded (infiniband net for all + 64->256 GB for 2 machines)
After many fixes went into master, only one test-blocking issue remains open for parallel mode: CAS-10538. It is not clear how often it occurs, and it could not be tested much because most if not all parallel runs have also been blocked by CAS-10536.
It looks like current master is good for serial mode. But we have sparse runs/evidence.
For serial test runs in the coming weeks, should we try to parallelize the VLASS way, with 2 or more serial runs on the same machine? For those serial runs where find_cont doesn't use ~150 GB, it should be possible to run at least 2 serial tests in parallel on the cvpost065-068 machines. Any experiences with this?
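The memory arithmetic behind that question can be sketched as follows. This is only an illustration: the function name and the 16 GB headroom reserved for the operating system are assumptions, and the real peak memory of a run would have to be measured.

```python
def max_concurrent_runs(machine_ram_gb, peak_run_gb, headroom_gb=16.0):
    """Estimate how many serial runs fit in memory side by side.

    headroom_gb is a hypothetical reserve for the OS and file cache.
    """
    if peak_run_gb <= 0:
        raise ValueError("peak_run_gb must be positive")
    usable = machine_ram_gb - headroom_gb
    # Floor division: a run that only partly fits does not count.
    return max(0, int(usable // peak_run_gb))
```

With these assumptions, a 256 GB machine fits only one run that peaks at ~150 GB, but two runs that peak at 100 GB each; a 64 GB machine fits neither.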
Development
Autoboxing : Nearly all edits have merged to master. Tak - status on the last one ?
Imager_BugFixes_5.1 : 25 issues in all. 4 are left. (Included in this list are HPC specific bugs/fixes : 2 out of 8 are left ). Need not hold the 5.1 branch for any of these.
CAS-10451 : json parser fix : The fix resolves the problem (validated), but there are some hiccups in verifying that the incoming changes from casacore didn't break something else. Martin ?
CAS-10538 : KG is looking at a fix for the parallel run failure. Federico has run it in serial to make sure it is OK for 5.1 even if a fix does not go in. If KG fixes it in the next week or so, we'll decide then whether to include it in 5.1 or have it only on master.
CAS-10317 : Mosaic slowdown. VLA case : being worked on. ALMA case : all is as expected : the 5.1 run took longer because it did more work (which 4.7.2 failed to do because of a bug).
CAS-10264 : Need to fix a log message. Very minor. Will put in on master as well as 5.1 in the next few days.
A few tickets (~3) moved to 5.2.
New bugs : CAS-10525 : tclean ignoring negative flux - fixed/merged. CAS-???? : residual image scaling isn't what it should be with mosaics (KG looking) - probably 5.2 issue.
Last minute autoboxing / image size mitigation related imaging tickets are under validation
Major issues with the ALMA quasar catalog service / flux.csv file have been resolved
Several small web log improvements are in progress
Ongoing VLA pipeline testing at DSOC
Small web log fixes
Some pipeline team members have other work commitments and limited availability for the next few weeks
AOB
Many of these warnings have appeared recently:
In pipeline tasks such as pipeline.infrastructure.tablereader::ms, pipeline.hif.tasks.rawflagchans.rawflagchans::ms, pipeline.hif.tasks.correctedampflag.correctedampflag::ms
The use of ms::iterorigin() is deprecated and will be replaced by iterorigin2() in a future version. After deprecation, iterorigin2() will be renamed iterorigin().
The use of ms::getdata() is deprecated and will be replaced by getdata2() in a future version. After deprecation, getdata2() will be renamed getdata().
And also in CASA.
flagcmd::ms::range The use of ms::range() is deprecated and will be replaced by range2() in a future version. After deprecation, range2() will be renamed range().
task_setjy: ms::nrow
Will it be a backwards compatible replacement or do we need to worry about this?
ms tool warnings will be removed from 5.1: CAS-10597. The new functions have the same signatures and results as the old ones.
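The general pattern behind such warnings, an old method name that warns and returns the same result as its replacement, can be sketched as below. This is a toy stand-in, not the real CASA ms tool: the class, its constructor, and the choice to forward the old name to the new one are all assumptions made for illustration.

```python
import warnings


class ToyMSTool:
    """Hypothetical stand-in for a tool with an old and a new method name."""

    def __init__(self, rows):
        self._rows = list(rows)

    def nrow2(self):
        """New implementation: number of rows."""
        return len(self._rows)

    def nrow(self):
        """Old name: warn, then return the same result as nrow2()."""
        warnings.warn(
            "nrow() is deprecated and will be replaced by nrow2()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.nrow2()
```

Because both names take the same arguments and return the same value in this sketch, existing callers keep working and only see the warning.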
pull requests merged for smallish plotms bug fixes: popups (CAS-10537), uv-related axis seg fault (CAS-10534)
created cal tables for plotcal->plotms testing, started on reference antenna selection issue (CAS-7049)
Enrique Garcia
CAS-5174 - UV Continuum Subtraction TVI
CAS-10211 - improve the approximations for effective bandwidth and effective resolution in split/mstransform
CAS-10013 - StatWt Rework: Add support to mstransform
Bob Garwood
CAS-10278 - Synchronize shared ALMA/CASA code and eliminate compiler warnings. Trying to help with suitable data to use to test importasdm -> exportasdm to make sure nothing has changed.
Some tests related to the asdm move to a separate package.
Learning java.
Kumar Golap
Jeff Kern
David Mehringer
George Moellenbrock
Dirk Petry
Martin Pokorny
Completed casacore changes to MSIter for asynchronous VI2
Updated casacore submodule reference in CAS-10451.
Looking into fftshift interpolation issue CAS-10584.
Urvashi Rao
Fixed CAS-10525 : cleaning negative flux.
Testing for CAS-10451 : json fixes.
Ongoing discussions with various parties about whether a 5.1.1 patch or something else on a Dec 2017 timescale is required and how we will handle this.
Many other discussions and JIRA patrolling.
Darrell Schiebel
Ville Suoranta
Takahiro Tsutsumi
Took a look at CAS-10481. Some of the results seem to be affected by CAS-10434. Probably worthwhile to rerun with the latest casa version.
Verified that one of the fixes made by Kumar and Sanjay for CAS-10434, in imager_parallel_cube.py, was correct (it was probably a bug I introduced a while ago).
CAS-10250: Tried again to reproduce the issue with the current pre-release. Also noticed that the original post used restoringbeam=common, where the chan0 psf was quite a bit larger than those for the rest of the channels.
CAS-10462: Worked on resolving the conflict with master. It has now been merged.