First smoke tests on 10.13 Mac. Working on Bamboo integration for a build and test machine.
PR build appears to be working as intended. Will run more tests for a week or two before turning off the TS3 PR tests.
Received and set up a new Mac Mini
Runs OS X 10.13 (so we now have two 10.13 machines)
CASA has been built on 10.13, but is not yet being built routinely
New Mac Pro
Order request has been entered into the system; presumably Morgan will approve it soon
Helpdesk has assembled a Pegasus RAID6 array (12 TB usable) with a Thunderbolt 2 connection; it should be ready to attach when the Mac Pro arrives (about 4 spare hard drives included)
Pipeline HPC vs serial testing
The parallel and serial runs select slightly different spectral channels
This significantly changes not just the spectra but also the cleaning results in the imaging (results are identical at step 20, but diverge after that, by step 26 at the latest)
There are very slight differences in the reference position of the images (this looks suspiciously like a difference in the truncation of reals; see the ticket for details)
Remy's take (on the JIRA ticket): "The duplication of processing in the parallel run suggests that you have a threading or flow problem, and end up with two processes doing the same thing and probably colliding, resulting in the very alarming final reported tiny medians and MADS. If true that's a pretty serious bug that you need to understand."
Dev Question : According to Andy, results are identical at step 20 (hif_makeimages), but starting at step 26 (hif_findcont) they look significantly different. Andy, does this mean that the input to hif_findcont is identical in serial and parallel? If the problem is in findCont, we could also ask Todd to take a look to speed up debugging, since he is the developer of this code. Also, has anyone checked which tools Todd's findCont uses, in case they behave differently on a regular image than on a refconcat image?
Current validation tally
122 Under Validation, 81 Ready to Validate.
So far this week, 5 tickets went to RtV and 4 tickets went through validation to Resolved.
Oct 20 - 27: 8 tickets assigned for validation. 7 tickets resolved.
Current Testing Efforts for 5.2 deliverables
Current Testing Efforts for 5.3 deliverables
imaging issues/autoboxing refinements
systematic tests of mstransform
further parallelization testing (non-ALMA pipeline specific)
The MPI buffer error in the pipeline appears to be fixed: CAS-10662
Test results for ~15 additional datasets from cycles 4 and 3 are available. Weblogs have not been transferred, but are ready if they would help with validation.
Sandra met the D.A.R.E.D. developer at ADASS and showed him how to run CASA and the pipeline in parallel. He had no idea that either CASA or the pipeline could be executed in parallel. During the conference he worked on a prototype that calls mpicasa from his scripts, but I don't know whether it will make it into a release. Communication within ALMA is clearly at its lowest level.
When I chatted with Jeff about D.A.R.E.D., he said confidently that D.A.R.E.D. will not be used by the NA ARC. Who makes the final decision on this?
CAS-10853 : Selection error ( Update from TT ? )
CAS-10849 : Restoring Beam in Parallel Cube : A solution is being developed to allow refconcat images to be accessed from a non-parallel tclean run. The pipeline will then need to call a separate 'restore-only' tclean with parallel=F at the end of the niter>0 imaging run (which is itself done with restoration=False).
CAS-10794 / CAS-10418 : Refconcat image analysis inefficiency ( Update from KG ? )
CAS-10794 / CAS-10837 : MS open on multi-MS slower ( from KG : yes, but this is expected when opening/selecting on all subMSs. Also, going from 2 s to 9 s is not a show-stopper compared with all the other required operations, since it is done only once per call )
Qn for Juergen : Any changes in what stakeholders want (as per yesterday's meeting) ?
What to do with VIVB2 feature/bug tickets? Created CAS-10864 for writing FLOAT_DATA to disk and it was automatically assigned to Ryan. ( Answer for now : Make all new vivb2 issues subtasks of CAS-10292 (vi2 enhancements after initial deployment) ).
CASA 5.1 / Pipeline
VLA patch may be required, tec_maps / recipes Python issues, plotweather problems
Various issues with ALMA D.A.R.E.D. scripts affecting operations and apparently results
CASA 5.2 / Pipeline
Waiting on CASA input before implementing a common restoring beam handling strategy
Waiting on CASA input before implementing a virtual concatenated imaging handling strategy
CASA 5.3 / Pipeline
AQUA report for VLASS and / or VLA, discussions with VLASS
Sessions branch testing
Display code refactoring
CUC presentation coordination
Summarized the pipeline view of ALMA Pipeline / Archive / AQUA interactions and sent it to management
Discussion of ALMA D.A.R.E.D. script issues with NAASC
Discussions of tclean issues with CASA team
Completed initial display code refactoring
Tested the pipeline sessions branch; it works, but there are CLI build issues
Fixed typos in the pipeline task documentation
Responded to validation testing questions/reports for CAS-8270 (plotcal in plotms), CAS-10598 (new plotants), CAS-10848 (capitalized items in ms.getdata), and CAS-10822 (multiple plotms windows when dbus connection fails)
CAS-10644 - the filler needs to detect duplicates in Pointing and DATA (for WVR). Started work on the Pointing case.
CAS-10665 - investigate speed of bdflags2MS. I have identified an interesting case where the flag step is significantly slower than similar ASDMs. This appears to be a resource limit issue (probably memory) for some scans. I'll come back to this ticket and the related asdm2MS slow to fill ticket after I work through a few more concrete tickets (the duplicates, and adding in new tables, and dealing with the new channel based flags).
I was out for 2.5 days due to illness.
At the QUESO meeting last week; gave an invited talk on polarization calibration
Following up on some polarization cal questions raised by ALMA polarization folks.
Federico M Pouzols
Pipeline tests for further datasets for which serial runs are now available in the "cycle 6" spreadsheet from PLWG.
Parsing and digesting stats and plots for tclean by size, specmode, gridder, etc. for 5.2 tests.
3 days off
automasking - continued work on the code speed-up, now looking at the grow mask step. In particular, I am changing the number of grow iterations so that it is controlled dynamically (CAS-10745)
automasking discussion: heard a nice presentation by A. Kepley
Started looking into the parallel tclean selectData issue (CAS-10853)