How to run the ALMA pipeline in Charlottesville

Documentation

Environment scripts used in production

Before April 3rd 2017: --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2_qa2.sh 
After  April 3rd 2017: --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2_1_qa2.sh  
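To reproduce a production run, you need the environment script that was in use when that run happened. Below is a minimal Python 3 sketch of that lookup. It assumes that a run on April 3rd 2017 itself used the newer script, because the two lines above do not say. `production_env` is a hypothetical helper and is not part of the pipeline.

```python
from datetime import date

SCRIPTS = "/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/"
# Assumption: runs made on April 3rd 2017 itself already used the newer script
CUTOVER = date(2017, 4, 3)

def production_env(run_date):
    """Return the --env script that was in production on run_date."""
    if run_date < CUTOVER:
        return SCRIPTS + "pipeline_env_C4R2_qa2.sh"
    return SCRIPTS + "pipeline_env_C4R2_1_qa2.sh"

# Example: print("--env=" + production_env(date(2017, 3, 15)))
```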

Prerequisites

  • For useful tips and information including how to request access to the cvpost Lustre nodes and how to VNC, visit the LustreAndCluster page.
  • Put this in your ~/.casa/prelude.py file. It has no effect on regular CASA use unless you have also sourced one of the pipeline environment scripts below.
import os
import matplotlib
# Switch to the non-interactive Agg backend only when a pipeline environment is sourced
if "SCIPIPE_HEURISTICS" in os.environ:
    print("Using Agg backend for pipeline")
    matplotlib.use('Agg')
  • Put this in your ~/.casa/init.py file. It has no effect on regular CASA use unless you have also sourced one of the pipeline environment scripts below.
import os
import sys
if "SCIPIPE_HEURISTICS" in os.environ:  # so this works with SD staging script
    if 'asdmExport' not in os.getenv("SCIPIPE_HEURISTICS"):
        # Normal pipeline environment: put the heuristics first on the path and load the CLI
        print("Adding pipeline heuristics to python path")
        sys.path.insert(0, os.path.expandvars("$SCIPIPE_HEURISTICS"))
        import pipeline
        import pipeline.infrastructure.executeppr as eppr
        pipeline.initcli()
    else:
        print("You must have sourced an environment with asdmExportLight...")

Standard Method: run /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/calibPipeIF-NA.py

  • this will run import, run the "fixes" e.g. fixplanets, and then run the rest of the pipeline. If for some reason you don't want the "fixes", see below
  • a new pipeline directory will be created in your pipeline/root/ directory in your lustre area corresponding to this rerun, regardless of where you run the above command
  • choose your pipeline version with the --env option (no quotes on the script name)
    • operations version in May 2016 a.k.a. the casa 4.5.3 version: --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C3R4_4.5.3.sh
    • current test version: --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R1.sh
  • if you want to run from scratch, specify --mous=[MOUS status ID - see how to get it below] (no quotes on the MOUS status id). e.g. from any directory:
     /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/calibPipeIF-NA.py --mous=uid://A001/X145/X17e --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh 
  • if you have a directory containing a PPR and optionally a flagtemplate.txt file (e.g. a previous pipeline run or data delivery package), do not use --mous=; use --flag=/lustre/naasc/that/directory/containing/PPR instead (no quotes on the directory name), e.g.
     /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/calibPipeIF-NA.py --flag=/lustre/naasc/fbaggins/2012.1.00345.S/science_goal_42/files_for_pipeline_rerun/ --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh 

Getting the MOUS status ID

  • If you already have the ASDM file on disk, you can extract its OUS using the analysisUtils command au.getOUS, which can be run in any version of casa (where the uid here is the ASDM name):
    • CASA <2>: au.getOUS('uid___A002_X5b06c4_X5c9')
    • This will print: Out[2]: 'uid://A002/X5a9a13/X528'
  • If you don't have the ASDM, you can use the Project Tracker to find the OUS using these instructions:
    • Use project search to find and select the right project.
    • In the lower left window that has column headings "Entity" and "Status", click on the SG of interest, one-level below the "ObsUnitSet" folder. Its icon is a light blue stack of square clocks. (It is one level above where the icon is a single yellow square clock.)
    • In the window on the right, at the top, the "Status entity ID" is the OUS ID.
    • (This is different from the "Sched Block id" and the "Status entity id" that you would see if you clicked on the level below, with the yellow icon.)
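The MOUS status ID has to be given in uid://A001/X2fe/X862 form, while ASDMs and pipeline directories use the uid___A002_X5a9a13_X528 spelling. Below is a small Python 3 sketch that converts between the two spellings. It only rewrites the string. It does not tell you which MOUS an ASDM belongs to, so you still need au.getOUS or the Project Tracker for that. The helper names are hypothetical.

```python
import re

# uid://A002/X5a9a13/X528  <->  uid___A002_X5a9a13_X528
_UID = re.compile(r"^uid://(A\d+)/(X[0-9a-fA-F]+)/(X[0-9a-fA-F]+)$")
# optional SOUS_/GOUS_/MOUS_ prefix, as in the pipeline directory names
_NAME = re.compile(r"^(?:[SGM]OUS_)?uid___(A\d+)_(X[0-9a-fA-F]+)_(X[0-9a-fA-F]+)$")

def name_to_uid(name):
    """Convert an on-disk name to the uid:// form expected for the MOUS status ID."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError("not an ALMA uid name: %r" % name)
    return "uid://%s/%s/%s" % m.groups()

def uid_to_name(uid):
    """Convert uid://Axxx/Xaaa/Xbbb to the uid___Axxx_Xaaa_Xbbb spelling used on disk."""
    m = _UID.match(uid)
    if m is None:
        raise ValueError("not an ALMA uid: %r" % uid)
    return "uid___%s_%s_%s" % m.groups()
```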



Nonstandard: Running without the wrapper script

  1. you need the member OUS status ID as above
  2. set the environment in a shell to point at the pipeline version that you want to use e.g.
    . /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh
  3. run pipelineMakeRequest to download the raw data, set up a pipeline directory structure, and create a PPR. NOTE: the MOUS name must be in the format uid://A001/X2fe/X862, e.g.
    pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_cal.xml true
  4. find the working directory that pipelineMakeRequest creates, e.g. /lustre/naasc/sciops/comm/USER/pipeline/root/2012.1.00912.S_2013_01_28T18_23_10.196/SOUS_uid___A002_X5a9a13_X526/GOUS_uid___A002_X5a9a13_X527/MOUS_uid___A002_X5a9a13_X528/working/
  5. modify the PPR in that directory if you want
  6. start casa in that working directory
  7. CASA> eppr.executeppr("PPR.....xml",importonly=False)
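Step 4 involves searching through a timestamped project directory. Below is a Python 3 sketch that finds it by globbing for the MOUS directory. The layout root/PROJECT_TIMESTAMP/SOUS_*/GOUS_*/MOUS_*/working is taken from the example path above. The helper names are hypothetical, and USER in the usage line is a placeholder.

```python
import glob
import os

def mous_dirname(mous_uid):
    """uid://A002/X5a9a13/X528 -> MOUS_uid___A002_X5a9a13_X528"""
    if not mous_uid.startswith("uid://"):
        raise ValueError("expected uid://... form, got %r" % mous_uid)
    return "MOUS_" + mous_uid.replace("uid://", "uid___").replace("/", "_")

def working_glob(pipeline_root, mous_uid):
    """Glob pattern matching every working/ directory created for this MOUS."""
    return os.path.join(pipeline_root, "*", "SOUS_*", "GOUS_*",
                        mous_dirname(mous_uid), "working")

def find_working_dirs(pipeline_root, mous_uid):
    """Existing working directories for this MOUS, newest (by modification time) first."""
    dirs = [p for p in glob.glob(working_glob(pipeline_root, mous_uid)) if os.path.isdir(p)]
    return sorted(dirs, key=os.path.getmtime, reverse=True)

# Example:
# find_working_dirs("/lustre/naasc/sciops/comm/USER/pipeline/root", "uid://A002/X5a9a13/X528")
```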

Changing quasar fluxes

  • During conversion of the ASDM to an MS, the fluxes in the Source table are loaded into the pipeline context and saved to a file on disk named 'flux.csv'. The fluxes [I, Q, U, V] are defined as a function of MS, field name, and spw. flux.csv is a text file in which each line contains comma-delimited fields specifying the MS name, field name, spw, I, Q, U, and V. The first line of this file must not be changed, but the flux values themselves may be edited and new lines may be added. On subsequent runs of the pipeline (which start from the MS, not the ASDM), the contents of this file supersede the contents of the ASDM Source table.
  • Follow the instructions below to create a pipeline directory, MS, and flux.csv file, then edit flux.csv and rerun eppr.executeppr() in the working directory.
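If you'd rather edit flux.csv from a script than by hand, here is a Python 3 sketch that follows the rules described above. It copies the first line through untouched, replaces I, Q, U, V for one (MS, field, spw) row, and appends a new row if none matches. It assumes the column order MS name, field name, spw, I, Q, U, V. Any extra columns after V are left as they are. `update_flux` is a hypothetical helper, and the MS, field, and spw values in the usage comment are made up.

```python
import csv
import io

def update_flux(csv_text, ms, field, spw, iquv):
    """Return flux.csv text with [I, Q, U, V] replaced (or added) for (ms, field, spw)."""
    lines = csv_text.splitlines()
    header, body = lines[0], lines[1:]
    out = io.StringIO()
    out.write(header + "\n")               # first line must not be changed
    writer = csv.writer(out, lineterminator="\n")
    new_values = [str(v) for v in iquv]
    found = False
    for row in csv.reader(body):
        if not row:
            continue                       # skip blank lines
        if [c.strip() for c in row[:3]] == [ms, field, str(spw)]:
            row[3:7] = new_values          # columns I, Q, U, V
            found = True
        writer.writerow(row)
    if not found:
        writer.writerow([ms, field, str(spw)] + new_values)
    return out.getvalue()

# Usage (hypothetical MS/field/spw):
# with open("flux.csv") as f:
#     text = f.read()
# with open("flux.csv", "w") as f:
#     f.write(update_flux(text, "uid___A002_X5a9a13_X528.ms", "J1924-2914", 17, [2.1, 0, 0, 0]))
```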

Nonstandard: Running from MS, e.g. after Flagging more data (or other manual intervention)

1 There are three different starting points:
  • you have an MS from a pipeline calibration run, in its pipeline working directory, e.g. /lustre/naasc/sciops/comm/USER/pipeline/root/2012.1.00912.S_2013_01_28T18_23_10.196/SOUS_uid___A002_X5a9a13_X526/GOUS_uid___A002_X5a9a13_X527/MOUS_uid___A002_X5a9a13_X528/working/: go to step 3
  • you have an MS (e.g. from manual calibration) but need a (new) pipeline directory structure to work in: go to step 2a
  • you need both an MS and a (new) pipeline directory structure to work in: go to step 2b
2a You need to create a pipeline directory structure with pipelineMakeRequest, but not get the raw data:
  • you need the member OUS status ID as above
  • source the desired pipeline branch setup file e.g.
    . /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R1.sh
  • pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_cal.xml false
  • ("false" at the end means it should not get the ASDMs from the archive)
  • put your MS in the newly created pipeline working directory
2b You need to create a pipeline directory structure with pipelineMakeRequest, and import the raw data:
  • you need the member OUS status ID as above
  • source the desired pipeline branch setup file e.g.
    . /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R1.sh
  • pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_cal.xml true
  • cd to the working directory that was just created
  • casa
  • CASA> eppr.executeppr("PPR.....xml",importonly=True)
  • exit and restart casapy to clear any pipeline context in memory.
3 cd to the pipeline working directory; make any desired changes to the MS, flux.csv, flagtemplate.txt, and PPR
  • For example, to update the flux densities by interpolating from the calibrator catalog:
  • au.getALMAFluxcsv('flux.csv')
4 Make sure the MS is relatively clean: remove the model, restore the flags, and remove any cal tables (but keep the .ms itself) by running
  • in CASA: delmod('your_uid.ms') or clearcal('your_uid.ms')
  • in CASA: flagmanager('your_uid.ms', mode='restore', versionname='Original')
  • in the shell: rm -rf your_uid.ms.*
5 If you haven't already done so, source the desired pipeline branch setup file, e.g.
. /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R1.sh

6 Re-execute the pipeline; it will skip the import step, leaving any modifications to the MS (like fixplanets) in place:
CASA> eppr.executeppr("PPR.....xml",importonly=False)
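The shell pattern your_uid.ms.* in step 4 also matches your_uid.ms.flagversions. That is why the flagmanager restore has to run before the rm, because the saved flag versions are gone afterwards. Below is a Python 3 sketch of a dry run that lists what the rm would remove before you delete anything. The helper names are hypothetical, and the example table name in the test data is made up.

```python
import os
import shutil

def select_sidecars(ms_name, names):
    """Names matched by the shell pattern <ms_name>.*; the MS itself never matches."""
    prefix = ms_name + "."
    return sorted(n for n in names if n.startswith(prefix))

def remove_sidecars(ms_name, directory=".", dry_run=True):
    """List (and, unless dry_run, delete) everything named <ms_name>.<suffix> in directory."""
    victims = select_sidecars(ms_name, os.listdir(directory))
    if not dry_run:
        for name in victims:
            path = os.path.join(directory, name)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)        # cal tables and .flagversions are directories
            else:
                os.remove(path)
    return victims

# Example: print(remove_sidecars("your_uid.ms"))  # dry run, nothing is deleted
```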

Nonstandard (and may no longer work): Run analyzemscal

  • To generate a QA2-like report for the pipeline reduction that appears at the standard webpage, you have to run a second casapy in a shell where you have NOT sourced the pipeline setup script. This doesn't have to be a specific version of casapy.
  • You do, however, have to have this line in your init.py, or execute it manually once starting casapy:
    CASA <03>: sys.path.append('/home/casa/contrib/AIV/science/analysis_scripts/')
  • inside casapy:
    CASA <04>: import analyzemscal as amc
    CASA <05>: amc.analyzemscal(asdm='uid___your_asdm', pipecaldir='/lustre/naasc/username/pipeline/root/your_project/....../working')
  • If there is a corresponding manual reduction of this dataset, then be sure that you have the original products, including the qa2 subdirectory. If the latter does not exist, then you need to run es.generateQA2Report('your_uid.ms') before running amc.analyzemscal(). You will need to specify the manualcaldir parameter to point to the manual reduction directory (you do not need to have write permission there).
  • You can see a full list of all analyzemscal parameters in the online help: help(amc.analyzemscal)

Cycle 4 Workflow - Running through calibration from calibPipeIF-NA.py, weblog review, flag identification

THESE INSTRUCTIONS WILL CHANGE AFTER JAO APPROVES NEW calibPipeIF.py THAT WORKS AT ALL THE ARCS. THE NEW SCRIPT WILL UPDATE EPT AND EXTRACT/STORE flagtemplate.txt FILES PROPERLY - THE FOLLOWING WILL NOT!

NOTE: calibPipeIF-NA.py must be run from a bash shell

This can be used if the MOUS has ASDMs that have all passed QA0. If they haven't (e.g. project codes with .CSV extensions), or if you don't want to apply all the "fixes", then you need to use the PPR method below.

The script can create a PPR (or use an existing one if a path is provided), extract ASDMs from the archive, run importdata, run the SACM "fixes" (modifying casapipescript.py if necessary), and then run the rest of the PPR. Whether the PPR contains only calibration or both calibration and imaging, and whether the pipeline stops at the breakpoint, is controlled by the "switches" described below.

The calibPipeIF-NA.py script takes the following "switches":
  • --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_[PL Version].sh sets the proper environment variables for the pipeline version (REQUIRED)
  • --mous=[MOUS status ID] Specify the MOUS that will be run (see above for how to find this) (REQUIRED if no PPR given by --flag switch)
  • --flag=[Full Directory Path] optional switch specifying the directory that contains the [ASDMuid]flagtemplate.txt file(s) for this MOUS and/or a pre-existing PPR (e.g. from a previous run) (Existing PPR REQUIRED if no MOUS given by --mous switch)
  • --image optional switch to add the pipeline imaging commands to the PPR
  • --break optional switch to allow the PPR to be run only through calibration
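The switch rules above can be checked before you launch a long run. Below is a Python 3 sketch that assembles a calibPipeIF-NA.py command line and enforces three things: --env is required, --mous or --flag must be present, and a MOUS ID must be in uid:// form. It then prints the result as a line you can paste into the bash shell the script requires. The function names are hypothetical. The script only sees the same switches listed above.

```python
import shlex

SCRIPTS = "/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/"
CALIB_SCRIPT = SCRIPTS + "calibPipeIF-NA.py"

def build_calib_command(env, mous=None, flag=None, image=False, brk=False):
    """Assemble calibPipeIF-NA.py arguments following the switch rules listed above."""
    if not env:
        raise ValueError("--env is required")
    if mous is None and flag is None:
        # without --mous, --flag must point at a directory holding an existing PPR
        raise ValueError("give --mous, or --flag pointing at a directory with a PPR")
    if mous is not None and not mous.startswith("uid://"):
        raise ValueError("MOUS status ID must look like uid://A001/X2fe/X862")
    cmd = [CALIB_SCRIPT, "--env=" + env]
    if mous is not None:
        cmd.append("--mous=" + mous)
    if flag is not None:
        cmd.append("--flag=" + flag)
    if image:
        cmd.append("--image")   # add the pipeline imaging commands to the PPR
    if brk:
        cmd.append("--break")   # run only through calibration
    return cmd

def as_bash_line(cmd):
    """Quote each argument so the line can be pasted into a bash shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)

# Example, matching the Step 1 command:
# print(as_bash_line(build_calib_command(SCRIPTS + "pipeline_env_C4R2.sh",
#                                        mous="uid://A001/X145/X17e", image=True, brk=True)))
```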

Current fixes (Oct 2016):
  • au.getALMAFluxcsv(): creates a flux.csv file with the most recent calibrator fluxes from the Source Catalog database (the ASDM stores fluxes extracted at SB runtime) - still needed for Cy4
  • es.correctMyAntennaPositions: creates antennapos.csv file with most recent antenna positions (ASDM stores values at SB runtime) - still needed for Cy4
  • fixplanets: recipe for solar system objects, for Cy2 or earlier data
  • fixsyscaltimes: for (some? all?) Cy3 data
  • es.fixforCSV2555: for Cy2 data taken during a certain period of time
The last three fixes should not run and/or "do no harm" if not needed.

Step 1: Create pipeline directory tree, PPR, and run pipeline, stopping after calibration
  • From anywhere, run the following from a bash shell:
 /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/calibPipeIF-NA.py  --mous=[MOUSid]  --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh  --image --break

Step 2: After the calibration run is complete, review calibration weblog
  • Identify any data to flag, and add the corresponding flag commands to the appropriate flagtemplate.txt file.
  • NOTE: To save time, if you see an extreme outlier in an hif_applycal plot, it can be helpful to manually flag the offending data in the .ms and then remake the pipeline plot manually (the plotms command is given when you click on the plot in the weblog). This will allow you to see if there are any remaining outliers in the plot that you couldn't see before, because the scale was dominated by an extreme outlier. After verifying in this way you still need to add the flag command to the flagtemplate.txt and rerun the pipeline -- this step is just to hopefully save you from having to run it a third time!
  • IF NO FLAGS ARE NEEDED: go to "Cycle 4 Workflow - Checking for imaging intervention, then running the imaging pipeline" below and do the imaging

Step 3: IF FLAGS ARE NEEDED:
  • From anywhere, run the following from a bash shell:
 /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/calibPipeIF-NA.py  --env=/lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh  --flag=[Full/Directory/Path that has PPR and/or flagtemplate.txt file(s)] --image --break
  • Note that this command requires that existing PPR and flagtemplate.txt files are in the same directory
  • Examine the calibration weblog again.

NOTE: calibPipeIF-NA.py has a step that copies the calibration results to /calproducts; this directory can be ignored for Cycle 4 and may go away in the future.
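Steps 2 and 3 ask you to add flag commands to a flagtemplate.txt file. Below is a Python 3 sketch that formats one command per line and appends it. It assumes the template uses CASA flagdata list-mode syntax (mode='manual' followed by key='value' selections and a reason). Check this against an existing flagtemplate.txt before relying on it. The helper names, the antenna, the spw, and the reason text are all illustrative. The template path stays whatever [ASDMuid]flagtemplate.txt name your run uses.

```python
def flag_command(reason, **selection):
    """One list-mode line: mode='manual', the data selection, then the reason."""
    parts = ["mode='manual'"]
    for key, value in list(selection.items()) + [("reason", reason)]:
        text = str(value)
        if "'" in text:
            # a bare single quote would break the key='value' syntax
            raise ValueError("single quotes are not allowed in %s" % key)
        parts.append("%s='%s'" % (key, text))
    return " ".join(parts)

def append_flag(template_path, reason, **selection):
    """Append the formatted command to the flag template and return it."""
    line = flag_command(reason, **selection)
    with open(template_path, "a") as handle:
        handle.write(line + "\n")
    return line

# Example (illustrative values):
# append_flag("path/to/flagtemplate.txt", "bad_amp", antenna="DA41", spw="17")
```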

Alternative Cycle 4 Workflow - Running through calibration directly from PPR

1. Source the correct version of the pipeline (from bash):

 source /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2_1.sh

2A. IF QA0 STATE HAS BEEN SET (normal observations): create the directory structure and automatically retrieve the ASDMs from the archive. NOTE: the MOUS name must be in the format uid://A001/X2fe/X862:

 pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_cal.xml true 
  • cd into the /working directory created by pipelineMakeRequest (/lustre/naasc/sciops/comm/username/pipeline/root/PROJECT_ID_TIMESTAMP/SOUS_uid___A00*/GOUS_uid___A00*/MOUS_uid___A00*/working/)

2B. IF QA0 STATE HAS NOT BEEN SET (CSV observations): create the directory structure without automatically retrieving the ASDMs:

 pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_cal.xml false 

  • cd into /rawdata directory created by pipelineMakeRequest
  • Run asdmExportLight on all ASDMs (repeat for each asdm_uid) [done outside of CASA]
 asdmExportLight 'asdm_uid_hexString' 

  • cd into the /working directory created by pipelineMakeRequest (/lustre/naasc/sciops/comm/username/pipeline/root/PROJECT_ID_TIMESTAMP/SOUS_uid___A00*/GOUS_uid___A00*/MOUS_uid___A00*/working/) and start CASA
  • execute the following to add the AsdmIdentifier blocks to your PPR:
 import analyzemscal as amc 
amc.modifyPPR('PPR_your_name.xml', newpprname=False, asdm='asdm_uid1, asdm_uid2')

3. Enter CASA and execute the PPR with importonly=True to create Measurement Sets from the ASDMs.

 CASA> eppr.executeppr("PPR.....xml",importonly=True) 

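4. Execute the following fixes in CASA (these are the only ones needed for Cy4 data; data from earlier cycles may need more, so use the calibPipeIF-NA.py method instead):
import glob
import analysisUtils as au  # from analysis_scripts; skip if au is already loaded in your session
es = au.stuffForScienceDataReduction()
# all MSs created by the importonly=True run in this working directory
mslist = glob.glob('uid___A00*_X*_X*.ms')
if not mslist:
    raise RuntimeError('no MS found; run executeppr with importonly=True first')
es.correctMyAntennaPositions(mslist)  # writes antennapos.csv
au.getALMAFluxcsv('flux.csv')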

5. Run the pipeline with a break after calibration:

 CASA> eppr.executeppr("PPR.....xml",importonly=False, bpaction='break')

6. Review the calibration weblog, identify any data to flag, and add the flag commands to the appropriate flagtemplate.txt file.

Cycle 4 Workflow - Running through calibration and imaging directly from PPR

1. Source the correct version of the pipeline (from bash):

 source /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2_1.sh

2A. IF QA0 STATE HAS BEEN SET (normal observations): create the directory structure and automatically retrieve the ASDMs from the archive. NOTE: the MOUS name must be in the format uid://A001/X2fe/X862:

 pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_calimage.xml true 
  • cd into the /working directory created by pipelineMakeRequest (/lustre/naasc/sciops/comm/username/pipeline/root/PROJECT_ID_TIMESTAMP/SOUS_uid___A00*/GOUS_uid___A00*/MOUS_uid___A00*/working/)

2B. IF QA0 STATE HAS NOT BEEN SET (CSV observations): create the directory structure without automatically retrieving the ASDMs:

 pipelineMakeRequest MOUS_STATUS_ID intents_hifa.xml procedure_hifa_calimage.xml false 

  • cd into /rawdata directory created by pipelineMakeRequest
  • Run asdmExportLight on all ASDMs (repeat for each asdm_uid) [done outside of CASA]
 asdmExportLight 'asdm_uid_hexString' 

  • cd into the /working directory created by pipelineMakeRequest (/lustre/naasc/sciops/comm/username/pipeline/root/PROJECT_ID_TIMESTAMP/SOUS_uid___A00*/GOUS_uid___A00*/MOUS_uid___A00*/working/) and start CASA
  • execute the following to add the AsdmIdentifier blocks to your PPR:
 import analyzemscal as amc 
amc.modifyPPR('PPR_your_name.xml', newpprname=False, asdm='asdm_uid1, asdm_uid2')

3. Enter CASA and execute the PPR with importonly=True to create Measurement Sets from the ASDMs.

 CASA> eppr.executeppr("PPR.....xml",importonly=True) 

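4. Execute the following fixes in CASA (these are the only ones needed for Cy4 data; data from earlier cycles may need more, so use the calibPipeIF-NA.py method instead):
import glob
import analysisUtils as au  # from analysis_scripts; skip if au is already loaded in your session
es = au.stuffForScienceDataReduction()
# all MSs created by the importonly=True run in this working directory
mslist = glob.glob('uid___A00*_X*_X*.ms')
if not mslist:
    raise RuntimeError('no MS found; run executeppr with importonly=True first')
es.correctMyAntennaPositions(mslist)  # writes antennapos.csv
au.getALMAFluxcsv('flux.csv')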

5. Run the pipeline through imaging:

 CASA> eppr.executeppr("PPR.....xml",importonly=False, bpaction='ignore')

Cycle 4 Workflow - Checking for imaging intervention, then running the imaging pipeline

This workflow assumes that any necessary flagging has been done and that a "run through calibration only" working directory already exists with all existing flags applied.

1. Check whether imaging intervention is needed (can use Excel workbook with "Image Checker" spreadsheet to perform Cy4 pre-imaging checks uploaded to http://jira.alma.cl/browse/SACM-407):
  • Check resolution of phase calibrator in calibrator imaging stage of the weblog.
    • If larger than 1.3x PI request, this probably fails QA2. Talk to DRM
    • If finer than (PI request)/1.3, may still meet PI goal by imaging with different Robust factor, or by flagging outrigger antennas (new workflow), or by sending to manual to image with a taper. Talk to DRM
  • Check if multi-execution ephemeris object. If so, needs modified workflow. Talk to DRM
  • Check whether this project is expected to take too long to run or to produce individual imaging products that are too large. If so, the PPR needs to be modified. Talk to DRM.
  • Check whether this is a poorly formatted mosaic (Home => click on any ms name => spatial setup => look at the pointing pattern at the bottom of the weblog page). If there is no pattern, this is a single pointing and can be imaged. If the pointings all overlap or are close together, it can be imaged. Otherwise, it needs manual imaging or intervention. Talk to DRM.

2. If imaging intervention not needed:
  • Provided you are still in the same cluster-node shell that ran the calibration, go to the /working directory and make sure you are in bash.
  • Source the environment script
source /lustre/naasc/sciops/comm/rindebet/pipeline/scripts/pipeline_env_C4R2.sh
  • Start CASA
 CASA> eppr.executeppr("PPR.....xml", importonly=False, bpaction='resume') 

NOTE: We are still investigating the viability of resuming under a wider range of conditions.
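The resolution check in step 1 boils down to a factor-of-1.3 comparison. Below is a Python 3 sketch of that comparison. It assumes that both numbers are beam sizes in the same units (for example, arcsec). `resolution_check` is a hypothetical helper. It does not replace the "Image Checker" spreadsheet or the other checks.

```python
def resolution_check(achieved, requested, factor=1.3):
    """Compare the phase-calibrator beam from the weblog with the PI-requested resolution."""
    if achieved > factor * requested:
        return "coarser"   # larger than 1.3x PI request: probably fails QA2, talk to DRM
    if achieved < requested / factor:
        return "finer"     # may still meet PI goal (robust, outriggers, taper): talk to DRM
    return "ok"

# Example: resolution_check(0.50, 0.30) -> "coarser"
```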

How to generate part of a weblog

You can regenerate the weblog for certain stages using the following workflow (production):
  • mv the top pipeline working directory (i.e. 201*/) back to the directory it was originally run in (i.e. /home/dared/opt/dared.RHEL7/mnt/dataproc/201*)
  • cd into the working directory
  • on a cvpost node, source /home/dared/opt/dared.RHEL7/etc/dared
  • launch the pipeline version of CASA (i.e. casa --pipeline)
  • in the CASA command line:
    import os
    import pipeline
    from pipeline.infrastructure.renderer import htmlrenderer  # in case htmlrenderer is not imported with pipeline
    # stage number(s) as a string, e.g. '17' for the applycal stage; os.environ values must be strings, so a Python list will not work
    os.environ['WEBLOG_RERENDER_STAGES'] = '17'
    context = pipeline.Pipeline(context='last').context
    htmlrenderer.WebLogGenerator.render(context)
Note that this will not regenerate plots; it will only rerender the HTML. If you'd like the plots regenerated as well, you could try deleting all the pngs from html/stageXX and html/stageXX/thumbs. We can't guarantee success, as it depends on how the tasks were coded, but it works maybe 75% of the time. Also, this will generate plots based on the current state of the data on disk, so you'll pick up applied flags etc. possibly generated in later stages.
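Below is a Python 3 sketch for clearing a stage's plots before calling WebLogGenerator.render. It assumes that the stage directories are named stageNN, as in html/stageXX above, and that html_dir points at the html directory inside the pipeline-* directory. The helper names are hypothetical. By default it only lists the files.

```python
import glob
import os

def stage_png_patterns(html_dir, stage):
    """Glob patterns for one stage's plots: stageNN/*.png and stageNN/thumbs/*.png."""
    stage_dir = os.path.join(html_dir, "stage%s" % stage)
    return [os.path.join(stage_dir, "*.png"),
            os.path.join(stage_dir, "thumbs", "*.png")]

def remove_stage_pngs(html_dir, stage, dry_run=True):
    """List (and, unless dry_run, delete) the pngs so the renderer may remake them."""
    paths = sorted(p for pattern in stage_png_patterns(html_dir, stage)
                   for p in glob.glob(pattern))
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths

# Example: remove_stage_pngs("pipeline-2017.../html", 17, dry_run=False)
```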

In order to re-upload the pipeline weblog to AQUA:
  • package up the last pipeline directory (pipeline-201*) following the naming convention of the products directory (i.e. MOUS.hifa_calimage.weblog.tgz) (NOTE: do not include the saved_state directory within the pipeline-2* directory)
  • replace this file in the associated pipeline/root products directory
  • rename the directory to re-upload to AQUA
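Below is a Python 3 sketch of the packaging step that uses the standard tarfile module. It writes the pipeline-* directory to a .tgz and drops saved_state through a tarfile filter. Pass in mous_name exactly as it is spelled in the existing products file, because this sketch does not guess it. The helper names are hypothetical.

```python
import os
import tarfile

def weblog_tgz_name(mous_name, recipe="hifa_calimage"):
    """Products-directory spelling: <MOUS>.<recipe>.weblog.tgz"""
    return "%s.%s.weblog.tgz" % (mous_name, recipe)

def exclude_saved_state(tarinfo):
    """tarfile filter: drop pipeline-*/saved_state and everything below it."""
    parts = tarinfo.name.split("/")
    if len(parts) >= 2 and parts[1] == "saved_state":
        return None
    return tarinfo

def package_weblog(pipeline_dir, out_path):
    """Write pipeline_dir (e.g. pipeline-2017...) into a gzipped tar without saved_state."""
    top = os.path.basename(os.path.normpath(pipeline_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(pipeline_dir, arcname=top, filter=exclude_saved_state)
    return out_path
```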

How to restore products directory from working

If the pipeline run was moved from its original location, move the pipeline output (full directory) back to the location where the output was originally written. Then:

  • cd down to the working directory
  • in CASA:
    h_resume(filename='pipeline-20171115T172633.context')
    hifa_exportdata(pipelinemode="automatic")

where the filename is that of the oldest pipeline-****.context file in the working directory (pipeline-20171115T172633.context above is only an example).

Once the task has completed, cd up to the MOUS level, where you should have "rawdata", "working", and "products" directories.
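The context file names embed a YYYYMMDDTHHMMSS timestamp. Sorting them lexically therefore gives the order in which they were written. Below is a Python 3 sketch that picks the oldest one, as described above. The last element of the sorted list is the newest. The helper names are hypothetical.

```python
import os
import re

_CONTEXT = re.compile(r"^pipeline-(\d{8}T\d{6})\.context$")

def sorted_contexts(names):
    """pipeline-YYYYMMDDTHHMMSS.context names, oldest first."""
    return sorted((n for n in names if _CONTEXT.match(n)),
                  key=lambda n: _CONTEXT.match(n).group(1))

def oldest_context(working_dir="."):
    """Name of the oldest context file in working_dir."""
    contexts = sorted_contexts(os.listdir(working_dir))
    if not contexts:
        raise FileNotFoundError("no pipeline-*.context in %s" % working_dir)
    return contexts[0]

# Example: h_resume(filename=oldest_context())
```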
Topic attachments
  • matplotlibrc (34 bytes, 2013-07-30 11:51, ToddHunter)
  • Metadata_2016Sep28_v1-jh-ByValue.xlsx (4 MB, 2016-10-14 17:05, JohnHibbard): Excel workbook with "Image Checker" spreadsheet to perform Cy4 pre-imaging checks
  • pipe.20160511.pdf (127 K, 2016-05-11 19:27, RemyIndebetouw): pipeline at the NAASC presentation 20160511
  • PipelineInfrastructure.pdf (112 K, 2013-07-30 11:52, ToddHunter): presentation from May 2012
Topic revision: r46 - 2019-07-18, TomBooth