

Casapy Flag tool - Functionality and User Interface

Urvashi R.V.   (2007-01-25)

This is a short description of the function names and parameter lists to be used in the casapy flag tool.

Several users have given suggestions about the functionality of the flag tool, and this has been folded into the tool description. The last section of this document lists user-requests which cannot be done in the initial version, but which will be incorporated into the tool later, as and when the algorithms and the supporting code structure become well defined.

Many thanks to J. McMullin, L. Davis, D. Whysong, J. Lightfoot, N. Kanekar, W. Brisken, M. Rupen, F. Owen, E. Fomalont and T. J. Cornwell for comments and suggestions.

Please send feedback to jmcmulli@nrao.edu, rurvashi@nrao.edu.


Data selection

The following function will be used to select a subset of the MS, on which to perform subsequent flagging operations.

All data selection will follow the ms-selection syntax, which allows lists of numbers or strings, ranges, multiple ranges, combinations of lists and ranges separated by commas, and wild-carding.

>> fg.setdata(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
              baseline=[],uvrange=[min,max],time=[],
              freqrange=[min,max],channel=[],correlation=[])
This function will return a record with some information about the subset of the MS that has been selected. For example: "Number of rows selected; number of visibilities, antennas, channels, pols; fraction of antennas, times, chans that are in this selection".
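
As a rough illustration of the list-and-range selection syntax described above, the following sketch expands a selection string into explicit indices. The helper name expand_selection and the '~' range separator are assumptions made for this example only; the real grammar is defined by ms-selection, and wild-carding of names is not handled here.

```python
def expand_selection(text, range_sep="~"):
    """Expand a string such as '0,3~5,8' into a sorted list of unique integers.

    Hypothetical helper: the separator and the accepted forms are assumptions,
    not the ms-selection grammar itself.
    """
    indices = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing comma
        if range_sep in item:
            lo, hi = (int(part) for part in item.split(range_sep, 1))
            if lo > hi:
                raise ValueError("range start exceeds range end: %r" % item)
            indices.update(range(lo, hi + 1))  # ranges are treated as inclusive
        else:
            indices.add(int(item))
    return sorted(indices)


print(expand_selection("0,3~5,8"))    # a list combined with one range
print(expand_selection("1~2, 6~7"))   # multiple ranges
```

Returning a sorted list of unique indices keeps overlapping ranges harmless, which seems a reasonable default for a selection.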

Manual flagging

The following function will select specific data, for manual flagging.
>> fg.setmanualflags(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
                     baseline=[],uvrange=[min,max],time=[],
                     freqrange=[min,max],channel=[],correlation=[],
                     unflag=T/F, rowFlag=T/F, autocorrelation=T/F)
The flags are applied to the data only after a call to the fg.run() function.
Note that fg.setmanualflags() can be called multiple times before a call to fg.run(). All selections are then flagged during a single pass through the data.

Example : To flag chans 4,5 for antenna 2, and chans 16,17 for antenna 5.
          All corresponding baselines, correlations and times will be flagged.
>> fg.setmanualflags(antenna=[2],channel=[4,5]);
>> fg.setmanualflags(antenna=[5],channel=[16,17]);
>> fg.run();

fg.printmanualflagselection() prints out the current list of manual flag selections.
fg.clearmanualflagselection() clears any previous manual flag selections. Note that this only clears flag selection settings, and does not undo any flags that have already been applied to the data.
To undo flags, one must use fg.setmanualflags(...,unflag=True).
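
The queue-then-run behaviour described above can be sketched in plain Python. The class below is a toy stand-in, not the fg tool: the row layout (dicts with 'ant1', 'ant2' and a per-channel 'flag' list) and all method names are assumptions chosen for the illustration. It only shows that selections are recorded first, applied together in one pass, and that clearing the queue leaves applied flags untouched.

```python
class ManualFlagQueue:
    """Toy model of queued manual flag selections (hypothetical, not the fg tool)."""

    def __init__(self):
        self.selections = []

    def set_manual_flags(self, antenna=None, channel=None, unflag=False):
        # Record the request only; nothing touches the data until run().
        self.selections.append((set(antenna or []), set(channel or []), unflag))

    def clear(self):
        # Forget pending requests; flags already applied are left alone.
        self.selections = []

    def run(self, rows):
        for row in rows:                          # single pass over the data
            for antennas, channels, unflag in self.selections:
                if antennas and not ({row["ant1"], row["ant2"]} & antennas):
                    continue                      # baseline does not involve the antenna
                for chan in range(len(row["flag"])):
                    if not channels or chan in channels:
                        row["flag"][chan] = not unflag


rows = [{"ant1": a1, "ant2": a2, "flag": [False] * 20}
        for a1, a2 in [(2, 5), (2, 7), (5, 7), (7, 8)]]
queue = ManualFlagQueue()
queue.set_manual_flags(antenna=[2], channel=[4, 5])
queue.set_manual_flags(antenna=[5], channel=[16, 17])
queue.run(rows)
print([chan for chan, flag in enumerate(rows[0]["flag"]) if flag])
```

An empty antenna or channel list is taken to mean "no restriction", mirroring the empty-list defaults in the function signature.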

Other simple semi-manual flags are Clip and Quack.

fg.setclip(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
           baseline=[],uvrange=[min,max],time=[],
           freqrange=[min,max],channel=[],correlation=[],
           clipExpr='',clipRange=[min,max],outside=T/F);
fg.setquack(scaninterval,length);
>> fg.setclip(clipExpr='abs I',clipRange=[1e-06,1e+6],outside=True);
                                                     -> Flag data outside the range
>> fg.setquack(scaninterval='20min',length='5sec');  -> Flag beginning and end of scan.
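
A minimal sketch of what clipping and quacking amount to, under stated assumptions: the function names are invented for this example, the quack version flags only the first `length` seconds after a given scan start (the text above also mentions the end of the scan), and the scaninterval parameter is not modelled.

```python
def clip_flags(values, clip_range, outside=True):
    """Return one flag per value; flag values outside (or inside) clip_range."""
    lo, hi = clip_range
    flags = []
    for value in values:
        in_range = lo <= value <= hi
        flags.append(not in_range if outside else in_range)
    return flags


def quack_flags(times, scan_start, length):
    """Flag samples taken within `length` seconds of the start of a scan."""
    return [scan_start <= t < scan_start + length for t in times]


amplitudes = [0.5, 2.0, 1.5e7, 3.0e-8, 10.0]
print(clip_flags(amplitudes, [1e-6, 1e6]))                     # flags the two outliers
print(quack_flags([0, 2, 4, 6, 8], scan_start=0, length=5))    # flags the first 5 s
```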

Algorithms

The algorithms currently implemented are identical to those in the old autoflag tool. They include sliding-window median filters in time and frequency (time-median, freq-median), and the spectral-rejection and uv-binning algorithms.
Current algorithm documentation: casa autoflag tool, casapy autoflag tool.
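
To give a feel for a sliding-window median filter, here is a simplified sketch. It is not the autoflag implementation: the absolute threshold, the window handling at the edges, and the function name are all assumptions made for the example.

```python
from statistics import median


def sliding_median_flags(values, half_width, threshold):
    """Flag points that differ from the local median by more than threshold."""
    flags = []
    for i, value in enumerate(values):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        local_median = median(values[lo:hi])   # window is truncated at the edges
        flags.append(abs(value - local_median) > threshold)
    return flags


amps = [1.0, 1.1, 0.9, 5.0, 1.0, 1.05, 0.95]
print(sliding_median_flags(amps, half_width=2, threshold=0.5))
```

The same function can be run along time or along frequency, depending on which axis the values are taken from. The median is used rather than the mean so that a single strong outlier does not shift the reference level.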

The following functions gather input parameters for the different autoflag algorithms. Each algorithm can have a different parameter list, which is supplied as a record of parameters.

>> fg.getautoflagparams(algorithm='timemedian')    -> returns a list of current params.
>> fg.setautoflag(algorithm='timemedian',parameters=[column='',expr='',threshold='',...])

Multiple algorithms can be set-up before a call to fg.run(trial=T/F). This creates a list of active algorithms, which are applied in succession to each chunk of data as the program iterates through the data.

If an algorithm is run in trial mode, then flags are not written to disk. The output of the autoflag algorithm can be monitored, and flags later committed to disk via an optional fg.writeflagstodisk() function. One could interactively use this function as follows.

>> inputpars = fg.getautoflagparams(algorithm='timemedian');
>> print inputpars   
        column = 'data'
        expr = 'abs I'
        thr = 0.5
        ...
>> inputpars.column = 'correcteddata'
>> print inputpars
        column = 'correcteddata'
        expr = 'abs I'
        thr = 0.5
        ...
>> fg.setautoflag(algorithm='timemedian', parameters = inputpars);
>> fg.run();
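
The idea of several active algorithms applied in succession to each chunk, with an optional trial mode, can be sketched as follows. Everything here is hypothetical scaffolding: the chunk layout, run_autoflag, and the two toy algorithms are assumptions for illustration, and "writing to disk" is represented simply by updating the chunk in memory.

```python
def run_autoflag(chunks, algorithms, trial=False):
    """Apply each algorithm in turn to every chunk; return the proposed flags.

    chunks: list of dicts with 'data' (list of floats) and 'flag' (list of bools).
    algorithms: callables mapping (data, flags) -> new list of flags.
    """
    proposed = []
    for chunk in chunks:                        # one pass through the data
        flags = list(chunk["flag"])
        for algorithm in algorithms:            # applied in succession
            flags = algorithm(chunk["data"], flags)
        proposed.append(flags)
        if not trial:
            chunk["flag"] = flags               # stand-in for writing flags to disk
    return proposed


def clip_above(limit):
    # Toy algorithm: flag every point above a fixed limit.
    return lambda data, flags: [f or d > limit for d, f in zip(data, flags)]


def flag_neighbours(data, flags):
    # Toy algorithm: also flag any point adjacent to an already flagged point.
    n = len(flags)
    return [flags[i] or (i > 0 and flags[i - 1]) or (i < n - 1 and flags[i + 1])
            for i in range(n)]


chunks = [{"data": [1.0, 9.0, 1.0, 1.0], "flag": [False] * 4}]
preview = run_autoflag(chunks, [clip_above(5.0), flag_neighbours], trial=True)
print(preview, chunks[0]["flag"])   # trial mode leaves the stored flags unchanged
```

Because each algorithm receives the flags produced by the previous one, the order of the active list matters.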

The data on which an algorithm operates are computed from a data expression, which is specified by the following two parameters.
column : data/correcteddata/residualdata/weights/invweights/weighteddata
expr : abs/arg/real/imag/pow(2)   RR/LL/RL/LR/XX/YY/XY/YX/I/Q/U/V
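
As an illustration of how an expression such as 'abs I' might be evaluated for a single visibility, consider the sketch below. The evaluate function and the dictionary layout are assumptions for the example. Only the direct correlations and Stokes I (half the sum of the parallel hands) are handled; Q, U, V and pow(2) are left out.

```python
import cmath

FUNCTIONS = {
    "abs": abs,
    "arg": cmath.phase,
    "real": lambda v: v.real,
    "imag": lambda v: v.imag,
}


def evaluate(expr, vis):
    """Evaluate e.g. 'abs I' or 'arg RR' for one visibility.

    vis maps correlation names ('RR', 'LL', ...) to complex values.
    """
    func_name, corr = expr.split()
    if corr == "I":
        # Stokes I from the parallel hands: circular feeds if present, else linear.
        pair = ("RR", "LL") if "RR" in vis else ("XX", "YY")
        value = 0.5 * (vis[pair[0]] + vis[pair[1]])
    else:
        value = vis[corr]
    return FUNCTIONS[func_name](value)


vis = {"RR": 3 + 4j, "LL": 3 - 4j, "RL": 0.1j, "LR": -0.1j}
print(evaluate("abs RR", vis))   # amplitude of one correlation
print(evaluate("abs I", vis))    # amplitude of Stokes I
```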

fg.printautoflagparams() prints out the names of currently active autoflag algorithms and their parameters.
fg.clearautoflagparams() clears any autoflag parameter settings. It can take in an algorithm name as a parameter, to remove only specific algorithms from the currently active list.

Flag summaries/statistics

The following functions provide various views of the data. These functions can be called to monitor flags before calling any flag/autoflag functions.

Flag Statistics

The following function will display the percentage of data flagged vs Antenna, Baseline, SpwIndex, Channel, Correlations, TimeRange, FieldId and UVRange.
These results can either be listed out on the logger, or when applicable, sent to a plotter for graphical output. They can also be returned in a suitable data record for automated parsing. For now, a list will be printed out to the logger, and returned in a record. The data selection parameters are identical to those in fg.setmanualflags().
fg.showflagsummary(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
                   baseline=[],uvrange=[min,max],time=[],
                   freqrange=[min,max],channel=[],correlation=[])
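
A small sketch of the kind of per-axis statistic such a summary could report, here the percentage of flagged visibilities per antenna. The row layout and function name are assumptions; the record returned by the actual tool is not specified in this document.

```python
from collections import defaultdict


def flag_summary_by_antenna(rows):
    """Return {antenna: percentage of its visibilities that are flagged}."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        n_flagged = sum(row["flag"])
        for ant in {row["ant1"], row["ant2"]}:  # a baseline counts for both antennas
            flagged[ant] += n_flagged
            total[ant] += len(row["flag"])
    return {ant: 100.0 * flagged[ant] / total[ant] for ant in sorted(total)}


rows = [
    {"ant1": 0, "ant2": 1, "flag": [True, True, False, False]},
    {"ant1": 0, "ant2": 2, "flag": [False, False, False, False]},
    {"ant1": 1, "ant2": 2, "flag": [True, True, True, True]},
]
for ant, percent in flag_summary_by_antenna(rows).items():
    print("antenna %d: %.1f%% flagged" % (ant, percent))
```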

Queries

These functions will allow the user to identify parts of their data sets with more or less than some fraction of flags. The following function will return lists of AntennaIndex, Baselines, SpwIndex, Channels, Correlations or FieldIndex with more (or less) than a specified fraction of flagged data (threshold thr, in the range [0-1]). They can also be queried based on an absolute number of flags. These are 1-D queries.
fg.queryflags(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
              baseline=[],uvrange=[min,max],time=[],
              freqrange=[min,max],channel=[],correlation=[],
              what="", thr= ,nflags=, morethan=T/F)
The axes along which 1-D queries can be made will be what = antenna, baseline, spwindex, channel, correlations, fieldID.
thr is the fraction of flagged data to be used as the threshold for listing.
nflags is an absolute number of flags to be used as the threshold for listing.
morethan chooses between searching for more than or less than the specified number/fraction of flags.
>> fg.queryflags(what = 'antenna',thr = 0.5, morethan=True)
>> fg.queryflags(field=[2,3], what = 'channel',thr = 0.8)
...etc...
Some 2-dimensional queries can be done, by calling multiple 1-D queries in succession. Other 2-D queries will be added later.
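
The 1-D query logic can be sketched for the channel axis as follows. The function name and row layout are assumptions for the example; the thr, nflags and morethan arguments mirror the parameters described above, with thr interpreted as a fraction of the rows.

```python
def query_channels(rows, thr=None, nflags=None, morethan=True):
    """List channels whose flag count passes a fractional or absolute threshold."""
    if (thr is None) == (nflags is None):
        raise ValueError("give exactly one of thr or nflags")
    n_chan = len(rows[0]["flag"])
    counts = [sum(row["flag"][chan] for row in rows) for chan in range(n_chan)]
    limit = thr * len(rows) if thr is not None else nflags
    if morethan:
        return [chan for chan, count in enumerate(counts) if count > limit]
    return [chan for chan, count in enumerate(counts) if count < limit]


rows = [
    {"flag": [True, False, True, False]},
    {"flag": [True, False, False, False]},
    {"flag": [True, True, False, False]},
    {"flag": [True, False, True, False]},
]
print(query_channels(rows, thr=0.7))                    # channels more than 70% flagged
print(query_channels(rows, nflags=1, morethan=False))   # channels with fewer than 1 flag
```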

Extending flags

  1. Extend existing flags along a specified axis. For example: 'for a given timestamp and baseline, if any channel is flagged, flag all channels'. The allowed axes along which flags can be extended will be Antenna, Baseline, Channel, Correlation and Time. One can select an arbitrary subset of the data, and then extend its flags along a specified axis. The default selection is the entire data.
    fg.extendflags(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
                   baseline=[],uvrange=[min,max],time=[],
                   freqrange=[min,max],channel=[],correlation=[],
                   along='', width= )
    
    The along parameter can take on the following values - channel, time, hourangle to be used along with the width parameter. Units for specifying the width will be decided later, and for now, will just be "number of data points". Other values for along would be antenna1, antenna2, bothantennas, allbaselines.

    >> fg.extendflags(along='channel',width=3) -> Flag 3 channels on either side of a
                                                  flagged channel
    >> fg.extendflags(along='antenna1')        -> Flag all corresponding visibilities 
                                                  for antenna 1 of the flagged baseline.
                                                  (for the corresponding time-stamp).  
    >> fg.extendflags(along='bothantennas')    -> Flag all corresponding visibilities 
                                                  for both antennas of the flagged baseline.
                                                  (for the corresponding time-stamp).
    

  2. The output from query functions can be sent into the manual flagging functions, to extend flags to other dimensions.
    Example 1 : Flag all channels that have more than 70% of their data already flagged.
    >> fg.clearmanualflagselection();    -> Cleans up any old manual flag selections 
    >> chans = fg.queryflags(what='channel', thr= 0.7) 
                                         -> returns a list of channel numbers
                                             corresponding to channels with more
                                             than 70% of the data being flagged.
    >> fg.setmanualflags(channel=chans) -> Select these channels for manual flagging
    >> fg.run();                         -> Apply the flags.
    
    Example 2 : For fieldID=1 (calibrator), find a list of "bad" channels with more than 60% data flagged. Transfer these flags to fieldID=2 (source) to flag all channels of the target source, that would have been calibrated by the "bad" channels from the calibrator.
    >> fg.clearmanualflagselection();   
    >> chans = fg.queryflags(field=['3C48'],what='channel',thr=0.6);
    >> fg.setmanualflags(field=['cygA'],channel=chans);
    >> fg.run();
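
Flag extension along the channel axis, as in the width=3 example above, can be sketched as a simple one-dimensional dilation. The function names are hypothetical, and width is taken as a number of data points, as stated in the text.

```python
def extend_along_channel(flags, width):
    """Flag `width` channels on either side of every flagged channel."""
    n = len(flags)
    extended = list(flags)
    for chan, is_flagged in enumerate(flags):   # read only the original flags
        if is_flagged:
            for neighbour in range(max(0, chan - width), min(n, chan + width + 1)):
                extended[neighbour] = True
    return extended


def extend_to_all_channels(flags):
    """If any channel is flagged, flag every channel."""
    return [any(flags)] * len(flags)


spectrum = [False] * 10
spectrum[4] = True
print(extend_along_channel(spectrum, width=3))
```

Reading from the original flags while writing to a copy avoids a flag growing further than width in a single call.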
    

Other functions

The following functions will be added in once the above functionality exists.
  1. A flag command history can be compiled via a combination of fg.printmanualflagselection() and fg.printautoflagparams(), with a parameter to append to a history file, instead of printing to the screen/logger. This can be done via python scripts at the task level. Read-back and flag-version ability will be added in later.
  2. [DW,NK] : Read in manual-flagging parameters from a text file. This could be done (at the task level) via python scripts that can read the text file, and call fg.setmanualflags(). The formats can be compatible with the output of fg.printmanualflagselection() and the history file.
  3. [LD,JL] : Other algorithms that will be added in are 2-D autoflag algorithms that include ones being explored for the ALMA pipeline flagging capability - use simple statistics to do flagging on 2-D views of the data. Views that will be supported are (baseline vs chan), (baseline vs time), (time vs channel), etc.

  4. [NK] : Flag antennas for timeranges for which the antenna is shadowed by another.
  5. [NK] : Flag time-ranges based on an elevation limit for a specified source.
  6. [NK] : Allow flags to be extended for a uv-range around a flagged point.
  7. [FO] : Allow the ability to average data, and look for flags, but to extend the flags to the unaveraged data set. This is possible for visual flagging using msplot. This will be added into the "flag" tool too.
  8. [MR] : Warn the user when more than 90% of any axis has been flagged. This can be done via python scripts at the task level, where queries are run immediately after every fg.run call.
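
As one example from the list above, the elevation-limit request (item 5) rests on the standard relation sin(el) = sin(lat) sin(dec) + cos(lat) cos(dec) cos(HA). The sketch below applies it to a list of hour angles. The function names and the example numbers are assumptions, and converting observation times to hour angles is left out.

```python
import math


def elevation_deg(hour_angle_deg, dec_deg, latitude_deg):
    """Source elevation from hour angle, declination and site latitude (degrees)."""
    ha, dec, lat = (math.radians(x) for x in (hour_angle_deg, dec_deg, latitude_deg))
    sin_el = (math.sin(lat) * math.sin(dec)
              + math.cos(lat) * math.cos(dec) * math.cos(ha))
    return math.degrees(math.asin(sin_el))


def flag_below_elevation(hour_angles_deg, dec_deg, latitude_deg, limit_deg):
    """Return one flag per hour angle: True where the source is below the limit."""
    return [elevation_deg(ha, dec_deg, latitude_deg) < limit_deg
            for ha in hour_angles_deg]


# Example numbers only: a source at declination +30 deg seen from latitude +34 deg.
hour_angles = [-90, -60, 0, 60, 90]
print(flag_below_elevation(hour_angles, dec_deg=30.0, latitude_deg=34.0, limit_deg=20.0))
```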

The following are functions to be added later, because either the exact algorithms to be implemented are not clear yet, or the supporting code-structure does not yet exist.

  1. Some AIPS flagging tasks ?
  2. [DW,JL,LD,NK] : Write flags to an external text file in some simple text format. This should be optionally returned as a python record (for a pipeline).
    [MR]: Need flag-tables with history associated with the flag-tables, and not the MS.
  3. [DW] : Calibration-based flagging.
    [MR]: Flag based on Tsys, wideband autocorrelations, quantization corrections, poor or good a priori calibration or antenna positions, WVR results, decorrelations [at high freq. - cf. ALMA's two data streams], etc. etc.
    [EF]: Run autoflag algorithms on gain-solutions, and create antenna-based flags. Allow these flags to be applied (extended) to the main table.
    Algorithms can be allowed to run on MS subtables, and associated flags be applied to the main table.
  4. [DW] : Flag statistics. 2D plot of histograms of the data (with flags applied).
    X-axis : time, binned in coarse increments.
    Y-axis : the histogram bins (steps in amplitude).
    The plot should be a gray-scale (stack up 1-D histograms), or a set of lines for (mean, 1-sigma, 2-sigma, etc.). The user should be able to adjust statistical parameters, and see the effect on the low-resolution summary data.
  5. [TC] : Queries that return histograms.
  6. [NK] : Add in queries that list time-ranges and uv-ranges with more than some percentage of data flagged.
  7. [LD,JL,FO,MR] : Algorithms that identify ranges/areas of moderately bad data, as opposed to isolated points. 2D autoflag algorithms that work on different views of the data and flag based on simple statistics.
  8. [MR] : Data selection should include clipExpr and clipRange. This will be added in once ms-selection can support it.
  9. [FO] : Autoflag algorithms that work on 3-dimensional views of the data. Awaiting details.
  10. [MR,EF] : Algorithm that will flag based on closure errors. Use the gain table, compute closure quantities, and flag accordingly.
  11. [MR] : For wide-band data, flag across channels after taking into account source spectral indices.
  12. [MR] : Allow the user to randomly flag X% of the data.
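
For the closure-error request (item 10), the underlying quantity can be illustrated with a closure phase computed directly from three visibilities. This is only a sketch of the principle that antenna-based phase errors cancel around a triangle while baseline-based errors do not. It does not use a gain table, and the function names and test values are assumptions.

```python
import cmath


def closure_phase(v12, v23, v13):
    """Closure phase (radians) of the triangle 1-2-3."""
    return cmath.phase(v12 * v23 * v13.conjugate())


def corrupt(vis, phase_i, phase_j):
    """Apply antenna-based phase errors to a visibility on baseline i-j."""
    return vis * cmath.exp(1j * (phase_i - phase_j))


# Point source at the phase centre: the true visibilities are all 1+0j.
ant_phase = {1: 0.3, 2: -1.1, 3: 0.7}            # arbitrary antenna phase errors
v12 = corrupt(1 + 0j, ant_phase[1], ant_phase[2])
v23 = corrupt(1 + 0j, ant_phase[2], ant_phase[3])
v13 = corrupt(1 + 0j, ant_phase[1], ant_phase[3])
print(closure_phase(v12, v23, v13))              # antenna errors cancel (close to zero)

bad_v23 = v23 * cmath.exp(0.5j)                  # a baseline-based error does not cancel
print(closure_phase(v12, bad_v23, v13))
```

A flagging rule could then compare the closure phase against a tolerance, though which baseline of the triangle to flag is a separate decision not addressed here.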

Displays

2-D views of the data can be created as specified, and clusters of flagged data will be visible. These functions will interact with the viewer, where interactive flagging can be done.
Currently this is not possible, but we are exploring this.
fg.displayflags(antenna=[],field=[],spw=[],array=[],feed=[],scan=[],
                baseline=[],uvrange=[min,max],time=[],
                freqrange=[min,max],channel=[],correlation=[],
                x="", y="", column="", expr="", showflags=T/F)
>> fg.displayflags(field=['3C48'], x='antenna',y='channel', column='data', expr='abs I', showflags=True)
>> fg.displayflags(x='time', y='baseline', column='residualdata', expr='abs I', showflags=True)
...etc...
As per ALMA pipeline requirements, there will be an option to write these 2D views to disk, to be looked at later, or loaded in the viewer.
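
Until a viewer interface exists, a 2-D view of the flags can be approximated in a few lines. The sketch below builds an antenna x channel grid of flagged fractions and renders it as text. The function names, row layout and shading characters are all assumptions for this illustration.

```python
def flag_view(rows, n_ant, n_chan):
    """Build an antenna x channel grid of flagged fractions."""
    flagged = [[0] * n_chan for _ in range(n_ant)]
    total = [[0] * n_chan for _ in range(n_ant)]
    for row in rows:
        for ant in {row["ant1"], row["ant2"]}:
            for chan, is_flagged in enumerate(row["flag"]):
                flagged[ant][chan] += int(is_flagged)
                total[ant][chan] += 1
    return [[flagged[a][c] / total[a][c] if total[a][c] else 0.0
             for c in range(n_chan)] for a in range(n_ant)]


def render(view, shades=" .:#"):
    """Map fractions 0..1 onto characters, one text line per antenna."""
    top = len(shades) - 1
    return ["".join(shades[min(top, int(value * top + 0.5))] for value in row)
            for row in view]


rows = [
    {"ant1": 0, "ant2": 1, "flag": [True, True, False, False]},
    {"ant1": 0, "ant2": 2, "flag": [False, False, False, False]},
    {"ant1": 1, "ant2": 2, "flag": [True, True, True, True]},
]
for line in render(flag_view(rows, n_ant=3, n_chan=4)):
    print("|" + line + "|")
```

The grid returned by flag_view is a plain list of lists, so it could equally be written to disk for later inspection.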

About this document ...

The translation was initiated by R. V. Urvashi on 2007-01-26

