This wiki page exists so that ongoing pipeline discussion issues can be maintained and commented on in a single place.
The current focus is on describing the use cases so that the requirements for each pipeline component can be well described.
- Bob's pipeline use case assertions. Disagree or note additional cases here.
- I maintain that all pipeline use cases boil down to combinations of these components:
- Scal calibration scans. These are used to populate the calibration module with Scal values to be used by the main line of the pipeline. Scal data will be used in automated mode by the pipeline in the order in which it is received. In other words, if you want Scal data to be automatically applied to pipeline data, the Scal observations have to happen first, and the pipeline will use the most recent Scal available and appropriate for that spectral window and receiver. If no appropriate Scal data is available, the pipeline will fall back on the lab-measured Tcal values. Question: do the official Scal observations we do now ever get stored in an official location? Ideally, the pipeline would use the most recent blessed Scal data instead of the lab-measured Tcal values if available, but I don't know if that's possible now since I don't know what happens to the Scals produced by staff. Also, in an ideal world someone would monitor the Scal observations taken by observers to (a) make sure nothing funky seems to be changing in a receiver and (b) make sure that the most recent, best Scal values really are available. If we build the pipeline right, it should be possible to re-process data and tell the off-line pipeline to use alternate Scal values that might not have been available when the on-line pipeline did its automatic processing. (FJL): A bad Scal can make a spectral line! Perhaps we should restrict the on-line pipeline to using a Scal that is certified, and have a user-supplied Scal apply only to reprocessing data. In my experience the cal values are quite stable. Do we expect something different from the KFPA?
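The fallback rule above ("most recent appropriate Scal, else lab Tcal") could be sketched in Python roughly as follows. All names here (`ScalRecord`, `select_cal`, `lab_tcal`) are illustrative, not an existing API; this is just the selection logic, not the calibration itself.

```python
from datetime import datetime

class ScalRecord:
    """One blessed Scal measurement for a spectral window + receiver."""
    def __init__(self, spw, receiver, timestamp, values):
        self.spw = spw
        self.receiver = receiver
        self.timestamp = timestamp
        self.values = values  # per-channel Scal values

def select_cal(scal_records, spw, receiver, obs_time, lab_tcal):
    """Return the most recent Scal taken at or before obs_time for this
    spectral window and receiver; otherwise fall back on lab Tcal."""
    candidates = [r for r in scal_records
                  if r.spw == spw and r.receiver == receiver
                  and r.timestamp <= obs_time]
    if candidates:
        return max(candidates, key=lambda r: r.timestamp).values
    return lab_tcal
```

An off-line re-run with alternate Scal values would then amount to handing this function a different (user-supplied) set of `scal_records`.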
- Frequency-switched groups of data (most likely mapping data, but not required to be; if not mapping, the pipeline's end product is simply the calibrated integrations, i.e. the step right before the image is made).
- Position-switched groups of data (same as above). These need a reference position (or positions) to do the final calibration.
- Reference position(s) is(are) synthesized from the mapping data. We need description of the ways we're going to allow this to be parameterized.
- Reference position(s) come from dedicated pointings away from the region being mapped. Using currently available procedures this would require separate scans.
- single position before (or after I suppose) the mapping scans
- one reference position before the mapping scan paired with one reference position after the mapping scan.
- Other schemes I've missed here? Note that, similar to the Scal comments above, the calibration module (database in the figures, although I think database is a misleading word here) is responsible for supplying the appropriate reference spectra to the calibration step, and so in an off-line pipeline there could be additional options to set the reference scan that the automatic pipeline processing can't handle.
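The reference-position schemes listed above could be handled by one small combiner: a single reference before (or after) the map, or one before paired with one after. A Python sketch, with a plain average assumed for the paired case (time-interpolation would be a further refinement):

```python
import numpy as np

def reference_spectrum(refs_before=None, refs_after=None):
    """Combine dedicated reference pointings into a single reference
    spectrum. Covers the schemes listed above:
    - a single reference position before (or after) the mapping scans;
    - one reference before paired with one after (averaged here)."""
    refs = [r for r in (refs_before, refs_after) if r is not None]
    if not refs:
        raise ValueError("no reference spectra available")
    return np.mean(refs, axis=0)
```

The synthesized-from-mapping-data case would feed this with reference spectra extracted from off-source map edges; the parameterization of that extraction still needs the description called for above.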
- Pipeline calibration steps (Langston).
- The calibration process is a standard set of averages, differences, and scalings of the input data, which have been described by many authors. The detailed options for all the different calibration techniques are outside the scope of the pipeline project, except that calibration for each observing type (position-switched and frequency-switched) should be supported.
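For concreteness, the standard difference-and-scale step referred to above is, in its simplest form, Ta = Tsys * (sig - ref) / ref. A minimal sketch (frequency-switched data would additionally be folded after this step, which is not shown):

```python
import numpy as np

def calibrate(sig, ref, tsys):
    """Standard switched calibration: Ta = Tsys * (sig - ref) / ref.
    sig, ref: per-channel raw spectra; tsys: system temperature (K).
    For frequency-switched data the result still needs to be folded
    (shift-and-average of the two switching states)."""
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return tsys * (sig - ref) / ref
```

The detailed variants (vector vs. scalar Tsys, smoothing of the reference, etc.) are, as stated above, outside the pipeline project's scope.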
- Images - image entire frequency axis or split up into smaller cubes that only cover part of the spectral window.
- DJ votes for imaging the entire axis and leaving the slicing for post-pipeline processing
- advantage - simple, the pipeline doesn't lose anything when making the image
- disadvantage - will be slower, images take up more space
- practical matter (FJL) -- I have been working with a cube that has 90 mega-samples, and with our current computers and viewer software it's quite difficult and painful to manipulate such a large data set. If the KFPA maps a 10' region at the Nyquist spacing the map will be only 50x50 pixels, but with 16k channels this ends up being about 41 mega-samples, comparable to my cube. Before we decide to image all channels, let's make sure that we have the processing power to handle the resulting data sets.
- Is the frequency-like axis always frequency? If sufficient information is supplied with the image, image processing software can be used to change that to a velocity axis. If we go with imaging the entire spectral window this is the obvious choice. (FJL) What does CASA do about this?
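To make the size concern above easy to re-check as the map and channel parameters change, here is a trivial cube-size estimator (4 bytes/sample assumed for single-precision floats):

```python
def cube_size(n_x, n_y, n_chan, bytes_per_sample=4):
    """Return (total samples, total bytes) for an n_x * n_y * n_chan cube."""
    samples = n_x * n_y * n_chan
    return samples, samples * bytes_per_sample

# A 50x50-pixel map with the full 16384-channel spectral window:
# 50 * 50 * 16384 = 40,960,000 samples, ~164 MB at 4 bytes/sample.
```

The actual totals scale linearly with whatever channel count and map size we settle on, so this is easy to redo for any proposed slicing scheme.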
- Quantities needed to describe the image. The mapping procedure can provide these in the GO FITS file. I still need to supply appropriate FITS names for these - some of these may already be there, not sure. Some (most, all) of these the pipeline could work out by examining the data, but putting them in the GO FITS file should ensure that repeated use of the same procedure with the same arguments will produce identical image coordinates, making combining images much simpler. (FJL): Combining images is tricky in "cube space" because typically the weights are not preserved. This may be critical at K band, where the quality of the data may vary a bit from day to day. (end FJL) That also makes it simpler to deal with the cases where you either start the mapping procedure already in progress (e.g. to finish something that was aborted prematurely) or the procedure aborts prematurely. It also makes it simpler for the pipeline to set up the output image and start gridding to it before it has all the data in hand.
  - Desired image coordinate system (RADEC, Galactic, etc.)
  - image center in image coordinate system
  - image sky coordinate pixel size
  - some description of the frequency axis
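As a strawman for discussion, the quantities listed above might look something like the following in the GO FITS file. Every keyword name and value here is hypothetical -- as noted above, the real FITS names still need to be chosen (and some may already exist):

```python
# Hypothetical GO FITS keywords describing the output image; none of
# these names are settled yet. Units in the comments are assumptions.
image_params = {
    "MAPCSYS": "GALACTIC",    # desired image coordinate system
    "MAPCTR1": 23.500,        # image center, longitude-like axis (deg)
    "MAPCTR2": -0.300,        # image center, latitude-like axis (deg)
    "MAPPIX":  2.0 / 3600.0,  # sky pixel size (deg)
    "MAPNCHN": 16384,         # frequency axis: number of channels
    "MAPRFRQ": 23.7e9,        # frequency axis: reference frequency (Hz)
    "MAPDFRQ": 1.5e3,         # frequency axis: channel width (Hz)
}
```

With these fixed in the procedure's arguments, repeated runs produce identical image coordinates, so the pipeline can allocate the output grid (and start gridding) before all the data arrive, as described above.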
- Statistical flagging. We've currently tabled this discussion so that we can focus on getting the main elements correct. We should start listing and ranking possible flagging options and tools. Which of these are things the pipeline should always do, which are those that should be optionally controlled via Astrid (and hence still done by the on-line pipeline), and which are off-line re-processing options? (FJL): In my recent imaging experience I've run into issues with errors in the weights being applied to the data. If just a few spectra have (erroneously) high weights, they can cause normal data to drop below the weighting threshold and disappear from the cube. It would take just a few seconds of data to establish a distribution of weights, which might then be used to discard outliers as they come down the pipeline. Will we have a single weight per spectrum or (shudder) will the weights have to be a vector?
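FJL's weight-outlier idea above could be prototyped with a robust statistic such as the median absolute deviation (MAD), so that a few erroneously large weights cannot drag the threshold around. A sketch, assuming one weight per spectrum (the scalar case; a vector of weights would apply this per channel):

```python
import numpy as np

def flag_weight_outliers(weights, n_mad=5.0):
    """Flag spectra whose weights are outliers relative to the observed
    weight distribution, using the median absolute deviation (MAD).
    Returns a boolean mask: True = discard this spectrum's weight."""
    w = np.asarray(weights, dtype=float)
    med = np.median(w)
    mad = np.median(np.abs(w - med))
    if mad == 0:
        return np.zeros(w.shape, dtype=bool)  # all weights identical
    # 1.4826 * MAD approximates one sigma for Gaussian-distributed weights
    return np.abs(w - med) > n_mad * 1.4826 * mad
```

The threshold `n_mad` and the choice of statistic are placeholders; the point is only that a few seconds of data suffice to establish the distribution, exactly as suggested above, after which outliers can be discarded as they come down the pipeline.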