Please feel free to edit this page and add any topics for discussion during this workshop. Eventually these items will be folded into the agenda. Visit the Main page of the NRAO external Wiki for instructions and tutorials on registering as a user and editing these "Wiki" pages.

Feel free to add to or comment on this list:

Data Processing Requirements

From: JeffMangum - 21 Nov 2007

I have linked three documents to this discussion which should at least provide examples for how to solve many of the issues below:

Other Thoughts

Several of these are long topics in their own right. What this session should really cover is whether each topic needs to be addressed by the data processing system or is not viewed as a requirement.

  • Data rate? Near term: 7 feeds with the spectrometer as backend. Long term: 61 feeds, wider bandwidth. The data processing path needs to scale well so that, whenever possible, we avoid having to find new solutions for a future array or backend.
  • Real-time needs.
  • Capturing observer intent. The GBT does not currently capture any observer intent (e.g. calibration scan - what type of calibrator) except by the name of the observing procedure and conventions that that implies (e.g. type of switching, center of a mapped region). What is needed to efficiently drive a pipeline to make appropriate choices? I think that you need to tag each sample with its "intent". For example, tag ON scans with the ON intent, OFF scans with the OFF intent, etc.
  • What sorts of choices should the pipeline make? Does the type of science imply some calibration precision and does that in turn have some impact on what the pipeline might do (type of calibration, for example)?
  • How much fine control to give the users (what do they need vs what do they want)? Can some of that fine control be captured in the observer intent that is recorded in the scheduling block? Yes. See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.
  • I tend to think that the individual components should be useful by themselves - probably with more options than the pipeline gives. That would allow users to do more individual inspection of the data, tweak parameters, and replace components with equivalent steps that they think are more appropriate. Does that compromise the automatic processing (pipeline) design at all? As you will see in On The Fly Observing at the 12 Meter, data processing should start with the individual steps, which need to be tweaked during commissioning to define the proper analysis process for the array data. These steps can then be stitched together to develop a pipeline. Some level of tweaking will always be necessary in the pipeline. For example, observers may want to spatially smooth their data to increase signal-to-noise, which is best done at the imaging (convolution) step.
  • Outputs of the pipeline.
    • image. What gridding function? TBD during commissioning.
    • associated "images" (e.g. weights so that more data could be gridded or images combined later)? Yes. Use SDGRD as an example. It produced (optionally, I think) a weights image just for this purpose.
    • intermediate products that might be useful?
    • other types of output?
  • Automatic flagging based on statistical measures? Always do this or ...
  • Iterative processing - make the image, view the image and edit the data (automatic flagging or visual flagging), remake the image. This might also include using the image as a model to help separate out the interesting signal from everything else.
  • What is unique about the GBT KFPA pipeline?
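
As a rough illustration of the per-sample intent tagging suggested above, the sketch below attaches an intent to each sample and groups samples by it. The `Intent` values, the `Sample` record, and `group_by_intent` are hypothetical names chosen for this example; they are not part of any existing GBT software, and a real list of intents would have to be agreed on.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    """Hypothetical observer intents; the real set is still to be defined."""
    ON = "on"
    OFF = "off"
    CALIBRATOR = "calibrator"


@dataclass
class Sample:
    scan: int
    feed: int
    intent: Intent


def group_by_intent(samples):
    """Collect samples by intent so a pipeline can select its inputs
    without inferring them from the observing procedure name."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample.intent].append(sample)
    return dict(groups)


samples = [
    Sample(scan=1, feed=0, intent=Intent.OFF),
    Sample(scan=2, feed=0, intent=Intent.ON),
    Sample(scan=3, feed=0, intent=Intent.ON),
]
groups = group_by_intent(samples)
```

A pipeline step could then ask for `groups[Intent.ON]` and `groups[Intent.OFF]` directly, which is the kind of choice that currently has to be deduced from naming conventions.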


  • Was Monday's calibration discussion sufficient to know how to implement this in the pipeline?
  • What observer intent needs to be captured to control the pipeline efficiently?
    • calibration scans, types of calibration scans,
    • Are there different science cases that will require different calibration strategies?
  • Inspecting the calibration results - what feedback is useful vs just extra display noise that a user will ignore.
  • How much calibration does the pipeline do?
    • equivalent of Tcal * (on-off)/off
    • aperture efficiency
    • atmosphere (elevation)
    • baseline removal
  • Is all of this pre-image formation or does some of it happen in the image?
  • Iterative calibration (see comment above on iterative processing)
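
Two of the calibration steps listed above can be sketched in a few lines. The function names are hypothetical, the first function implements the "Tcal * (on-off)/off" expression literally as written in the list, and the atmosphere term assumes a simple plane-parallel model (airmass approximated by 1/sin(elevation)), which is only an assumption for illustration.

```python
import math


def on_off_calibrate(on, off, t_cal):
    """Per-channel t_cal * (on - off) / off, as written in the list above."""
    return [t_cal * (s - r) / r for s, r in zip(on, off)]


def atmosphere_correction(elevation_deg, tau_zenith):
    """Factor exp(tau * airmass) with airmass ~ 1 / sin(elevation).

    Assumption: plane-parallel atmosphere, which degrades at low elevation.
    """
    airmass = 1.0 / math.sin(math.radians(elevation_deg))
    return math.exp(tau_zenith * airmass)


# Illustrative numbers only, not real GBT data.
on = [11.0, 12.0, 11.0]
off = [10.0, 10.0, 10.0]
spectrum = on_off_calibrate(on, off, t_cal=2.0)
factor = atmosphere_correction(elevation_deg=30.0, tau_zenith=0.1)
corrected = [factor * t for t in spectrum]
```

Keeping each correction as its own small function leaves open the question raised above of whether a given step runs before image formation or is applied to the image.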


  • We have no canned routines in GBTIDL to handle polarization scans. Others (Robishaw/Heiles, Mason) have their own methods.
  • Do they apply directly (i.e. is this the same as the single-feed case) or is it more complicated?


  • Imaging.
    • Type of gridding functions. Is there a clear right way to do this? The J1(R)/R * Gaussian is very close to optimal and would be a reasonable default.
    • Differences, if any, from gridding a single feed covering the same area?
  • Sparse arrays. This issue came up in an e-mail discussion with Remo Tilanus. Is that relevant to the GBT?
  • Other?
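
To make the J1(R)/R * Gaussian gridding function concrete, here is a minimal one-dimensional sketch that also returns the weights "image" discussed under pipeline outputs. The function names, the scale parameters `a` and `b`, and the `support` cutoff are hypothetical; suitable values would be settled during commissioning. The Bessel function is evaluated from its power series to keep the example free of external libraries.

```python
import math


def bessel_j1(x, terms=30):
    """J1(x) from its power series; adequate for the small |x| used here."""
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + 1))
        total += coeff * (x / 2) ** (2 * m + 1)
    return total


def kernel(r, a, b):
    """J1(r/a)/(r/a) * exp(-(r/b)**2); the limit at r = 0 is 1/2."""
    x = r / a
    jinc = 0.5 if x == 0 else bessel_j1(x) / x
    return jinc * math.exp(-((r / b) ** 2))


def grid_1d(positions, values, n_cells, a, b, support):
    """Convolve samples onto a regular 1-D grid; return (image, weights).

    Assumes support / a stays below the first zero of J1 (about 3.83),
    so that every weight is positive.
    """
    sums = [0.0] * n_cells
    weights = [0.0] * n_cells
    for pos, val in zip(positions, values):
        for cell in range(n_cells):
            r = abs(cell - pos)
            if r <= support:
                w = kernel(r, a, b)
                sums[cell] += w * val
                weights[cell] += w
    image = [s / w if w > 0 else 0.0 for s, w in zip(sums, weights)]
    return image, weights


# Illustrative values: three samples of equal brightness near cell 2-3.
image, weights = grid_1d([2.0, 2.5, 3.0], [5.0, 5.0, 5.0],
                         n_cells=8, a=1.55, b=2.52, support=3.0)
```

Because the sums and weights are kept separately until the final division, more data can be added later, or two gridded images combined, by adding their sums and weights before normalizing.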

Technical Hurdles

  • How is the pipeline held together? Scripting language? Which one? You already have a working system which might require only minimal adaptation to the FPA problem in AIPS.
  • Scalability. When it becomes necessary to use multiple CPUs what options are there?
  • Likely bottlenecks in the pipeline?
  • Automatic pipeline vs. fine user control over each step without code duplication.
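
One possible way to get both an automatic pipeline and fine user control without duplicating code is to write each step as an ordinary function and let the pipeline be nothing more than an ordered list of those functions with default options. The sketch below assumes this design; the step names, `DEFAULT_STEPS`, and `run_pipeline` are hypothetical and the steps themselves are placeholders.

```python
def remove_offset(data, offset=0.0):
    """Placeholder processing step: subtract a constant."""
    return [x - offset for x in data]


def scale(data, factor=1.0):
    """Placeholder processing step: multiply by a constant."""
    return [x * factor for x in data]


# The pipeline is only an ordered list of (name, function, default options).
DEFAULT_STEPS = [
    ("remove_offset", remove_offset, {"offset": 1.0}),
    ("scale", scale, {"factor": 2.0}),
]


def run_pipeline(data, steps=DEFAULT_STEPS, overrides=None):
    """Run each step in order; per-step overrides replace default options."""
    overrides = overrides or {}
    for name, func, options in steps:
        merged = {**options, **overrides.get(name, {})}
        data = func(data, **merged)
    return data


automatic = run_pipeline([1.0, 2.0])
tweaked = run_pipeline([1.0, 2.0], overrides={"scale": {"factor": 10.0}})
by_hand = scale([1.0], factor=3.0)
```

A user-supplied component would simply be another function with the same call signature placed in the list, and steps that are independent per feed or per scan could be handed to separate processes if multiple CPUs become necessary.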

Non-technical Hurdles

  • Choice of environment - any clear winners?
    • Packages: AIPS, CASA, ASAP, any of the pipelines presented here?
    • Languages: python, IDL, other?
    • Ease of development? (perhaps that influences the choice)
    • Interaction with other tools (e.g. VO tools, NRAO e2e archive, PLASTIC)

The User Experience

  • Feedback.
  • Level of control - how flexible is the pipeline?
  • Ease of using the individual components.
  • Replacing standard components with a user-supplied component (is it simply a matter of having a fixed, well-described API so that users can do what they want there, or is it more than that?).
  • intermediate products - possibly most useful during commissioning/debugging/testing new options.
  • Applicability of this pipeline to other GBT data sources (e.g. single pixel front-ends [imaging or even single pointings])

Data Formats

  • What format is the data in at each step?
    • Input as separate M&C FITS files.
      • Spectrometer backend produces lag-domain data, must be converted to frequency-domain.
      • Separate Antenna FITS file contains antenna pointings, subreflector positions, table describing feed locations, some weather information at the start of the scan. Separately sampled from backend data dumps.
      • IF FITS file describing IF chain.
      • LO FITS file describing the LO value, which may change through the scan (Doppler tracking). These are predicted (commanded) LO values and are available at the start of the scan. The timing of the LO changes depends on the requested Doppler tracking accuracy. LO values are only changed during blanked time during other switching.
      • GO FITS file describing meta information (scan number, source name, target position, observing procedure name, subreflector switching parameters, etc).
    • FITS image cube as output. Should also produce FITS file with UVDATA extension to allow portability to other image processing packages.
    • What about the intermediate steps?
    • What about associated derived information (e.g. Tcal calibration) which may be taken infrequently?
  • Conversion to other formats for various analysis systems (SDFITS for GBTIDL, CLASS binary, AIPS for the imaging step, others?)? I see no point in allowing import into GBTIDL or CLASS. As mentioned above, FITS file portability of raw "uvdata" to other image processing packages is a requirement.
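
The lag-domain to frequency-domain conversion mentioned above is, at its core, a Fourier transform of the autocorrelation function. The sketch below is a bare cosine transform under stated assumptions: `lags_to_spectrum` is a hypothetical name, the input is taken to be one-sided lags at delays 0..N-1, and no quantization (van Vleck) correction or window function is applied, both of which a real spectrometer path would have to consider.

```python
import math


def lags_to_spectrum(lags):
    """Cosine transform of one-sided autocorrelation lags into N channels.

    Assumes lags[k] is the autocorrelation at delay k (k = 0..N-1).
    No quantization correction or windowing is applied here.
    """
    n = len(lags)
    spectrum = []
    for j in range(n):
        total = lags[0]
        for k in range(1, n):
            total += 2.0 * lags[k] * math.cos(math.pi * j * k / n)
        spectrum.append(total)
    return spectrum


# Synthetic input: the autocorrelation of a pure tone centered on channel 2.
n = 8
tone_lags = [math.cos(math.pi * 2 * k / n) for k in range(n)]
tone_spectrum = lags_to_spectrum(tone_lags)
peak_channel = max(range(n), key=lambda j: tone_spectrum[j])
```

The direct double loop is quadratic in the number of lags; at the data rates discussed earlier an FFT-based transform would be the natural replacement, with the same inputs and outputs.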


  • How much is archived? Any intermediate products? Are all images archived? Raw uvdata should be archived, at least.
  • Interaction with the NRAO archive and this pipeline. Does the archive reprocess data on request or does it simply serve up already processed images?
  • When would it be necessary/desirable to reprocess data already in the archive?

Data Visualization

  • Demo of GAIA
  • Any other visualization tool demos?
  • What's important in data visualization in terms of
    • controlling a running pipeline (e.g. data editing, marking baselines)
    • viewing the progress (intermediate products, final product)

Where do we go from here?

  • Do we know enough to draft a pipeline design?
    • GB-241 reserved morning of 11/29 to attempt to sketch it out.
  • Can this KFPA pipeline be useful for other GBT backends? Is that worth the effort?
    • It could be used to process archived GBT data
    • Since no observer intent has been captured, a fair amount of effort would be needed to calibrate the data to a reasonable level (whatever that might be).

* WhatProductsOfKFPAPipeline.pdf: Tuesday Afternoon Musings

* The_Science_Case.ppt: Science Case Notes - Karen
Topic attachments

  • The_Science_Case.ppt (21 K, 2007-11-28 14:13, BobGarwood): Science Case Notes - Karen
  • WhatProductsOfKFPAPipeline.pdf (43 K, 2007-11-28 14:12, BobGarwood): Tuesday Afternoon Musings

Topic revision: r6 - 2007-11-28, BobGarwood