KFPADataProcessingBrainstorming < KPAF

KFPADataProcessingBrainstorming (2007-11-28, BobGarwood)

Please feel free to edit this page and add any topics for discussion during this workshop. Eventually these items will be folded into the agenda. Visit the Main page of the NRAO external Wiki for instructions and tutorials on registering as a user and editing these "Wiki" pages.

Feel free to add to this list, or comment on this list:

Data Processing Requirements

From: JeffMangum - 21 Nov 2007

I have linked three documents to this discussion which should at least provide examples for how to solve many of the issues below:

On The Fly Observing at the 12 Meter: This was the OTF observers manual for the 12 Meter Telescope implementation of this observing mode. There are descriptions of map setup, online display, and analysis for OTF observations.
Observing with the NRAO 1mm SIS Array Receiver: A document similar to the OTF guide, but focused on observations with the 1mm Array Receiver.
The On The Fly Imaging Technique: Paper published in A&A describing OTF observing and analysis using the 12 Meter Telescope implementation as example.

Other Thoughts

Several of these are long topics by themselves. What this session should really cover is whether those topics need to be addressed by the data processing system or whether they are not viewed as requirements.

Data rate? Near term: 7 feeds with the spectrometer as backend. Long term: 61 feeds and wider bandwidth. The data processing path needs to scale well so that, whenever possible, we can avoid finding new solutions for a future array or backend.
Real-time needs.
- What sort of immediate feedback can we provide? Can a GFM plug-in be sufficient? (See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.)
- Pointing and focus scans? (See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.)
- Quick-looks at the data (spectra and images)? (See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.)
Capturing observer intent. The GBT does not currently capture any observer intent (e.g. calibration scan - what type of calibrator) except through the name of the observing procedure and the conventions that name implies (e.g. type of switching, center of a mapped region). What is needed to efficiently drive a pipeline to make appropriate choices? I think that you need to tag each sample with its "intent". For example, tag ON scans with the ON intent, OFF scans with the OFF intent, etc.
What sorts of choices should the pipeline make? Does the type of science imply some calibration precision and does that in turn have some impact on what the pipeline might do (type of calibration, for example)?
How much fine control to give the users (what do they need vs what do they want)? Can some of that fine control be captured in the observer intent that is recorded in the scheduling block? Yes. See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.
I tend to think that the individual components should be useful by themselves - probably with more options than the pipeline gives. That would allow users to do more individual inspection of the data, tweak parameters, and replace components with equivalent steps that they think are more appropriate. Does that compromise the automatic processing (pipeline) design at all? As you will see in On The Fly Observing at the 12 Meter, data processing should start with the individual steps, which need to be tweaked during commissioning to define the proper analysis process for the array data. These steps can then be stitched together to develop a pipeline. Some level of tweaking will always be necessary in the pipeline. For example, observers may want to spatially smooth their data to increase signal-to-noise, which is best done at the imaging (convolution) step.
Outputs of the pipeline.
- image. What gridding function? TBD during commissioning.
- associated "images" (e.g. weights so that more data could be gridded or images combined later)? Yes. Use SDGRD as an example. It produced (optionally, I think) a weights image just for this purpose.
- intermediate products that might be useful?
- other types of output?
Automatic flagging based on statistical measures? Always do this or ...
Iterative processing - make the image, view the image and edit the data (automatic flagging or visual flagging), remake the image. This might also include using the image as a model to help separate out the interesting signal from everything else.
What is unique about the GBT KFPA pipeline?
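As one possible reading of "automatic flagging based on statistical measures" above, here is a minimal Python sketch that flags outliers using the median and the median absolute deviation (MAD). The function name, the threshold of 5, and the sample values are all assumptions made for illustration; nothing here is an agreed pipeline design.

```python
from statistics import median


def mad_flags(values, threshold=5.0):
    """Return a list of booleans, True where a value is flagged as an outlier.

    The median and MAD are used because they are far less affected by the
    outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0.0:
        # More than half of the values equal the median:
        # flag anything that differs from it at all.
        return [v != med for v in values]
    # 1.4826 scales the MAD to the standard deviation of a Gaussian.
    sigma = 1.4826 * mad
    return [abs(v - med) > threshold * sigma for v in values]


# Hypothetical per-integration RMS values; one integration is clearly bad.
rms = [1.02, 0.98, 1.01, 0.99, 7.50, 1.00, 1.03]
flags = mad_flags(rms)
good = [v for v, bad in zip(rms, flags) if not bad]
```

In an iterative scheme the same test could be rerun on residuals after each imaging pass, with visual flagging available as an override.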

Calibration

Was Monday's calibration discussion sufficient to know how to implement this in the pipeline?
What observer intent needs to be captured to control the pipeline efficiently?
- calibration scans, types of calibration scans
- Are there different science cases that will require different calibration strategies?
Inspecting the calibration results - what feedback is useful vs just extra display noise that a user will ignore?
How much calibration does the pipeline do?
- equivalent of Tcal * (on-off)/off
- aperture efficiency
- atmosphere (elevation)
- baseline removal
Is all of this pre-image formation or does some of it happen in the image?
Iterative calibration (see comment above on iterative processing)
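To make the link between captured intent and the Tcal * (on-off)/off step concrete, here is a small Python sketch in which each sample carries an intent tag and the calibration routine uses only that tag to pair ON with OFF data. The `Intent` labels, the `Sample` structure, the "most recent preceding OFF" pairing rule, and the numbers are all hypothetical. The scale factor is written as Tcal, following the list above; the "equivalent of" there leaves the exact form of the scale factor open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Intent(Enum):
    """Hypothetical intent labels; a real system would define its own set."""
    ON = "ON"
    OFF = "OFF"


@dataclass
class Sample:
    intent: Intent
    spectrum: List[float]  # raw power per channel, arbitrary units


def calibrate(on: List[float], off: List[float], tcal: float) -> List[float]:
    """Per-channel Tcal * (on - off) / off."""
    return [tcal * (s - r) / r for s, r in zip(on, off)]


def calibrate_by_intent(samples: List[Sample], tcal: float) -> List[List[float]]:
    """Pair each ON sample with the most recent preceding OFF sample."""
    results = []
    last_off = None
    for sample in samples:
        if sample.intent is Intent.OFF:
            last_off = sample.spectrum
        elif sample.intent is Intent.ON and last_off is not None:
            results.append(calibrate(sample.spectrum, last_off, tcal))
    return results


samples = [
    Sample(Intent.OFF, [10.0, 10.0, 10.0]),
    Sample(Intent.ON, [10.0, 12.0, 10.0]),
    Sample(Intent.ON, [10.0, 11.0, 10.0]),
]
calibrated = calibrate_by_intent(samples, tcal=5.0)
```

The point of the sketch is that the pairing logic never inspects a procedure name; it relies only on the recorded intent.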

Polarimetry

We have no canned routines in GBTIDL to handle polarization scans. Others (Robishaw/Heiles, Mason) have their own methods.
Do they apply directly (i.e. is this the same as the single-feed case) or is it more complicated?

Algorithms

Imaging.
- Type of gridding functions. Is there a clear right way to do this? The J1(R)/R * Gaussian is very close to optimal and would be a reasonable default.
- Differences, if any, from gridding a single feed covering the same area?
Sparse arrays. This issue came up in an e-mail discussion with Remo Tilanus. Is that relevant to the GBT?
Other?
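The J1(R)/R * Gaussian gridding function mentioned above, together with a weights image of the kind discussed under pipeline outputs, can be sketched in plain Python as follows. This is an illustration only: the scale lengths `a` and `b`, the support radius, and the tiny grid are assumed values (in units of grid cells), and the brute-force loop ignores performance entirely.

```python
import math


def bessel_j1(x, terms=30):
    """J1(x) from its power series; adequate for the small |x| used here."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.factorial(k + 1))
                  * (x / 2.0) ** (2 * k + 1))
    return total


def kernel(r, a=1.55, b=2.52):
    """J1(r/a)/(r/a) * exp(-(r/b)^2), with r, a and b in grid cells.

    The values of a and b are assumed for illustration.
    """
    gauss = math.exp(-(r / b) ** 2)
    if r == 0.0:
        return 0.5 * gauss  # limit of J1(x)/x as x -> 0
    x = r / a
    return bessel_j1(x) / x * gauss


def grid(samples, nx, ny, support=3.0):
    """Convolve (x, y, value) samples onto an nx-by-ny grid.

    Returns (image, weights); sample positions are in cell units.
    """
    data = [[0.0] * nx for _ in range(ny)]
    weights = [[0.0] * nx for _ in range(ny)]
    for x, y, value in samples:
        for j in range(ny):
            for i in range(nx):
                r = math.hypot(i - x, j - y)
                if r <= support:
                    w = kernel(r)
                    data[j][i] += w * value
                    weights[j][i] += w
    image = [[d / w if w > 0.0 else 0.0 for d, w in zip(drow, wrow)]
             for drow, wrow in zip(data, weights)]
    return image, weights


# Two hypothetical samples of a constant-brightness field.
image, weights = grid([(1.0, 1.0, 2.0), (2.5, 1.5, 2.0)], nx=4, ny=3)
```

Keeping the weights alongside the image is what allows more data to be gridded in later: the un-normalized sum can be recovered as image * weights and accumulated further.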

Technical Hurdles

How is the pipeline held together? Scripting language? Which one? You already have a working system in AIPS which might require only minimal adaptation to the FPA problem.
Scalability. When it becomes necessary to use multiple CPUs what options are there?
Likely bottlenecks in the pipeline?
Automatic pipeline vs. fine user control over each step without code duplication.
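One way to get both an automatic pipeline and fine user control without duplicating code is to keep each step as an ordinary function and let the pipeline be nothing more than an ordered mapping of step names to functions. The Python sketch below illustrates the idea; the step names, the placeholder step bodies, and the `options` mechanism are hypothetical and stand in for real processing.

```python
from collections import OrderedDict


def calibrate(data, scale=1.0):
    """Placeholder calibration: multiply every value by a scale factor."""
    return [scale * v for v in data]


def remove_baseline(data):
    """Placeholder baseline removal: subtract the mean."""
    mean = sum(data) / len(data)
    return [v - mean for v in data]


def default_steps():
    """Ordered mapping of step name -> function; names are hypothetical."""
    return OrderedDict([
        ("calibrate", calibrate),
        ("baseline", remove_baseline),
    ])


def run_pipeline(data, steps=None, options=None):
    """Run each step in order, passing per-step keyword options if given."""
    steps = steps if steps is not None else default_steps()
    options = options or {}
    for name, func in steps.items():
        data = func(data, **options.get(name, {}))
    return data


# Automatic mode: defaults everywhere.
auto = run_pipeline([1.0, 2.0, 3.0])

# Fine control: tweak one option and swap one component, reusing the rest.
steps = default_steps()
steps["baseline"] = lambda data: [v - min(data) for v in data]
custom = run_pipeline([1.0, 2.0, 3.0], steps=steps,
                      options={"calibrate": {"scale": 2.0}})
```

Because the same functions serve both modes, a user-supplied component only has to accept and return the same data structure as the step it replaces.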

Non-technical Hurdles

Choice of environment - any clear winners?
- Packages: AIPS, CASA, ASAP, any of the pipelines presented here?
- Languages: python, IDL, other?
- Ease of development? (perhaps that influences the choice)
- Interaction with other tools (e.g. VO tools, NRAO e2e archive, PLASTIC)

The User Experience

Feedback.
- Real-time needs. (See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.)
- Off-line needs. (See On The Fly Observing at the 12 Meter and Observing with the NRAO 1mm SIS Array Receiver for an example.)
- Allow the user to choose how much information they see.
- What might be interesting?
- How to point out (visually) details to the user that might require their attention (possible problems)?
Level of control - how flexible is the pipeline?
Ease of using the individual components.
Replacing standard components with a user-supplied component (is it enough that the API is fixed and well described, so that users can do what they want there, or is more needed?).
Intermediate products - possibly most useful during commissioning/debugging/testing new options.
Applicability of this pipeline to other GBT data sources (e.g. single pixel front-ends [imaging or even single pointings])

Data Formats

What format is the data in at each step?
- Input as separate M&C FITS files.
  - Spectrometer backend produces lag-domain data, must be converted to frequency-domain.
  - Separate Antenna FITS file contains antenna pointings, subreflector positions, table describing feed locations, some weather information at the start of the scan. Separately sampled from backend data dumps.
  - IF FITS file describing IF chain.
  - LO FITS file describing the LO value, which may change through the scan (Doppler tracking). These are predicted (commanded) LO values and are available at the start of the scan. The timing of the LO changes depends on the requested Doppler tracking accuracy. LO values are only changed during blanked time during other switching.
  - GO FITS file describing meta information (scan number, source name, target position, observing procedure name, subreflector switching parameters, etc).
- FITS image cube as output. Should also produce FITS file with UVDATA extension to allow portability to other image processing packages.
- What about the intermediate steps?
- What about associated derived information (e.g. Tcal calibration) which may be taken infrequently?
Conversion to other formats for various analysis systems (SDFITS for GBTIDL, CLASS binary, AIPS for the imaging step, others?)? I see no point in allowing import into GBTIDL or CLASS. As mentioned above, FITS file portability of raw "uvdata" to other image processing packages is a requirement.
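The lag-domain to frequency-domain conversion noted for the spectrometer backend is, at its core, a cosine transform of the measured autocorrelation lags. The Python sketch below shows only that core step, under the assumption of N real, non-negative lags with lag 0 first. A real conversion would typically also need a quantization (van Vleck) correction and some choice of windowing, neither of which is attempted here, and the function name is hypothetical.

```python
import math


def lags_to_spectrum(lags):
    """Cosine transform of N autocorrelation lags into N frequency channels.

    Assumes lags[0] is the zero lag and that the autocorrelation is
    symmetric, so only non-negative lags are supplied.
    """
    n = len(lags)
    spectrum = []
    for k in range(n):
        total = lags[0]
        for m in range(1, n):
            total += 2.0 * lags[m] * math.cos(math.pi * m * k / n)
        spectrum.append(total)
    return spectrum


# White noise has power only at zero lag, giving a flat spectrum.
flat = lags_to_spectrum([1.0, 0.0, 0.0, 0.0])

# A pure tone at channel 2 of 8 has a cosine-shaped autocorrelation.
n_lags, tone_channel = 8, 2
tone_lags = [math.cos(math.pi * m * tone_channel / n_lags)
             for m in range(n_lags)]
tone = lags_to_spectrum(tone_lags)
peak = tone.index(max(tone))
```

The direct double loop is O(N^2); a production path would presumably use an FFT-based transform, but the loop keeps the relationship between lags and channels explicit.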

Archiving

How much is archived? Any intermediate products? Are all images archived? Raw uvdata should be archived, at least.
Interaction with the NRAO archive and this pipeline. Does the archive reprocess data on request or does it simply serve up already processed images?
When would it be necessary/desirable to reprocess data already in the archive?

Data Visualization

Demo of GAIA
Any other visualization tool demos?
What's important in data visualization in terms of
- controlling a running pipeline (e.g. data editing, marking baselines)
- viewing the progress (intermediate products, final product)

Where do we go from here?

Do we know enough to draft a pipeline design?
- GB-241 reserved morning of 11/29 to attempt to sketch it out.
Can this KFPA pipeline be useful for other GBT backends? Is that worth the effort?
- It could be used to process archived GBT data
- Since no observer intent has been captured, a fair amount of effort would be needed to calibrate the data to a reasonable level (whatever that might be).

* WhatProductsOfKFPAPipeline.pdf: Tuesday Afternoon Musings

* The_Science_Case.ppt: Science Case Notes - Karen

Topic attachments
Attachment	Size	Date	Who	Comment
WhatProductsOfKFPAPipeline.pdf	43 K	2007-11-28 - 14:12	BobGarwood	Tuesday Afternoon Musings
The_Science_Case.ppt	21 K	2007-11-28 - 14:13	BobGarwood	Science Case Notes - Karen

Topic revision: r6 - 2007-11-28, BobGarwood

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.