Glen Langston provided the sample script and accompanying procedures attached here. These procedures are written specifically to work with two sets of mapping scans (one RaLongMap sequence and one DecLatMap sequence) of project AGBT08B_040_03. The script assumes that the raw data has already been filled to tmcKuMaps.raw.acs.fits.
The data in question are from the Ku-band receiver. Data from each of the two beams are used independently. There are 4 spectral bands contained in the raw data and the script as provided is configured to process the "0" band. The previously filled data included scans 14 through 88, although that includes several Nod scan pairs which are not used by these scripts. The scans actually used here are 16-49 and 54-86.
This is a summary of the procedures used, with links to the attached procedures.
- mapTmcHc5n.pro: Top-level script. This is not a procedure in itself. It invokes mapCal2Beam, which is a procedure that handles one set of the mapping scans (e.g. 16-49).
- mapCal2Beam.pro: top-level procedure to calibrate mapping data for a specific band. This invokes referenceBeam2 and calibratedBeam2.
- referenceBeam2.pro: calculates and keeps the reference beam spectra.
- calibrateBeam2.pro: top-level calibration procedure for the signal data - this invokes calScanInt2.
- calScanInt2.pro: procedure to calibrate and keep a specific integration.
- tatm.pro: Estimates effective atmospheric temperature from local ground temp. Based on model from Ron Maddalena.
- opacity.pro: estimates opacity factor - used to correct for atmospheric opacity given zenith opacity.
- natm.pro: Estimates number of atmospheres along line of sight for given elevation. From model by Ron Maddalena.
- etaGBT.pro: GBT efficiencies as a function of frequency from a memo by Jim Condon.
- refbeamposition.pro: Adjusts the pointing direction for the beam offsets for the non-tracking beam.
- tsysair.pro: Estimates atmospheric contribution to Tsys.
A number of GBTIDL procedures and buffers are used, however most of that is IDL-specific book-keeping. The three useful GBTIDL procedures relevant to pipeline work are:
- gettp Does a total-power calibration.
- accum This is how the weighted average is accumulated as part of the determination of the reference spectra.
- ave This is what turns the accumulated weighted sum into an appropriate average.
A few Goddard IDL library functions related to coordinate conversions are used by refbeamposition and aren't discussed in any detail here.
The GBTIDL calibration
document is a useful reference when considering this example. Finally, the KFPA pipeline diagram
should be kept in mind during this discussion.
Ignoring the bookkeeping details specific to GBTIDL, the data processing steps boil down to these items.
- Total-power calibration. The gettp procedure is applied to each integration used, whether reference or signal.
- data_tp = (data_no_cal + data_with_cal)
- Tsys = Tcal/2.0 + Tcal * mean_80(data_no_cal) / mean_80(data_with_cal - data_no_cal)
- data_no_cal is the data when the switching cal diode is off
- data_with_cal is the data when the switching cal diode is on
- Tcal is a scalar. It is usually the value supplied by sdfits, which is an average of the Tcal values from the receiver FITS file over the bandpass of the spectrum. It can also be a supplied scalar value to override the value found with the raw data.
- mean_80 means that it is the mean taken over the inner 80% of the channels (excluding 10% of the channels at each end of the spectrum).
- Determination of the reference spectrum to use in the next calibration step. (Identify "OFF" data in the KFPA pipeline diagram).
- In this example this is done by averaging some of the raw mapping scans. It's difficult to describe in words which spectra contribute to that average. Essentially, only some of the scans near the first and last scan in the mapping block (e.g. scan 16 and 49) and within those scans, only some of the integrations are used. I believe the intent is to use spectra which are assumed to have to no interesting signal in them to construct a synthetic reference spectra. This is done for each beam and for each polarization. The selected scans are averaged using the GBTIDL accum procedure followed by the ave procedure.
- weight = exposure_time / (Tsys*Tsys)
- The Tsys assigned to the average is the weighted average of the individual Tsys values, using the same weight.
- The exposure time and integration time assigned to the average are simply the sums of each quantity from the individual spectra contributing to the final average.
- Once the average is determined, a median filter of full width 103 channels is applied to the data to smooth it. The goal here is to reduce the noise contributions in each channel while not over-smoothing and removing real features that one hopes to remove from the rest of the data in the next calibration step.
- At the same time, the tsysair procedure is used to estimate the effective atmospheric contribution to Tsys for each given elevation. The individual Tsysair values from each spectrum in the reference average are averaged together (with uniform weight) and that average is stored in the tambient header location of the reference spectrum average for later use.
- Calibration. Using the IDL procedures identified above, the following spectrum is calculated and saved for each integration in all scans. These saved spectra are what is used to generate the final image.
- data = Tsys_ref * [exp(zenith_tau * n_atmospheres) / eta_a] * (sig_tp - ref_tp) / ref_tp - (tsysAir - tsysAir_ref)
- sig_tp is the total-power calibrated data for this integration.
- ref_tp is the averaged and smoothed reference spectrum determined in the previous step.
- zenith_tau is the zenith opacity supplied in the script (eventually sdfits will get this from the weather-based predictions).
- n_atmospheres is a quantity returned by the natm IDL script. At high elevations this is simply 1/sin(elevation). A more complicated expression is used at lower elevations.
- eta_a is an estimate of the point source efficiency at the center frequency.
- Tsys_ref is the system temperature associated with the ref_tp data.
- tsysAir is, again, an estimate of the effective atmospheric contribution to Tsys at that elevation.
- tsysAir_ref is the average tsysAir associated with the reference spectrum as described in the previous step.
- Imaging (gridding).
- idlToSdfits is used to convert the calibrated data, saved into an sdfits file, into a format that classic AIPS understands. Glen recommends using these arguments.
- -w 100 this subtracts a median filtered baseline of width 100.
- -c 1920:2175 Only these channels are used. Presumably these bracket the line of interest.
- classic AIPS is used to generate the image. The arguments that might be interesting to users to control still need to be described here.
- UVLOD - reads in the file generated by idlToSdfits
- SDIMG - actually makes the image.
Things which a script can not current guess or extract from the raw data include (this list may not yet be complete)(this list also contains important discussion points, these should be pulled out of here and organized eventually).
- How the reference spectra is to be determined. There are likely a number of possible ways this could be done, as it is here, by assuming there's no interesting signal in parts of the data. We need to work out how to specify that step for any of the specific ways we would want to allow. Then we need to also decide how to indicate what way is to be used. One could also use specific off positions, which would need to specified differently. It should be noted that the final calibration step can not be done until all of the data has been taken because the final average reference scan isn't available until then. One can recast the calibrated data equation to separate out the ref_tp from the sig_tp. One can then imagine gridding the sig_tp part and then constructing a ref_tp image and doing appropriate divisions of those images and scaling of the results to get mathematically the same result above. While that might appear faster to the user (especially if one saves the individual weights, which are common to both images, when they are calculated while making the sig_tp image), the numerical precision will be less in the result because of the nature of floating point calculations. i.e. a weighted average (which is what gridding is at some level) of (a-b)/b is more accurate that separately weighted averages of a and b combined as a/b - 1. For this mode, we need to specify what the quick-look shortcut is to finding a useful reference spectrum before all of the data have been taken. It depends on what sort of detail is necessary in the quick-look sense. Also, all of the above calculations currently are done with single precision (32-bit) floating point accuracy (because that is how the raw data comes). It would be possible to do the above in double-precision. That might eliminate the concern about loss of numerical precision and so it might be useful if it resulted in a faster appearing result because the pipeline could process some of the data all the way through to the image without waiting for all of the data to arrive. I think those issues should only be pursued once it's clear where the pipeline processing bottlenecks are.
- The zenith opacity. Eventually sdfits will get this using Ron's predictions based on the weather information.
- The area of interest around the spectral line to include in the final image (it could default to the entire bandpass).
- image center and cell size along the sky axes (although we can probably come up with heuristics for appropriate guesses for these based on what's in the data to be gridded).
There are also a number of default assumptions that user's might want to tweak (this list may not yet be complete):
- The width of the median filter applied to the average reference spectrum.
- The width of the median filter used in removing a baseline before imaging.
- One might also do some spectral smoothing (e.g. hanning) of the data before imaging - if so, the user might choose to drop (reduce) appropriate channels in the final image (e.g. every other channel when hanning smoothing).
This example uses scalar Tcal, Tsys, and tau values. The above can be generalized to be useful with vector quantities. Some care will need to be taken, then, in describing how the Tsys is smoothed - replacing the roll that taking the mean performs in this example. I suppose eta_A might also vary across the bandpass by enough that we'll need to consider that to be a vector.
Issues that seem most pressing to me include (from an e-mail I sent out to the pipeline working group):
- In the vector Tsys version we're striving for how much smoothing should Tsys get before it's used - it might be more appropriate to smooth the (data_with_cal - data_without_cal) vector that appears in the denominator rather than smoothing the Tsys vector that results.
- I think we need difference scenarios on how reference spectra can be determined.
- frequency switched - each integration contains it's own reference - although some smoothing may be desirable before it's used.
- specific on-off position.
- averaging several specific on-off positions.
- interpolations between specific on-off positions (interpolations in time, elevation, ???)
- averaging data from the mapped region, as in this example
- several different averages from the mapped region, from which interpolations are constructed
- Is the basic calibration equation in the step labeled "calibration" here sufficiently general. What parts need to be tweaked depending on that specific use case?
- baseline removal. Glen here uses idlToSdfits to remove a median filtered baseline.
There are other issues, but I think mostly those involve getting parameters from Astrid so that the pipeline knows what to do (e.g. spectral line region to image).