A description of the steps sdfits takes going from raw GBT FITS files to output SDFITS files.

Introduction

This describes the details of how the sdfits program takes the contents of the raw GBT FITS files and constructs output binary FITS tables using the SDFITS convention. Only the processing of the raw data is described here. The calibration modes still available in sdfits through the -cal and -avg options are not well supported and not recommended for users. They remain there merely as placeholders. Eventually some of the calibration available in GBTIDL will be available through these options. Users interested in calibration details should consult the latest GBTIDL calibration documentation.

The GBT data dictionary may be useful in understanding some of the terms used here (e.g. the definition of "scan").

SDFITS References

The SDFITS convention was developed in the late 1980s. An early version was used in the UniPOPS data analysis program and it was modified to work in aips++. The version used by aips++ was chosen as the disk storage format for GBTIDL. The two references above describe that version. Since that version of SDFITS was developed, the FITS community has made considerable progress in describing coordinate axes in both images and binary tables. Other observatories have made some attempt to adjust the SDFITS convention to use these newer descriptions of world coordinate systems. The SDFITS in use in Green Bank has not yet been updated. It is anticipated that an effort will be made in the near future to do that, making it easier to share data between the various telescopes and analysis programs.

Briefly, SDFITS is a convention for storing single-dish radio-astronomy data in a FITS binary table. It consists of a dictionary of a few required columns, a few more suggested columns and the freedom to add any additional columns as necessary. A column that has the same value in all rows can be expressed instead as a FITS keyword provided the column name is no longer than 8 characters. A constant column expressed as a FITS keyword is called a "virtual" column. There is a DATA column which is multi-dimensional. The shape of the DATA column is given by either a TDIMn keyword (where "n" is the DATA column number) or sometimes by a TDIMn column when the shape may change from row to row. The sdfits program always writes out the DATA's TDIMn value as a keyword, meaning that the DATA shape is fixed within a single binary table produced by sdfits. There are also several associated columns which describe the physical coordinates associated with each axis of the DATA column (frequency, pointing direction, polarization). All but the first axis (which must be a frequency-type axis) will have just a single pixel. This means that each row in an SDFITS binary table is a single spectrum with associated information (often called "the header").
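
Because a constant column may be stored as a keyword, a reader has to look in both places. The following is a minimal sketch of that lookup, not code from sdfits. Plain dictionaries stand in for a binary table header and one table row, and the function names column_value and parse_tdim are hypothetical.

```python
def column_value(header, row, name):
    """Return a value for one row, falling back to a header keyword.

    `header` and `row` are plain dicts standing in for a binary table
    header and a single table row (an assumption for illustration).
    """
    if name in row:
        return row[name]        # a true column
    if name in header:
        return header[name]     # a "virtual" column stored as a keyword
    raise KeyError(name)


def parse_tdim(tdim):
    """Parse a TDIMn string such as '(1024,1,1,1)' into a tuple of ints."""
    return tuple(int(n) for n in tdim.strip().strip("()").split(","))


# Hypothetical header and row contents.
header = {"TELESCOP": "NRAO_GBT", "TDIM7": "(1024,1,1,1)"}
row = {"SCAN": 12, "CRVAL1": 1.42e9}

telescope = column_value(header, row, "TELESCOP")  # found as a keyword
scan = column_value(header, row, "SCAN")           # found as a column
shape = parse_tdim(column_value(header, row, "TDIM7"))
```

Checking the row first means the same call keeps working if a later version of sdfits turns a keyword into a true column.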

The raw GBT FITS files

The following documents may be useful in understanding some of the terms here. These documents describe the raw FITS files that the sdfits program uses.

Processing order

sdfits uses the ScanLog.fits file to drive the processing. The only FITS files that are known to sdfits are those found in this master FITS file for the scan being processed. The format of the ScanLog.fits file is mentioned in the document describing common information and structure in the "Description of GBT FITS Files" section. Scans are either processed in the order they are found in this file (the order in which they were observed) or the order that they are given when the -scans option is used at the sdfits command line. Note that the -timestamp option may limit the actual scans processed but does not change the order in which scans are processed. There is no check on duplicate scans in the list given with the -scans option and sdfits will fill the data from the same scan number as many times as it appears in the argument to -scans.
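
The ordering rule can be sketched as follows. This is an illustration only: the function name scans_to_process is hypothetical, and dropping requested scans that are not in the ScanLog is an assumption, since the text does not say how sdfits handles them.

```python
def scans_to_process(scanlog_scans, requested=None):
    """Return scan numbers in the order they would be filled.

    scanlog_scans: scan numbers in the order found in ScanLog.fits.
    requested: optional list as given to -scans; its order and any
        duplicates are kept, mirroring the behaviour described above.
    """
    if requested is None:
        return list(scanlog_scans)          # observation order
    known = set(scanlog_scans)
    # Assumption: scans absent from the ScanLog are silently skipped.
    return [scan for scan in requested if scan in known]


observed = [10, 11, 12, 13]
order = scans_to_process(observed, requested=[12, 10, 12])
```

Here scan 12 appears twice in the result, so it would be filled twice.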

Within a given scan, the current processing order for spectral-line data is to process all of the data for a given integration before moving on to the next integration. Within each integration, each sampler is processed in the order in which it appears in the data. For each sampler, the states are processed in the order in which they appear in the data. For DCR data, each sampler is fully processed (all integrations and states) before moving on to the next sampler. A previous version of sdfits processed spectral-line data in the same way as DCR data (sampler varying slowest in the output sdfits file). Consumers of the output of sdfits should therefore not rely on the current processing order, since future versions of sdfits may change it.
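
The two loop orders can be written out as a short sketch. The function names are hypothetical. For DCR data the order of state and integration within a sampler is taken from the DCR notes later in this document (time varies fastest), which is itself described as subject to change.

```python
from itertools import product


def spectral_line_order(n_integrations, samplers, states):
    """Integration varies slowest, then sampler, then state."""
    return list(product(range(n_integrations), samplers, states))


def dcr_order(n_integrations, samplers, states):
    """Sampler varies slowest; time varies fastest within each state."""
    return [(integ, sampler, state)
            for sampler in samplers
            for state in states
            for integ in range(n_integrations)]


# Hypothetical sampler and state labels.
sl = spectral_line_order(2, ["A1", "A2"], ["calon", "caloff"])
dcr = dcr_order(2, ["A1", "A2"], ["calon", "caloff"])
```

Both lists contain the same (integration, sampler, state) triples. Only the order differs, which is why a reader should select rows by column values rather than by position.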

Options

See sdfits -help for reminders of these options and their usage.
-backends=DEVICES
Normally sdfits produces FITS files for each supported backend found in the project being filled (possibly subject to other data selection described below). This option can be used to limit sdfits to a subset. DEVICES is a comma-separated list from: acs, sp, and dcr. acs is the (autocorrelation) Spectrometer, sp is the Spectral Processor and dcr is the digital continuum receiver.
-mode=MODE
sdfits produces raw, uncalibrated results by default. This option is designed to let the user direct sdfits to do some default calibration and averaging. The calibration and averaging currently available are outdated, are not described here, and are not recommended. The option remains in the code as a placeholder for future work that will use the same calibration available in GBTIDL to produce calibrated and averaged results with a single call to sdfits for some types of data.
-quiet
Suppress messages that are not related to errors.
-scans=SCAN_NUMBER_LIST
Only fill some scans instead of all scans found. See the output of sdfits -help for more information. Note that scans are processed in the order listed in SCAN_NUMBER_LIST, and that a scan number that appears more than once is filled more than once. sdfits never checks whether it has already filled a given scan.
-timestamp=STARTTIME,ENDTIME
This is useful to limit sdfits to a specific set of data. It is intended for cases where the same scan number appears more than once in the project and the user needs to limit the sdfits operation to a specific instance of that scan number. See the sdfits -help output for more details.
-noindex
GBTIDL uses an index to aid in data selection and speed access to requested data. sdfits normally generates that index file when it fills spectral line data. Use this option to turn that off. GBTIDL can regenerate the index file from the FITS file as necessary. The contents of that index file are not described here.
-append
Use this option when you want to append to an existing SDFITS file (normally sdfits starts over and removes any conflicting file names at the start). Note again that sdfits never checks to see what is already in that FITS file so it is likely that -scans or -timestamp or both would be used with this option.
-fixbadlags
Attempt to find and fix bad lags in the ACS data. A log file named OUTPUT_PREFIX.MODE.acs.fixed_lags is written to when this option is used. See the discussion of Spectrometer data for more information.
-sigmafactor=SIGMA_FACTOR
Used to fine-tune the bad lag search.
-spikestart=SPIKE_START
Also used to fine-tune the bad lag search.
-version
Show the program version information.
-help
Show usage information.
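
The -sigmafactor and -spikestart options tune the individual bad-lag search described in the Spectrometer notes later in this document. The sketch below shows roughly how the two values enter that test. It is not the sdfits implementation: the function name is hypothetical, NumPy is used for convenience, and taking sigma as the standard deviation of the neighbouring lags is an assumption.

```python
import numpy as np


def find_single_bad_lags(lags, sigma_factor=6.0, spike_start=200,
                         half_window=100):
    """Return indices of lags far from the mean of their neighbours.

    sigma_factor mirrors -sigmafactor (default 6) and spike_start
    mirrors -spikestart (default 200 lags skipped at the start).
    Up to half_window lags on each side are used, fewer near the end.
    """
    lags = np.asarray(lags, dtype=float)
    bad = []
    for i in range(spike_start, lags.size):
        lo = max(i - half_window, 0)
        hi = min(i + half_window + 1, lags.size)
        # Neighbours exclude the lag being tested.
        neighbours = np.concatenate((lags[lo:i], lags[i + 1:hi]))
        # Assumption: sigma is the scatter of the neighbouring lags.
        if abs(lags[i] - neighbours.mean()) > sigma_factor * neighbours.std():
            bad.append(i)
    return bad


# Synthetic, smooth "lags" with one injected spike at index 1000.
lags = np.sin(0.1 * np.arange(2048))
lags[1000] = 50.0
bad = find_single_bad_lags(lags)
```

Raising sigma_factor makes the search less sensitive. Raising spike_start protects more of the structured start of the correlation function from being flagged.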

Primary HDU

Keyword Value Description Notes
DATE YYYY-MM-DDTHH:MM:SS date and time this HDU was created, UTC  
ORIGIN NRAO Green Bank Origin of observation ORIGIN keyword from ScanLog.fits
TELESCOP NRAO_GBT The telescope used TELESCOP keyword from ScanLog.fits
INSTRUME string backend The current backend being filled ("DCR", "SpectralProcessor", "Spectrometer")
SDFITVER sdfits ver1.7 The version of the sdfits program used to produce this HDU Changes each time a new version of sdfits is released
FITSVER 1.4 The version of the content of this FITS file Changes when columns are added or removed or when the values being written are fundamentally different (e.g. a column in hours changes to a column in seconds)

SDFITS binary tables

Note that there may be multiple SDFITS binary tables in one sdfits file. The number of binary tables depends on how often the size of the DATA column changes. There will be a new binary table each time the width of the DATA column changes. Since DCR data always has one DATA value per row, there is always only one SDFITS binary table for DCR data. Note that some of the "Column"s listed here are really present as keywords (virtual columns) in the SDFITS binary tables produced by sdfits. The "Notes" entry for a virtual column will always describe it as a keyword. Readers should never rely on them being keywords since future versions of sdfits may change them to be true columns.
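
The rule that a new binary table starts whenever the DATA width changes amounts to grouping consecutive spectra by length. A minimal sketch, assuming each spectrum is a simple Python list and using the hypothetical function name split_into_tables:

```python
from itertools import groupby


def split_into_tables(spectra):
    """Group consecutive spectra of equal length; each group is one table."""
    return [list(group) for _, group in groupby(spectra, key=len)]


# Four spectra whose widths are 4, 4, 8 and 4 channels.
spectra = [[0.0] * 4, [0.0] * 4, [0.0] * 8, [0.0] * 4]
tables = split_into_tables(spectra)
widths = [len(table[0]) for table in tables]
```

Because only consecutive rows are grouped, returning to an earlier width starts a third table rather than reusing the first. A reader should therefore loop over every binary table in the file.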

Column Type Description Notes
CTYPE4 'STOKES' The fourth axis of the data. This is the polarization axis.
BANDWID Double Total bandwidth (Hz) See notes on frequency axis
DURATION Double Clock time spent taking this data (seconds). Includes blanking time and may include time spent on other switching states. See note on times
EXPOSURE Double Actual time spent collecting data (seconds). This quantity should be used in the radiometer equation. See note on times
TSYS Double System temperature (K) Always 1.0; sdfits makes no attempt to determine this.
CRVAL1 Double The value of the first data axis at CRPIX1 (Hz) See notes on frequency axis
CRPIX1 Double The reference pixel number of the first data axis CRPIX1 = nchan/2 for spectral line backends (reminder: FITS pixels start at 1). This column is omitted for DCR data (which only has 1 pixel on the first axis).
CDELT1 Double The increment between pixels on the first data axis (Hz) See notes on frequency axis
CRVAL2 Double The value of the second data axis (Deg) This axis always has one pixel, CRPIX2 is assumed to be 1, see notes on pointing directions
CRVAL3 Double The value of the third data axis (Deg) This axis always has one pixel, CRPIX3 is assumed to be 1, see notes on pointing directions
VFRAME Double Radial velocity of the reference frame (m/s) This is copied from the LO1TBL table of the LO1 FITS file using the row with DMJD closest to but not after the mid-point (in time) of the integration being filled by sdfits. This column is not written for DCR data. This is a true (relativistic) velocity.
RVSYS Double Radial velocity, Vsource - Vtelescope (m/s) This is copied from the LO1TBL table of the LO1 FITS file using the row with DMJD closest to but not after the mid-point (in time) of the integration being filled by sdfits. This column is not written for DCR data. This is a true (relativistic) velocity.
OBSFREQ Double Observed (sky) center frequency (Hz) Same as CRVAL1
LST Double LST at midpoint of integration (s) Equivalent to the DATE-OBS time plus 1/2 of the DURATION, i.e. the midpoint in UT, converted to local sidereal time (LST).
AZIMUTH Double Azimuth (deg) see notes on pointing directions
ELEVATIO Double Elevation (deg) see notes on pointing directions
TAMBIENT Double Ambient temperature (K) The AMBTEMP keyword in the Antenna FITS file (in C) converted to K.
PRESSURE Double Ambient pressure (mmHg) The AMBPRESS keyword in the Antenna FITS file (in millibars) converted to mmHg.
HUMIDITY Double Relative humidity The AMBHUMID keyword in the Antenna FITS file.
SITELONG Double E. longitude of the intersection of the az/el axes (deg) The negative of the value of the SITELONG keyword in the Antenna FITS file (so that this value is East longitude).
SITELAT Double N. Latitude of the intersection of the az/el axes (deg) The value of the SITELAT keyword in the Antenna FITS file.
SITEELEV Double height of the intersection of the az/el axes (m) The value of SITELEV keyword in the Antenna FITS file.
RESTFREQ Double Rest frequency at band center (Hz) The RESTFREQ keyword from the GO FITS file. If that value is not found, this is simply 1/2 the total bandwidth. This value is not written for DCR data.
FREQRES Double Frequency resolution (Hz) For the Spectrometer, this is 1.21 times the channel spacing (abs(CDELT1)). For the Spectral Processor, the multiplication factor depends on the value of the TAPER keyword. For "Box" taper, this is 0.89 * abs(CDELT1), for "Halfbox" taper this is 2.78 * abs(CDELT1) and for "Cosine" taper this is 2.0 * abs(CDELT1). FREQRES is the value that should be used in the radiometer equation. FREQRES is not written for DCR data.
EQUINOX Double Equinox of selected coordinate reference frame Equatorial coordinates only. The EQUINOX value from the GO FITS file (0.0 if that keyword is not found).
TRGTLONG Double Target longitude in coord. ref. frame (deg) Copied from one of these GO file keyword values (depending on the value of COORDSYS): "RA", "GLON", "AZ", and "HA" (only one of these is ever present in a single GO file). Note that for FITSVER <= 2.5 and INSTRUMENT equal to "Turtle" the RA and HA keyword values in the GO file are incorrectly stored as hours, not degrees (sdfits makes the appropriate adjustment when using them). For early versions of the GO file (FITSVER missing or < 1.0), the "RAJ2000" or "MAJOR" keyword holds the target longitude.
TRGTLAT Double Target latitude in coordinate reference frame (deg) Copied from one of these GO file keyword values (depending on the value of COORDSYS): "DEC", "GLAT" and "EL" (only one of these is ever present in a single GO file). For early versions of the GO file (FITSVER missing or < 1.0), the "DECJ2000" or "MINOR" keyword holds the target latitude.
BEAMXOFF Double beam XEL offset The BEAMXELOFFSET value for FEED from the BEAM_OFFSETS table of the Antenna FITS file.
BEAMEOFF Double beam EL offset The BEAMELOFFSET value for FEED from the BEAM_OFFSETS table of the Antenna FITS file.
VELOCITY Double line velocity in rest frame (m/s) The VELOCITY keyword from the GO FITS file (0.0 if keyword not found). This column is not present for DCR data.
TCAL Float calibration temperature (K) See the receiver calibration notes
ZEROCHAN Float zero channel This column is not written for DCR data. See the Spectrometer DATA notes for a discussion on how this value is determined from the raw lags for Spectrometer data. For the Spectral Processor, the value of this column is always NaN.
DATA Float (varies) The raw data See the notes for each supported backend: DCR, Spectral Processor, Spectrometer
CRVAL4 Integer The value of the STOKES axis See the notes on polarization axis. This axis always has one pixel. CRPIX4 (which is intentionally not found in this sdfits file) is assumed to be 1.
FEED Integer (signal) feed number From the FEED column of the IF FITS file using the appropriate BANK and PORT. Note that for cross-correlation data sdfits does not yet support the case where data from one feed is correlated with data from a different feed. The first BANK and PORT in given SAMPLER is used to determine FEED. See the notes on frequency axis for a discussion on how BANK and PORT are determined for each backend.
SRFEED Integer reference feed number From the SRFEED1 or SRFEED2 columns of the IF FITS file using the appropriate BANK and PORT. The value used is the one that is not FEED. See the notes on frequency axis for a discussion on how BANK and PORT are determined for each backend.
SUBREF_STATE Integer State of subreflector when nodding. 1=first position, 0=moving, -1=second position. See notes.
PROCSEQN Integer scan sequence number PROCSEQN from the GO FITS file (0 if keyword not found).
PROCSIZE Integer number of scans in procedure PROCSIZE from the GO FITS file (0 if keyword not found).
LASTON Integer last 'on' for position switching LASTON from the GO FITS file (0 if keyword not found).
LASTOFF Integer last 'off' for position switching LASTOFF from the GO FITS file (0 if keyword not found).
SCAN Long Integer Scan number SCAN keyword from the GO file.
SIDEBAND String (1) resulting sideband ('U'pper or 'L'ower) From the SIDEBAND column of the IF FITS table for the given SAMPLER. For cross-correlation data, the value from the first BANK and PORT is reported here. If SIDEBAND is missing, "U" is reported here.
SIG String (1) signal is "T", reference is "F" For the Spectrometer, this is "T" when the ACT_STATE table values of ISIGREF and ESIGREF are 0. For the Spectral Processor and the DCR, this is "T" when STATE table value of SIGREF is 0.
CAL String (1) cal ON is "T", cal OFF is "F" For the Spectrometer, this is "T" when the ACT_STATE table values of ICAL and ECAL are not 0. For the Spectral Processor and the DCR, this is "T" when the STATE table value of CAL is not 0.
TDIMa String (16) The dimensions of DATA (column a, a is currently 7 but may change) '(nchan,1,1,1)' - 4 axes, only first axis has more than one element
FRONTEND String (16) Frontend device RECEIVER keyword from the receiver calibration file (TCAL) named in the ScanLog.fits file for this scan.
DATE-OBS String (22) Date and time of observation start See note on times
TIMESTAMP String (22) Approximate date and time of scan start (UTC) This is the prefix shared by the file names of all M&C FITS files written for a given scan. It should be close to, but not necessarily equal to, the scan start time as given in the DATE-OBS column. The format is "YYYY_MM_DD_HH:MM:SS".
OBJECT String (32) Source name OBJECT keyword value from the GO FITS file.
OBSERVER String (32) Name of observer(s) OBSERVER keyword from the GO file.
OBSID String (32) Observation description OBSID keyword from the GO file.
OBSMODE String (32) Observing mode "PROCNAME:SWSTATE:SWTCHSIG" where those values come from the keywords of the same name in the GO file.
CTYPE2 String (4) Type of second data axis Type of longitude-like pointing direction ("RA", "HA", "AZ", "GLON", "OLON", or "????"), see notes on pointing directions
CTYPE3 String (4) Type of third data axis Type of latitude-like pointing direction ("DEC", "EL", "GLAT", "OLAT", or "????"), see notes on pointing directions
TUNITa String (6) The units of DATA (column a, a is currently 7 but may change) Always 'counts' for raw data
CTYPE1 String (8) Type of first data axis Always 'FREQ-OBS'
VELDEF String (8) Velocity definition and frame See the VELDEF notes. This column is not written for DCR data.
RADESYS String (8) Equatorial coordinate system name This is copied directly from the RADESYS keyword in the GO file if the COORDSYS keyword found there is "RADEC". For any other COORDSYS this value is "". If COORDSYS is missing, RADESYS is set to "FK5". Very old GO FITS files use RADECSYS instead of RADESYS and no COORDSYS keyword is present. For those old GO FITS files, the RADECSYS value found there will be either "B1950" or "J2000". For "B1950", an RADESYS value of "FK4" is written by sdfits. For "J2000", an RADESYS value of "FK5" is written by sdfits. The same value of RADECSYS in the GO FITS file also determines EQUINOX (1950.0 or 2000.0), CTYPE2 ("RA"), and CTYPE3 ("DEC").
SAMPLER String (8) Sampler description For an autocorrelation, this is simply the BANK string and the PORT string concatenated (e.g. "A2"). For a cross-correlation, this is the two BANK and PORTs going into the correlation separated by an "x", e.g. "A1xA2". See the notes on frequency axis for a discussion on how BANK and PORT are determined for each backend.
CALTYPE String (8) LOW or HIGH, may eventually be other types This is "HIGH" when the value of the HIGH_CAL column for the BANK and PORT appropriate for this row is 1, otherwise this value is "LOW". sdfits will warn the user if the value of HIGH_CAL is different for the two BANK and PORT pairs involved in a cross-correlation (the value stored here is always the value of the first BANK and PORT in a cross-correlation).
TELESCOP String keyword The telescope name TELESCOP keyword from ScanLog.fits
PROJID String keyword Project identifier PROJID keyword from the ScanLog.fits file.
BACKEND String keyword backend device 'Spectrometer' or 'Spectral Processor' or 'DCR'
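
The frequency-axis columns and the FREQRES multipliers in the table above can be combined as in the sketch below. It assumes the usual linear FITS axis convention with 1-based pixels. The function names are hypothetical, the backend strings follow the INSTRUME values listed for the primary HDU, and the input values are made up.

```python
import numpy as np

# Multipliers quoted in the FREQRES row of the table above.
FREQRES_FACTOR = {
    ("Spectrometer", None): 1.21,
    ("SpectralProcessor", "Box"): 0.89,
    ("SpectralProcessor", "Halfbox"): 2.78,
    ("SpectralProcessor", "Cosine"): 2.0,
}


def frequency_axis(crval1, crpix1, cdelt1, nchan):
    """Frequency (Hz) of each channel on a linear axis."""
    pixels = np.arange(1, nchan + 1)     # FITS pixels start at 1
    return crval1 + (pixels - crpix1) * cdelt1


def frequency_resolution(channel_spacing, backend, taper=None):
    """Resolution (Hz) as a multiple of the absolute channel spacing."""
    return FREQRES_FACTOR[(backend, taper)] * abs(channel_spacing)


# Hypothetical values: 1024 channels spaced by -1 kHz around 1.42 GHz.
freq = frequency_axis(crval1=1.42e9, crpix1=512.0, cdelt1=-1000.0, nchan=1024)
res = frequency_resolution(-1000.0, "Spectrometer")
```

With CRPIX1 = nchan/2 = 512, array index 511 (pixel 512) holds the reference frequency CRVAL1.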

The DATA array

DCR

The output SDFITS file has a single DATA value in each row of the binary table. The raw DCR data comes from the DATA column of the DATA table (the last table in the DCR FITS file). This is a 2-dimensional column and the CTYPE1 and CTYPE2 keyword values indicate what the two axes are. Currently, CTYPE1 is "STATE" and CTYPE2 is "RECEIVER". The switching state values (CAL and SIG/REF) change along the STATE axis and the sampler values (BANK and PORT) change along the RECEIVER axis. The specific switching state corresponding to each STATE axis pixel is described in the accompanying STATE table. Similarly, the specific receiver information is described in the accompanying RECEIVER table. How that information is used to construct the output SDFITS file is described elsewhere in this document (see e.g. frequency axis, times, and the CAL and SIG columns).

All of the data from the same STATE and RECEIVER element of the DATA column within one scan is written out to consecutive rows in the output FITS file (so time (DATE-OBS) varies fastest). This is done so that GBTIDL can easily pick out a time-sequence for each (STATE,RECEIVER) index and put that into a single GBTIDL continuum data container. The specific STATE and sampler (RECEIVER) values should be determined from the other columns of the SDFITS file. There is no guarantee as to which of those two quantities will vary faster next. The data values are copied "as is" from the raw FITS file.
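
The rearrangement into time sequences can be sketched with NumPy. This assumes the raw DATA column has been read into an array of shape (n_time, n_receiver, n_state), which is how a FITS column with STATE as its first (fastest) axis would normally appear in a row-major array. The function name is hypothetical, and making RECEIVER vary slower than STATE is just one of the two orders the text allows.

```python
import numpy as np


def dcr_time_sequences(raw):
    """Return one time sequence per (RECEIVER, STATE) pair.

    raw: array of shape (n_time, n_receiver, n_state).
    Result: shape (n_receiver * n_state, n_time); flattening it gives
    output rows in which time varies fastest.
    """
    n_time, n_rec, n_state = raw.shape
    return raw.transpose(1, 2, 0).reshape(n_rec * n_state, n_time)


# Synthetic data: 3 integrations, 2 receivers, 2 states.
raw = np.arange(12).reshape(3, 2, 2)
sequences = dcr_time_sequences(raw)
```

Each row of sequences corresponds to the values that would go into a single continuum data container.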

Spectral Processor

The output SDFITS file has a single spectrum (only the first axis has more than one element) in each row of the binary table. The raw Spectral Processor data comes from the DATA column of the DATA table (the last table in the Spectral Processor FITS file). This is a 3-dimensional column and the CTYPE1, CTYPE2, and CTYPE3 keywords indicate what the three axes are. Currently, CTYPE1 is "FREQUENCY", CTYPE2 is "STATE" and CTYPE3 is "RECEIVER". The switching state values (CAL and SIG/REF) change along the STATE axis and the sampler values (BANK and PORT) change along the RECEIVER axis. The specific switching state corresponding to each STATE axis pixel is described in the accompanying STATE table. Similarly, the specific receiver information is described in the accompanying RECEIVER table. How that information is used to construct the output SDFITS file is described elsewhere in this document (see e.g. frequency axis, times, and the CAL and SIG columns).

One row of the output SDFITS file consists of all of the elements along the FREQUENCY axis at a specific STATE and RECEIVER pixel in the raw DATA column. The data values are copied "as is" from the raw FITS file. Data are currently written so that time (DATE-OBS) varies slowest so that all of the raw data from a single row of the FITS file is written to the output SDFITS file before moving on to the next row in the raw FITS file. The specific STATE and sampler (RECEIVER) values should be determined from the other columns of the SDFITS file. There is no guarantee as to which of those two quantities varies fastest.

Spectrometer

The output SDFITS file has a single spectrum (only the first axis has more than one element) in each row of the binary table. The raw Spectrometer data is in the form of lags (time-domain, not frequency-domain) from the DATA column of the DATA table (the last table in the Spectrometer FITS file). This is a 3-dimensional column and the TDESC3 keyword (which gives the description of column 3, the DATA column) indicates which axis is which in the DATA array (similarly there is a TDESC2 keyword which describes the 2-dimensional INTEGRAT column and can be used to determine what the appropriate integration time is for a given sampler and switching state). Currently, TDESC3 is "LAG,SAMPLER,ACT_STATE", meaning that LAG varies along the first axis, SAMPLER (PORT and BANK) varies along the second axis and ACT_STATE (internal and external switching of cal and sig/ref signals) varies along the third axis.

The SAMPLER axis values (the BANK and PORT pairs of the two signals that comprise the correlation associated with each sampler) are described by the SAMPLER table (and the specific BANK and PORT information is given in the PORT table).

The ACT_STATE axis values are described by the ACT_STATE table. "ACT" is short for "actual" in that this describes the switching state information in the order it was actually written to the DATA column (set by the internal Spectrometer accumulators) as opposed to the STATE table which describes the temporal order in which the states were switched. The order of switching information in the ACT_STATE table is often not the same as in the STATE table, so care must be taken when looking up information in associated FITS files that use the STATE table. Note also that it is possible to configure the system so that there are ACT_STATE rows that do not have any corresponding STATE rows - typically the data associated with those "extra" ACT_STATE rows will be all -1 or some other indication that no data was taken in those states. Those ACT_STATE rows exist because of the details of the internals of the Spectrometer, but the lag values associated with those rows in ACT_STATE will be non-physical (-1) because that state never actually occurred while data was being recorded.

SAMPLER and ACT_STATE (and STATE) are used in filling other information in the output SDFITS file (see e.g. frequency axis, times, and the CAL and SIG columns).

One row of the output SDFITS file consists of the spectrum corresponding to a single vector of LAG values (or two LAG vectors in the case of cross-correlation data, see below) at a specific ACT_STATE and SAMPLER pixel in the raw DATA column. Data are currently written so that time (DATE-OBS) varies slowest so that all of the raw data from a single row of the FITS file is written to the output SDFITS file before moving on to the next row in the raw FITS file. The specific ACT_STATE and SAMPLER values should be determined from the other columns of the SDFITS file. There is no guarantee as to which of those two quantities varies fastest.

The recorded lags are only the zero and positive lags for each correlation. The auto-correlations are symmetric about the zero-lag. The cross-correlations are not symmetric, but if XY holds the positive lag portion of the X by Y correlation then YX (the positive lag portion of the Y by X correlation) is the negative lag portion of the X by Y correlation. Robishaw and Heiles have produced a memo that gives a good description of how to turn the correlations (lags) recorded in the raw FITS files into spectra. The shortcomings involving sdfits have all been addressed (see these 3 MRs for details: 6C706, 2C706, and 3C706).

The steps sdfits takes in going from raw lags to raw (uncalibrated) spectra are (note that there is an additional optional step here to deal with possible bad lags that sometimes plague the Spectrometer):
  • The 9-level data are first multiplied by 16 to properly scale them.
  • Apply the appropriate Van Vleck correction. The code that does this is derived from GBT memo 250. Although the code can take into account different sampler levels and DC offsets, the default behavior is to assume that there are no DC offsets and that the sampler levels are equally spaced.
    • For auto-correlation data, examine the zero lag to determine the first positive sampler threshold level. If the zero lag is <= 0 or > (nlevels-1)^2 where nlevels is the number of sampler levels (3 or 9) then that is unphysical and no threshold level can be determined. The entire output spectrum and ZEROCHAN value are set to IEEE not a number (NaN). This data is bad (it probably indicates no data were taken for that sampler and switching state) and should not be used.
    • Look for bad data. The Spectrometer suffers from occasional drop-outs in the values for some lags. There are two types of bad lags.
      • 1024-lag segments. An entire chip (1024 lag values) will report incorrect values. These are identified when the mean of the values from an entire chip is more than 6-sigma from the mean of the 512 lags on either side of the chip. For the last 1024 lags, only the mean of the preceding 512 lags is used in the comparison. For all known auto-correlation cases, when the first 1024 lags are bad the reported values are always non-physical and that data is set to NaN for that reason. For the cross-correlation lags, the mean of the last 512 lags is compared with the mean of the following 512 lags to determine whether the first chip (the first 1024 lags) is bad. When checking for a bad chip, the mean of the following 512 lags is also compared with the mean of the preceding 512 lags. If they differ by more than 6-sigma from each other, then the following chip is assumed to be bad. The preceding 512 lags are assumed to be either good or, if the user has requested fixing, already fixed. If they are bad and have not been fixed, the searching algorithm simply stops and reports that the lags contain bad values.
      • Individual bad lags. These are only checked when the user has optionally asked sdfits to attempt to fix the bad lags (described next). Individual lags are judged to be bad when the value is more than 6-sigma from the mean of the surrounding 200 lags. This sigma multiplier of 6 can be optionally changed using the -sigmafactor option. As the lag being tested approaches the end of the correlation function, fewer than 200 lags are used in the comparison until at the last lag only the previous 100 lags are used in the comparison.
    • Optionally fix bad lags. If the -fixbadlags argument is used, then sdfits will attempt to adjust bad lags in some cases. This should be used with caution. It is useful when a substantial amount of data would otherwise be completely lost due to the bad lags. A log file is generated by this operation and it should be examined.
      • 1024-lag segments. Unless the first 1024 lags are bad or there are adjacent bad lag segments, the bad lags are adjusted so that the mean of the bad segment is the same as that of the surrounding lags used in the initial comparison and the RMS of the bad lags about that mean is the same as that of the lags used in the comparison. Adjacent bad 1024-lag segments are not fixed.
      • Individual bad lags. These are only identified when -fixbadlags has been used. The first several lags are never checked because the correlation function has a lot of real structure there. The number of lags to skip at the beginning of the correlation function can be set with the -spikestart option, but the default value of 200 seems to work well. If two (or more) adjacent lags are identified as bad by this algorithm, then no attempt will be made to fix them. Individual bad lags are fixed by replacing their value with the mean of the surrounding 100 lags.
    • Set bad data to NaN. If there remain unfixed 1024-lag segments, then the entire set of lags is replaced by NaN and written to disk by sdfits for the associated output row(s) (note that this may result in two rows being bad since a single cross-correlation lag vector contributes to two output sdfits rows). Unfixed 1024-lag segments will exist if the user has either elected to not fix the bad lags or the bad lags could not be fixed (adjacent 1024-lag segments). Individual bad lags never trigger this step.
    • Apply the Van Vleck correction. For the auto-correlation data, the previously determined sampler level is used to set the correction function. For cross-correlation data, the sampler levels for the two bank+port combinations involved in the correlation are used to set the correction function. The function is applied to the data based on the observed lag values, possibly corrected for bad lags as described above.
    • Rescale the corrected values to the appropriate power levels. The corrected values are scaled by
       (optimum_threshold_level / sqrt(x_threshold * y_threshold)) 
      where the x and y threshold levels are the two threshold levels used in constructing the Van Vleck correction function (the x and y thresholds are the same value for the auto-correlation case) and the optimum threshold level is 0.612003181 for 3-level sampling and 0.266911104 for 9-level sampling.
  • Construct the full correlation function from the positive-lag partial functions available. Because Fourier transforms are faster when the number of values is a power of 2, an extra lag value is invented - the Nyquist value. The FFT used in the Python code inside sdfits expects the origin (zero lag) to be the first value, followed by the positive lag values and finally followed by the negative lag values in reverse order, i.e. lag[0], lag[1], ..., lag[n], lag[Nyquist], lag[-n], lag[-(n-1)], ..., lag[-1].
    • Auto-correlation. The full function is symmetric about the zero lag: lag[-m] == lag[m]. The Nyquist value is a simple copy of the lag[n] value.
    • Cross-correlation. There are two "SAMPLER"s that must be used together. If one set of lags is the correlation between, for example, A1 and A2 (A1xA2), then the paired "SAMPLER" is A2xA1. A1xA2 contains the positive lags and A2xA1 is equivalent to the negative lags that would have been measured for A1xA2. The zero lags should be the same and sdfits uses the average of the two zero lags just in case there are any significant differences (in a limited check, none were seen). The Nyquist value is a simple average of the two lag[n] values.
  • FFT the full correlation function and extract the spectra from the positive and negative frequency values. The FFT produces 2*Nlag values since there are 2*Nlag values going in.
    • For the auto-correlation case, the FFT is real since the correlation function is symmetric. The negative and positive frequency values are the same. The zero-frequency (origin) value is put in the ZEROCHAN column and 2 * the positive frequency values are assigned to the DATA column in the output sdfits (that is equivalent to summing, not averaging, the positive and negative frequency values - see Robishaw and Heiles).
    • For the cross-correlation case, the FFT is complex. The real part is assigned to the row in the output sdfits associated with the source of the positive lags (A1xA2 in the above example) while the imaginary part is assigned to the row corresponding to the source of the negative lags (A2xA1 in the above example). Again, the zero-frequency (origin) value is placed in the ZEROCHAN column and 2 * the positive frequency values are placed in the DATA column.
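
The construction of the full correlation function and the extraction of ZEROCHAN and DATA described above can be sketched in Python with NumPy. This is only an illustration, not the code inside sdfits: the function names are hypothetical, and filling DATA with FFT bins 1 through N (so that DATA has as many channels as there are lags, ending at the Nyquist bin) is an assumption.

```python
import numpy as np


def full_correlation(pos_lags, neg_lags=None):
    """Assemble lag[0], ..., lag[n], lag[Nyquist], lag[-n], ..., lag[-1].

    pos_lags holds lag[0..n]. For auto-correlation neg_lags is None and
    lag[-m] == lag[m]. For cross-correlation neg_lags holds the paired
    sampler (e.g. A2xA1), whose lag m plays the role of lag[-m].
    """
    pos = np.asarray(pos_lags, dtype=float)
    neg = pos if neg_lags is None else np.asarray(neg_lags, dtype=float)
    zero = 0.5 * (pos[0] + neg[0])        # average of the two zero lags
    nyquist = 0.5 * (pos[-1] + neg[-1])   # invented value built from lag[n]
    # neg[:0:-1] is neg[n], neg[n-1], ..., neg[1], i.e. lag[-n] ... lag[-1]
    return np.concatenate(([zero], pos[1:], [nyquist], neg[:0:-1]))


def lags_to_spectra(pos_lags, neg_lags=None):
    """Return (zerochan, data) as complex values.

    Auto-correlation: the imaginary parts are zero to rounding error.
    Cross-correlation: .real belongs to the positive-lag row (A1xA2) and
    .imag to the negative-lag row (A2xA1).
    """
    full = full_correlation(pos_lags, neg_lags)
    nlags = full.size // 2                # 2*Nlag values go into the FFT
    spec = np.fft.fft(full)
    zerochan = spec[0]                    # zero-frequency (origin) value
    data = 2.0 * spec[1:nlags + 1]        # assumption: bins 1..N fill DATA
    return zerochan, data
```

For example, lags_to_spectra([1.0, 0.0, 0.0, 0.0]) builds an 8-point correlation function that is non-zero only at the origin, so the spectrum is flat.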

  • The Sky Frequency Formula:
             sky = SFF_SIDEBAND*IF + SFF_MULTIPLIER*LO1 + SFF_OFFSET 
          
    where the SFF_* values come from a row in the IF Manager FITS file, LO1 comes from the values in the appropriate Tracking Local Oscillator FITS file, and IF depends on the backend as described below.
  • Using the IF Manager and LO1 FITS files.
    • Indexing to the appropriate row in the IF Manager FITS file is done using the BANK and PORT columns for the BACKEND in question.
      • Spectrometer: BANK and PORT values come from the SAMPLER table in the Spectrometer FITS file. The DATA array has a SAMPLER axis and each pixel on that axis corresponds to one row in the SAMPLER table (using the pixel number as the row number). Each row has two BANK and PORT pairs that are correlated with each other ("BANK_A" and "PORT_A" with "BANK_B" and "PORT_B"). In the auto-correlation case the two banks are the same and the two ports are the same. In the cross-correlation case, the two pairs will differ and so one SAMPLER points at two different rows in the IF table. The sdfits program assumes that the frequency information and feed information are the same between the two rows (the "A" BANK and PORT are used) but the polarization may differ (see the notes on polarization).
      • SpectralProcessor: The Spectral Processor FITS file format predates the IF Manager BANK and PORT indexing design. The DATA array has a RECEIVER axis and each RECEIVER corresponds to one BANK and PORT combination. There is always an even number of RECEIVER rows. The first half of the rows are from BANK "A" and the second half are from BANK "B". Within each BANK, PORTs are numbered sequentially from 1 to n where n is 1/2 the total number of RECEIVERs. For example, if there are 4 RECEIVERs then the BANK+PORT combinations are A1, A2, B1, and B2 for the 4 RECEIVERs in the same order as found in the RECEIVER DATA axis.
      • DCR: The BANK value comes from the INPBNK keyword in the primary header of the DCR FITS file (it will be either "A" or "B"). The DATA array has a RECEIVER axis and each element therein corresponds to a row in the RECEIVER table. The PORT value for each row is the CHANNELID value for that row plus 1.
    • The LO1 value comes from the tracking LO FITS file (this is indicated by the value of the LO_CIRCUIT column in the IF FITS file, usually "LO1A"). There are two locations in that file that together give the LO1 to use in this equation.
      • LO1FREQ: The LO1FREQ column in the LO1TBL table gives the LO1 value at specific times (the DMJD column in the same table). sdfits always chooses the LO1FREQ nearest to and before the mid-point (in time) of the integration being filled.
      • FREQOFF: The FREQOFF column in the STATE table gives any switching-state dependence in LO1. The two values, added together, give the LO1 to be used in the sky frequency formula. The IF value to use depends on the backend and is described in the next item. For the DATA in question, the STATE is first determined using the STATE axis and STATE table local to that backend FITS file. Then the corresponding row in the STATE table of the LO1 FITS file is identified and the FREQOFF value for that state is determined. Note that for the Spectrometer, the DATA array has an ACT_STATE axis and the appropriate row in the ACT_STATE table must be matched with the equivalent row in the STATE table since the order of switching states in each table may not be the same. Note also that the Spectrometer may have external or internal signals as indicated in the ACT_STATE table.
  • BANDWID:
    • DCR and SpectralProcessor use the value of the BANDWDTH column in the IF FITS file for the appropriate bank and port (receiver).
    • Spectrometer uses the value of the BANDWDTH column in the PORT table for the appropriate PORT (as determined by the SAMPLER axis and SAMPLER table).
  • CRVAL1: Uses the sky frequency formula above and an appropriate IF (the IF at the center channel (CRPIX1) or the center of the bandpass for DCR data) and LO1 as described above.
    • DCR: The CENTER_IF from the IF Manager FITS file gives the IF at the center of the bandpass.
    • SpectralProcessor: The CENTER_IF gives the IF at CRPIX1 (nchan/2, where nchan is the number of channels).
    • Spectrometer: The center IF is equal to
            (FSTART + sign * (BANDWIDTH * center_channel/nlags))
            
      where FSTART and BANDWIDTH come from the PORT table from the row appropriate to the port and bank of interest, nlags is the number of lags (also equal to the number of channels), center_channel is CRPIX1 (nlags/2) and sign is +1 for the 12.5 and 200 MHz bandwidths and -1 for the 50 and 800 MHz bandwidths.
  • CDELT1: The frequency interval between adjacent channels.
    • DCR: This column is omitted. There is only one frequency pixel.
    • SpectralProcessor and Spectrometer: BANDWID / nchan
  • The frequency at a given pixel (channel):
          f(i) = (i-CRPIX1)*CDELT1 + CRVAL1
          
    Note: FITS channels are always numbered starting with 1.
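
The frequency relations above can be collected into a short Python sketch. The function and argument names are hypothetical and the numbers in the usage note are made up for illustration; only the formulas themselves come from the text.

```python
import numpy as np

# Sign of the Spectrometer IF step for each bandwidth (MHz), as listed above.
SPECTROMETER_SIGN = {12.5: +1, 200.0: +1, 50.0: -1, 800.0: -1}


def sky_frequency(if_freq, lo1freq, freqoff, sff_sideband, sff_multiplier, sff_offset):
    """sky = SFF_SIDEBAND*IF + SFF_MULTIPLIER*LO1 + SFF_OFFSET."""
    lo1 = lo1freq + freqoff               # LO1FREQ plus the state's FREQOFF
    return sff_sideband * if_freq + sff_multiplier * lo1 + sff_offset


def spectrometer_center_if(fstart, bandwidth, nlags, sign):
    """Center IF for the Spectrometer; center_channel is CRPIX1 = nlags/2."""
    center_channel = nlags / 2
    return fstart + sign * (bandwidth * center_channel / nlags)


def channel_frequencies(crval1, crpix1, cdelt1, nchan):
    """f(i) = (i - CRPIX1)*CDELT1 + CRVAL1 for FITS channels i = 1..nchan."""
    i = np.arange(1, nchan + 1)           # FITS channels are 1-based
    return (i - crpix1) * cdelt1 + crval1
```

With illustrative inputs, sky_frequency(3.0e9, 4.42e9, 0.0, -1, 1, 0.0) combines a 3 GHz IF with a 4.42 GHz LO1 in the lower sideband, and channel_frequencies(crval1, 512, cdelt1, 1024) returns crval1 at array index 511 (FITS channel 512).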

Times

  • DURATION: The clock time spent on that specific state.
    • DCR: The following keywords and columns in the DCR FITS file are used in determining this value for use in the SDFITS table:
      • DURATION keyword from primary header. That is the duration of one integration, in seconds.
      • CYCLES keyword from the primary header. That is the number of switching cycles in each integration.
      • BLANKTIM and PHASETIM values from the STATE table in the same DCR FITS file. The sum of those two values is the time spent on that switching state in one cycle. In calculating the SDFITS DURATION for a specific state, the sum of BLANKTIM and PHASETIM is multiplied by CYCLES. There is some additional time not recorded in this FITS file anywhere: the sum of (BLANKTIM + PHASETIM) * CYCLES over all states is less than the length of one integration (the DCR DURATION keyword). The time in each state is adjusted so that the total time matches the DCR DURATION keyword value and the fraction of the time spent in each state remains equal to (BLANKTIM + PHASETIM) for that state divided by the total of (BLANKTIM + PHASETIM) for all states. The adjusted state durations are stored in the appropriate rows in the SDFITS DURATION column.
    • SpectralProcessor: This is nearly identical to the method used for the DCR described above. The only difference is that there is no CYCLES keyword and the BLANKTIM and PHASETIM values are assumed to be the totals for all switching cycles in an integration. The BLANKTIM and PHASETIM have separate values for each "receiver" element (sampler) but these are assumed to all be the same and only the first value is actually used. The total integration duration comes from the INTTIME keyword found in the DATA table of the Spectral Processor FITS file.
    • Spectrometer: The PHSESTRT column of the STATE table is used to find the starting fraction for each state. The state's duration, expressed as a fraction of the whole, is then derived from those starting fractions. The total time spent on an integration is the product of the SWPERIOD keyword in the STATE table and the SWPERINT from the DATA table of the Spectrometer FITS file. Each state's DURATION then comes from the product of the integration's duration and the fraction of time spent on that state.
  • EXPOSURE: The time actually spent collecting data (seconds) for that specific state.
    • DCR: The state DURATION minus the state BLANKTIM. Open questions: this seems not to take CYCLES into account. Also, why not just use PHASETIM*CYCLES?
    • SpectralProcessor: The state durations are used as-is. Open question: why not use the PHASETIM there?
    • Spectrometer: Uses the INTEGRAT column of the DATA table. This is a 2-dimensional array where the first dimension is sampler and the second dimension is state. There can be a different integration time for each sampler, state, and integration. In practice, the integration times for all samplers at the same state and integration should be the same. Prior to FITSVER 2.3 in the Spectrometer FITS file this 2-dimensional array was written incorrectly. Instead of being written in FITS order, where the first axis varies the fastest, it was written with the second axis varying the fastest. The values are correct, but care must be taken when extracting them so that they are assigned to the proper state and sampler for each integration.
  • DATE-OBS: The start time of that integration. (Note that an individual switching state may start slightly later depending on the switching details; DATE-OBS is the same for all samplers and switching states in a given integration.)
    • DCR: Copied directly from the TIMETAG column in the DATA table (MJD) and converted to a string.
    • SpectralProcessor: The UTDATE keyword plus the UTCSTART column value (MJD); converted to a string.
    • Spectrometer: The DMJD column (MJD) converted to a string.
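
The DURATION calculations above amount to splitting one integration among the switching states. A minimal Python sketch follows; the function names are hypothetical, and for the Spectrometer it assumes that the PHSESTRT starting fractions are in ascending order and that the last state runs to the end of the switching period (fraction 1.0).

```python
def dcr_state_durations(duration, cycles, blanktim, phasetim):
    """Scale per-state times so that they sum to the DCR DURATION keyword.

    blanktim and phasetim are sequences with one value per state.
    """
    per_state = [(b + p) * cycles for b, p in zip(blanktim, phasetim)]
    total = sum(per_state)                # less than duration in practice
    # Keep each state's fraction of the total, but match the full integration.
    return [duration * t / total for t in per_state]


def spectrometer_state_durations(phsestrt, swperiod, swperint):
    """Durations from starting fractions (assumed ascending, ending at 1.0)."""
    starts = list(phsestrt)
    ends = starts[1:] + [1.0]
    integration = swperiod * swperint     # total time spent on one integration
    return [integration * (e - s) for s, e in zip(starts, ends)]
```

For two DCR states with BLANKTIM = 0.01 and PHASETIM = 0.08 each, CYCLES = 5 and a DURATION of 1.0 second, the unadjusted state times sum to 0.9 seconds and the adjusted durations are 0.5 seconds each.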

Pointing Directions

The pointing direction for the row is recorded in the 2nd and 3rd DATA axes as well as in the AZIMUTH and ELEVATIO columns. These values all come from the Antenna FITS file.
  • Pointing axis types (CTYPE2 and CTYPE3). These come from the value of the COORDSYS keyword in the GO FITS file according to the following translation:
    COORDSYS CTYPE2 CTYPE3
    GALACTIC GLON GLAT
    RADEC RA DEC
    HADEC HA DEC
    AZEL AZ EL
    OTHER OLON OLAT
    Any value not recognized is treated as OTHER. If COORDSYS is not found, then the CTYPE values will be "????".
  • Integration averaged pointing directions. The pointing direction values are average values over the integration recorded in that row. Using the integration midpoint (DATE-OBS + DURATION/2) and the DURATION, all of the rows in the antenna FITS file recorded during that integration are selected as well as the row immediately before the start of the integration and the row immediately after the end of the integration (if not already selected). The two samples that straddle the integration start time are used to get an interpolated antenna position at the start of the integration. Similarly, the two samples that straddle the integration end time are interpolated to get an antenna position at the end of the integration. These two interpolated positions plus any samples that occurred during the integration are averaged together (weighted appropriately at the end points depending on how close those end points are to an actual antenna position sample) to get the 4 averaged antenna positions for this integration: MAJOR, MINOR, MNT_AZ, and MNT_EL.
  • CRVAL2 is the integration averaged value of the MAJOR column from the Antenna FITS file.
  • CRVAL3 is the integration averaged value of the MINOR column from the Antenna FITS file.
  • AZIMUTH is the integration averaged value of the MNT_AZ column from the Antenna FITS file.
  • ELEVATIO is the integration averaged value of the MNT_EL column from the Antenna FITS file.
  • Note these positions are the tracking pointing directions. They do not take into account any beam offsets that may be appropriate for the data in each row.
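
One way to realize the integration average described above is a time-weighted (trapezoidal) mean between the interpolated end points. The text does not give the exact weighting used by sdfits, so the following Python sketch is an assumption, and the function name is hypothetical. It also ignores angle wrap-around (for example an azimuth crossing 360 degrees).

```python
import numpy as np


def integration_average(sample_times, sample_values, t_start, t_end):
    """Time-weighted mean of one antenna column over [t_start, t_end].

    sample_times must be increasing and must straddle both end points.
    """
    t = np.asarray(sample_times, dtype=float)
    v = np.asarray(sample_values, dtype=float)
    inside = (t > t_start) & (t < t_end)
    # Interpolate between the two samples that straddle each end point.
    v_start = np.interp(t_start, t, v)
    v_end = np.interp(t_end, t, v)
    tt = np.concatenate(([t_start], t[inside], [t_end]))
    vv = np.concatenate(([v_start], v[inside], [v_end]))
    # Trapezoidal rule: end segments shorter than a sample interval get
    # proportionally less weight.
    area = np.sum(0.5 * (vv[1:] + vv[:-1]) * np.diff(tt))
    return area / (t_end - t_start)
```

The same function would be applied separately to the MAJOR, MINOR, MNT_AZ, and MNT_EL columns, with t_start = DATE-OBS and t_end = DATE-OBS + DURATION.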

Subreflector Nodding

If the user has chosen to move the subreflector's tilt during a scan using the "SubNod" motion (the SUBMOTIN keyword in the GO FITS file will have that value) then sdfits characterizes the subreflector's tilt in the SUBREF_STATE column. For all other scans, SUBREF_STATE has a value of 1.

SUBREF_STATE can have one of these three values:
  • 1 - the subreflector tilt was at the first commanded position for all values of the measured tilts taken during that integration.
  • -1 - the subreflector tilt was at the second commanded position for all values of the measured tilts taken during that integration.
  • 0 - the subreflector was at neither commanded position for at least one of the measured tilts taken during that integration - i.e. the subreflector was moving between positions.

The algorithm for determining if a measured tilt is at either of the two commanded positions is as follows (note that this is only done if SUBMOTIN is "SubNod" for that scan).
  • The commanded and measured Y tilts are not used. The Y tilt is a rotation of the subreflector and it should be zero although that assumption is never checked.
  • The commanded X and Z tilts for the two commanded positions are read from the GO FITS file: SUBXT1, SUBXT2, SUBZT1, SUBZT2
  • Four additional keywords related to focus tracking are read from the Antenna FITS file: FTRKXTA, FTRKZTA, LFC_XT, LFC_ZT.
  • The four commanded tilts (one at each position in the X and Z directions) are:
      pos1xt = SUBXT1 + FTRKXTA + LFC_XT
      pos2xt = SUBXT2 + FTRKXTA + LFC_XT
      pos1zt = SUBZT1 + FTRKZTA + LFC_ZT
      pos2zt = SUBZT2 + FTRKZTA + LFC_ZT
  • The measured plate tilts are read from the Antenna FITS file: SR_XT and SR_ZT.
  • All of those tilts are in degrees. They are converted to arcminutes on the sky by multiplying by the plate scale, which is different for the two axes: plateScaleXT = -8.33 and plateScaleZT = -10.405
  • The commanded plate tilts are subtracted from the measured plate tilts in units of arcminutes.
      tilt1Sqrd = ((SR_XT-pos1xt)*plateScaleXT)**2 + ((SR_ZT-pos1zt)*plateScaleZT)**2
      tilt2Sqrd = ((SR_XT-pos2xt)*plateScaleXT)**2 + ((SR_ZT-pos2zt)*plateScaleZT)**2
  • The criterion for deciding whether tilt1Sqrd or tilt2Sqrd is close enough to zero to count that sample as being at that position (1 or 2) is that the fractional gain loss is no more than 5 percent. For the GBT, this is roughly:
       criteriaSqrd = -0.361 * ln(0.95) * (12.3/FreqGHz)**2
    • FreqGHz is the center frequency, in GHz, taken from the CENTER_SKY column of the IF Manager FITS File using the row associated with the appropriate SAMPLER value.
    • If tilt1Sqrd is less than or equal to criteriaSqrd then that sample is labeled as being at position 1.
    • If tilt2Sqrd is less than or equal to criteriaSqrd then that sample is labeled as being at position 2.
    • Otherwise the sample is labeled as moving.
  • All of the comparisons with criteriaSqrd for all of the samples taken during the integration of interest are examined. If any of them are "moving" then the entire integration is marked as "moving" (SUBREF_STATE = 0). Otherwise the integration is marked with the label associated with the first sample (position 1 is SUBREF_STATE = 1 and position 2 is SUBREF_STATE = -1), since the subreflector never goes directly from position 1 to position 2 without passing through the "moving" state.
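
The SUBREF_STATE algorithm above can be sketched in Python as follows. The function names and argument layout are hypothetical; the constants and formulas are taken from the steps above. The sketch assumes at least one measured tilt sample per integration and tests position 1 before position 2.

```python
import math

PLATE_SCALE_XT = -8.33     # plate scale for the X tilt
PLATE_SCALE_ZT = -10.405   # plate scale for the Z tilt


def commanded_tilts(subxt1, subxt2, subzt1, subzt2, ftrkxta, ftrkzta, lfc_xt, lfc_zt):
    """Return ((pos1xt, pos1zt), (pos2xt, pos2zt))."""
    pos1 = (subxt1 + ftrkxta + lfc_xt, subzt1 + ftrkzta + lfc_zt)
    pos2 = (subxt2 + ftrkxta + lfc_xt, subzt2 + ftrkzta + lfc_zt)
    return pos1, pos2


def criteria_sqrd(freq_ghz):
    """Squared offset limit for a fractional gain loss of 5 percent."""
    return -0.361 * math.log(0.95) * (12.3 / freq_ghz) ** 2


def classify_sample(sr_xt, sr_zt, pos1, pos2, freq_ghz):
    """Return 1 (position 1), -1 (position 2) or 0 (moving) for one sample."""
    limit = criteria_sqrd(freq_ghz)
    for label, (xt, zt) in ((1, pos1), (-1, pos2)):
        tilt_sqrd = ((sr_xt - xt) * PLATE_SCALE_XT) ** 2 + ((sr_zt - zt) * PLATE_SCALE_ZT) ** 2
        if tilt_sqrd <= limit:
            return label
    return 0


def subref_state(samples, pos1, pos2, freq_ghz):
    """SUBREF_STATE for one integration from its (SR_XT, SR_ZT) samples."""
    labels = [classify_sample(x, z, pos1, pos2, freq_ghz) for x, z in samples]
    if 0 in labels:
        return 0           # any moving sample marks the whole integration
    return labels[0]       # otherwise use the label of the first sample
```

For example, with pos1 = (0.0, 0.0) and pos2 = (0.5, 0.0) at 12.3 GHz, an integration whose samples all sit at (0.5, 0.0) is labeled -1, while one containing a sample at (0.25, 0.0) is labeled 0.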

Polarization to STOKES axis enumeration conversion

A two-character string describing the polarization associated with the data in a given row is constructed from the BANK and PORT values for that data (see the notes on frequency axis for a discussion on how BANK and PORT are determined for each backend). For DCR data and SpectralProcessor data, sdfits assumes that the values come from a single SAMPLER (the BANK and PORT) and the two-character string is simply a doubling of the POLARIZE value from the corresponding row in the IF table (e.g. "R" becomes "RR"). For Spectrometer data the SAMPLER table describes two BANK and PORT pairs that have been correlated ("BANK_A" and "PORT_A" with "BANK_B" and "PORT_B") for each SAMPLER. The first character in the polarization string comes from the POLARIZE value in the row in the IF table pointed to by the "A" (BANK, PORT) pair and the second character comes from the POLARIZE value in the row in the IF table pointed to by the "B" pair (in the auto-correlation case both pairs will point at the same row and the polarization string will be doubled).

The resulting string is then translated into an integer and assigned to CRVAL4 according to this table. For completeness, this table includes polarizations not produced by sdfits.

Polarization STOKES
RR -1
LL -2
RL -3
LR -4
XX -5
YY -6
XY -7
YX -8
I 1
Q 2
U 3
V 4
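
The string construction and the table lookup can be written as a small Python sketch. The function names are hypothetical, and raising KeyError for a string not in the table is an assumption, since the text does not say how sdfits handles that case.

```python
# Polarization string to STOKES axis value, from the table above.
STOKES = {
    "RR": -1, "LL": -2, "RL": -3, "LR": -4,
    "XX": -5, "YY": -6, "XY": -7, "YX": -8,
    "I": 1, "Q": 2, "U": 3, "V": 4,
}


def polarization_string(polarize_a, polarize_b=None):
    """Build the two-character string from IF table POLARIZE values.

    DCR and SpectralProcessor data (and Spectrometer auto-correlations)
    double a single value; Spectrometer cross-correlations combine the
    "A" and "B" values.
    """
    if polarize_b is None:
        polarize_b = polarize_a
    return polarize_a + polarize_b


def crval4(polarization):
    """Translate a polarization string to the integer written to CRVAL4."""
    return STOKES[polarization]          # KeyError if not in the table
```

For instance, crval4(polarization_string("R")) gives -1 and crval4(polarization_string("X", "Y")) gives -7.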

Receiver Calibration Temperature (TCAL)

The TCAL value is a scalar value obtained from the Receiver Calibration Measurements FITS file (receiver FITS file). Each receiver FITS file has several RX_CAL_INFO tables which record the lab-measured TCAL values for each receptor (feed) and polarization at several frequencies. The appropriate one of these tables for the given sampler is located.
  • DCR: The CRVAL1 (center sky frequency) and BANDWID are used to find the measured TCAL values over the bandwidth. These are then averaged (taking into account the distance between samples and locations of the edge of the bandpass with respect to the samples) to arrive at a scalar value for that sampler. All integrations within the scan for the same sampler will have the same TCAL value.
  • SpectralProcessor and Spectrometer: The lab-measured TCAL values are interpolated on to the vector of sky frequencies for this sampler at the beginning of the scan (first integration, first state). Then, the inner 80% of these values are averaged (excluding 10% of the total number of channels at each end of the interpolated TCAL vector) to arrive at a single scalar TCAL value. This average over 80% of the channels is done to correspond to the similar averaging done during calibration as described in the calibration documentation.
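
The scalar TCAL for the spectral backends can be sketched in Python with NumPy as below. The function name is hypothetical, linear interpolation is an assumption (the text only says "interpolated"), and so is rounding 10% of the channels down to a whole number.

```python
import numpy as np


def scalar_tcal(lab_freq, lab_tcal, sky_freq):
    """Interpolate lab TCAL onto the channel frequencies and average the
    inner 80% of the channels.

    lab_freq must be increasing; sky_freq is the vector of sky frequencies
    for the first integration and first state of the scan.
    """
    tcal = np.interp(np.asarray(sky_freq, dtype=float), lab_freq, lab_tcal)
    nchan = tcal.size
    nskip = nchan // 10                   # 10% of the channels at each end
    return tcal[nskip:nchan - nskip].mean()
```

With lab values of 1.0 and 3.0 at two frequencies and ten channels spread evenly between them, the two edge channels are dropped and the mean of the remaining eight is returned.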

VELDEF

The VELDEF is composed of two 4-character strings. The first 4 characters describe the velocity definition (RADI for radio, OPTI for optical and RELA for relativistic velocity definitions). The last 4 characters consist of a hyphen and a 3-character tag indicating the velocity reference frame in use.

The value written by sdfits comes from the VELDEF keyword in the SOUVEL table of the LO1 FITS file. The following translations are made to the first 4 characters so that this value can follow the SDFITS convention: VOPT is translated to OPTI, VRAD is translated to RADI, and VELO is translated to RELA. See the LO FITS file documentation for more information on the available reference frames and velocity definitions. The final 4 characters are unaltered by sdfits.

This value defaults to "RADI-OBS" if VELDEF can not be found in the SOUVEL table of the LO1 FITS file.

This column is not written for DCR data.
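
The VELDEF translation can be expressed as a short Python sketch. The function name is hypothetical, the frame tag in the usage note is only an example value, and passing an unrecognized first 4 characters through unchanged is an assumption.

```python
# First 4 characters: LO1 FITS file value -> SDFITS convention.
VELDEF_TRANSLATION = {"VOPT": "OPTI", "VRAD": "RADI", "VELO": "RELA"}


def sdfits_veldef(lo1_veldef):
    """Translate the SOUVEL table VELDEF keyword for the SDFITS column.

    lo1_veldef is None when the keyword cannot be found.
    """
    if lo1_veldef is None:
        return "RADI-OBS"                 # documented default
    definition, frame = lo1_veldef[:4], lo1_veldef[4:]
    # The final 4 characters (hyphen plus frame tag) are left unaltered.
    return VELDEF_TRANSLATION.get(definition, definition) + frame
```

For example, sdfits_veldef("VRAD-LSR") gives "RADI-LSR", and sdfits_veldef(None) gives "RADI-OBS".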

-- BobGarwood - 27 Jun 2007
Topic revision: r24 - 2017-09-12, BobGarwood