Implementing turboSETI in Near-Real-Time at the GBT


In the GBT control room, there are currently 64 BL compute nodes in use for data collection and processing. Typically 8 compute nodes run at a time during an observing session, and each node records a different chunk of the spectrum, so the data products from the nodes must be spliced together after processing to form a continuous spectrum. The first part of the project I am proposing is to add turboSETI to the data reduction pipeline at the GBT, so that the SETI candidate information it produces is stored immediately alongside the spectral products already saved by the Breakthrough Listen pipeline. This requires running turboSETI on the data in each compute node directly after the observation, then writing a program to splice the per-node turboSETI data products together once that initial processing is complete. The project should streamline and speed up the search for technosignatures in the BL pipeline. It could also allow possible detections of narrowband, Doppler-drifting technosignatures to be identified closer to the time of observation, which would improve the likelihood of a successful follow-up observation of a promising SETI candidate.

To-Dos

We are primarily using Trello to track to-do items. However, here's a list for quick reference:
  • Compare output from running turboSETI on single nodes to running on spliced files
  • Write program to splice UFUDs
  • Deploy as part of post-reduction scripts -- must not run during observing
  • Analyze output

Terminology Notes

  • spliced-filterbank dat (SFD) - output from turboSETI run on the spliced filterbanks
  • unspliced-filterbank unspliced-dats (UFUD) - output from turboSETI run on the unspliced filterbanks
  • unspliced-filterbank spliced-dat (UFSD) - UFUDs combined with the script that you'll be writing

Above definitions courtesy Steve Croft.
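
As a starting point for the UFSD script, here is a minimal sketch of one way the per-node UFUDs could be combined. It rests on several assumptions that are not confirmed above: that each .dat file consists of header lines beginning with "#" followed by whitespace-separated hit rows, that the first column is the hit number, and that the fourth column is the hit frequency in MHz. The function names read_hits and splice_dats are hypothetical, and sorting by ascending frequency is an arbitrary choice.

```python
from pathlib import Path

# Assumed layout of a hit row (whitespace-separated); not confirmed by the notes above.
HIT_NUMBER_COL = 0  # assumed: running hit number within the file
FREQ_COL = 3        # assumed: hit frequency in MHz


def read_hits(dat_path):
    """Return (header_lines, hit_rows) from one .dat file.

    Lines starting with '#' are treated as header; every other non-blank
    line is treated as one hit and split on whitespace.
    """
    header, hits = [], []
    for line in Path(dat_path).read_text().splitlines():
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            hits.append(line.split())
    return header, hits


def splice_dats(dat_paths, out_path):
    """Combine per-node hit tables into one file and return the hit count."""
    first_header = None
    all_hits = []
    for path in dat_paths:
        header, hits = read_hits(path)
        if first_header is None:
            first_header = header  # simplification: keep only the first header
        all_hits.extend(hits)

    # Order hits by ascending frequency (an assumption), then renumber them
    # so hit numbers are unique across the combined table.
    all_hits.sort(key=lambda row: float(row[FREQ_COL]))
    for new_number, row in enumerate(all_hits, start=1):
        row[HIT_NUMBER_COL] = str(new_number)

    lines = (first_header or []) + ["\t".join(row) for row in all_hits]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(all_hits)
```

This sketch does not adjust any per-node quantities (for example channel indices or header values that describe a single node's band), so a real UFSD script would need to decide how to handle those before its output can be compared with an SFD.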

Software / Data Notes

Prior to starting analysis on our test data, I worked through this tutorial by Elan Lavie on my home laptop to get familiar with turboSETI and its associated tools:

I then installed turboSETI version on the blpc1 machine in my own Anaconda environment (ewhite).

For the near-real-time turboSETI project, we wanted to look at some test data and compare the resulting SFDs with the results of combining UFUDs. We decided to use the Voyager 2020 X-Band data (located on the BL cluster), and performed the following steps to generate the files we need:

To create the UFUDs...
  • Ran rawspec on the .raw files for each compute node to create .fil (filterbank) files -- one for each of the 3 data products (high frequency resolution, high time resolution, and mid resolution) for each of the 6 scans in the ABACAD cadence on each node.
  • Ran turboSETI on each high-res .fil file to produce a separate .dat for each of the 6 scans in the cadence for each node (i.e., generated UFUDs).
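
The per-node turboSETI step above could be scripted roughly as follows. This is only a sketch: it builds one turboSETI command line per high-resolution filterbank file rather than running anything. The file pattern "*.0000.fil", the SNR threshold of 10, and the maximum drift rate of 4 Hz/s are illustrative assumptions, not the values used for the Voyager test data, and build_turboseti_commands is a hypothetical helper name. The -M, -s, and -o options are the turboSETI command-line flags for maximum drift rate, SNR threshold, and output directory.

```python
from pathlib import Path


def build_turboseti_commands(fil_dir, out_dir, pattern="*.0000.fil",
                             snr=10, max_drift=4):
    """Return one turboSETI command (as an argument list) per matching file.

    The pattern, snr, and max_drift defaults are illustrative assumptions.
    """
    commands = []
    for fil_path in sorted(Path(fil_dir).glob(pattern)):
        commands.append([
            "turboSETI", str(fil_path),
            "-M", str(max_drift),  # maximum drift rate to search (Hz/s)
            "-s", str(snr),        # minimum signal-to-noise ratio for a hit
            "-o", str(out_dir),    # directory that receives the .dat output
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) on the node that holds the file. Keeping command construction separate from execution makes it easy to print the commands for a dry run before deploying them in the post-reduction scripts.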

To create the SFDs...
  • Ran splice2 to create the 6 spliced filterbank files (one for each scan in the cadence) from the single-node filterbank files that rawspec produced in the previous step.
  • Ran turboSETI on the 6 spliced filterbanks to create the SFDs

Comparing Histograms

After creating the UFUDs and SFDs via the process described in the section above, I created some histograms to compare the files' contents (using the IPython notebook plotting_ufuds.ipynb in this repository). A few brief notes on what I found:

  • The histograms' data is from the first scan in the 6-scan cadence. I have the .dat files for the remaining 5 scans in the cadence on blpc1 and can create separate plots for them as well if needed.
  • The UFUD plots are created by plotting all data from the UFUDs on one diagram; for details of how this is done, see the first cell of the notebook mentioned above. The SFD plots are created by plotting the results from the spliced filterbank on one plot, as expected. For the plots labelled "No Binning", the number of bins equals the number of data points in the respective files.
  • The plots built from the UFUDs appear to contain more data points (3300 rows) than those built from the spliced .dats (1408 rows). I am not sure why this is or how it will affect the comparison.
  • As a check for overlapping regions among the individual UFUDs, I created an array of the frequencies in order and took the differences between successive entries. No negative values were returned, which seems to indicate there is no frequency overlap.
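
A complementary way to look for overlap, different from the ordering test described above, is to compare the frequency range covered by each node's hits. The sketch below assumes the hit frequencies have already been read into a dictionary keyed by node name; the function names and the node names used in the example are hypothetical.

```python
from itertools import combinations


def frequency_ranges(hits_by_node):
    """Map each node name to the (lowest, highest) hit frequency it reported."""
    return {node: (min(freqs), max(freqs))
            for node, freqs in hits_by_node.items() if freqs}


def overlapping_nodes(hits_by_node):
    """Return every pair of nodes whose hit-frequency ranges intersect."""
    ranges = frequency_ranges(hits_by_node)
    overlaps = []
    for node_a, node_b in combinations(sorted(ranges), 2):
        low_a, high_a = ranges[node_a]
        low_b, high_b = ranges[node_b]
        # Two closed intervals intersect when each starts before the other ends.
        if low_a <= high_b and low_b <= high_a:
            overlaps.append((node_a, node_b))
    return overlaps


# Example with made-up frequencies (MHz) for three hypothetical nodes.
example = {
    "node_a": [8400.0, 8410.0],
    "node_b": [8411.0, 8420.0],
    "node_c": [8419.5, 8430.0],
}
print(overlapping_nodes(example))  # [('node_b', 'node_c')]
```

Note that this only compares the ranges spanned by the hits themselves. Two nodes whose bands overlap slightly could still pass if neither happened to record a hit in the shared region.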

I've pasted in the histogram plots below; UFUDs are on the left, SFDs on the right (note that you can see more zoomed-in versions by clicking on the files in the table at the bottom of the page).
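
Because the UFUD and SFD tables hold different numbers of rows, one way to make the left and right panels directly comparable is to bin both data sets with the same bin edges. The following is a minimal pure-Python sketch of that idea, not the code in the notebook; shared_bin_edges and bin_counts are hypothetical helper names.

```python
def shared_bin_edges(values_a, values_b, n_bins):
    """Return n_bins + 1 equally spaced edges spanning both data sets."""
    low = min(min(values_a), min(values_b))
    high = max(max(values_a), max(values_b))
    if high == low:
        high = low + 1.0  # avoid zero-width bins when all values are equal
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the edges.

    Values outside the edges are ignored; a value equal to the last edge
    is counted in the final bin.
    """
    n_bins = len(edges) - 1
    low, high = edges[0], edges[-1]
    counts = [0] * n_bins
    for value in values:
        if not low <= value <= high:
            continue
        index = min(int((value - low) / (high - low) * n_bins), n_bins - 1)
        counts[index] += 1
    return counts
```

With edges computed once from both tables, bin_counts(ufud_values, edges) and bin_counts(sfd_values, edges) can be compared bin by bin, which shows where in frequency, SNR, or drift rate the extra UFUD rows fall.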

Histograms -- No Binning

ufuds freq hist.png sfds freq hist.png

ufuds snr hist.png sfds snr hist.png

ufuds hits hist.png sfds hits hist.png

ufuds drifts hist.png sfds drift hist.png

Histograms -- 500 Bins

ufuds freq hist500.png sfds freq hist500.png

Note the x-axis of the SNR plots should be labelled "log10(SNR)"; I'll correct this later.

ufuds snr hist500.png sfds snr hist500.png

ufuds hits hist500.png sfds hits hist500.png

ufuds drifts hist500.png sfds drift hist500.png

Histograms -- 100 Bins

ufuds freq hist100.png sfds freq hist100.png

ufuds snr hist100.png sfds snr hist100.png

ufuds hits hist100.png sfds hits hist100.png

ufuds drifts hist100.png sfds drift hist100.png

-- EllieWhite - 2021-01-24
Topic attachments
| Attachment | Size | Date | Who |
| sfds_drift_hist.png | 58 K | 2021-01-25 11:03 | EllieWhite |
| sfds_drift_hist100.png | 64 K | 2021-01-25 11:22 | EllieWhite |
| sfds_drift_hist500.png | 58 K | 2021-01-25 11:09 | EllieWhite |
| sfds_freq_hist.png | 50 K | 2021-01-24 14:35 | EllieWhite |
| sfds_freq_hist100.png | 52 K | 2021-01-25 11:23 | EllieWhite |
| sfds_freq_hist500.png | 52 K | 2021-01-25 11:09 | EllieWhite |
| sfds_hits_hist.png | 60 K | 2021-01-25 11:04 | EllieWhite |
| sfds_hits_hist100.png | 63 K | 2021-01-25 11:22 | EllieWhite |
| sfds_hits_hist500.png | 60 K | 2021-01-25 11:08 | EllieWhite |
| sfds_snr_hist.png | 60 K | 2021-01-25 11:04 | EllieWhite |
| sfds_snr_hist100.png | 60 K | 2021-01-25 11:22 | EllieWhite |
| sfds_snr_hist500.png | 54 K | 2021-01-25 11:08 | EllieWhite |
| ufuds_drifts_hist.png | 58 K | 2021-01-24 14:37 | EllieWhite |
| ufuds_drifts_hist100.png | 64 K | 2021-01-25 11:23 | EllieWhite |
| ufuds_drifts_hist500.png | 59 K | 2021-01-25 11:10 | EllieWhite |
| ufuds_freq_hist.png | 57 K | 2021-01-24 14:34 | EllieWhite |
| ufuds_freq_hist100.png | 51 K | 2021-01-25 11:24 | EllieWhite |
| ufuds_freq_hist500.png | 52 K | 2021-01-25 11:10 | EllieWhite |
| ufuds_hits_hist.png | 47 K | 2021-01-24 14:35 | EllieWhite |
| ufuds_hits_hist100.png | 58 K | 2021-01-25 11:23 | EllieWhite |
| ufuds_hits_hist500.png | 56 K | 2021-01-25 11:09 | EllieWhite |
| ufuds_snr_hist.png | 58 K | 2021-01-24 14:35 | EllieWhite |
| ufuds_snr_hist100.png | 59 K | 2021-01-25 11:23 | EllieWhite |
| ufuds_snr_hist500.png | 54 K | 2021-01-25 11:09 | EllieWhite |
Topic revision: r2 - 2021-01-25, EllieWhite