TGBT15A_915_127 - Matrix Adapter and Other Tests

Goals

  • Test the Matrix adapter that should allow for automatic switching between Matrix spectral line modes and Classic pulsar modes
  • Do A/B tests without/with the guppi_daq server agent running on the HPCs to see if this triggers the incorrect MAC addresses on vegas-hpc3 and vegas-hpc4
  • Collect some new RAW mode data for Natalia
  • If time, look into c1500 packet loss issues
  • If time, do more optimization of incoherent mode levels

Details

  • Session begins 09:45
  • Joe begins switching versions to regression test candidate
  • Will begin by switching between spectral line and various pulsar modes to test Matrix adapter. No guppi_daq server agent will be running at this time, and GUPPI will not be used, so this will double as the "A" test for investigating why the bee2 MAC address is being loaded onto Banks C and D.
  • Software switch completed at 10:00
  • Will first run a simple Mode 1 spectral line scan using CheckMode1 SB
    • SB submitted at 10:06
      • Scan is #4
      • Balance looks good
      • Scan ran normally
      • Data look normal
  • Will check some other spectral line modes
    • Checking Mode 4 (SB submitted at 10:08)
      • Scan is #5
      • Balance looks OK
      • Data look OK
    • Checking Mode 10 (SB submitted 10:11)
      • Scan is #6
      • All looks normal
    • Checking Mode 20 (SB submitted 10:13)
      • Scan is #7
      • There were no errors about the balancing but the levels are a bit high (3 and 5 dB on pol 1/2, respectively)
      • Ran a second time (scan #8). Levels look good this time.
      • Data look OK
  • Will now check a pulsar mode.
    • Using IncoherentModeTests to run i0800x0512, VPM only
      • SB submitted at 10:20
      • Scan is #9
      • Configuration seems to have worked
      • Balance looks good
      • CLEO has the correct mode name
      • Scan started and ran without issue
      • Data look normal
  • Will try going back to an HBW spectral line mode
    • Running CheckMode1 SB (submitted at 10:24)
    • Scan is #10
    • All looks good
  • Going back to i0800x0512 w/ IncoherentModeTests SB (submitted at 10:25)
    • Scan is #11
    • Scan ran OK
    • Did notice that CLEO message reports a VEGAS error: "Test Message (please ignore)". Does not seem to cause any problems. Joe says this is a bug.
    • Cal signal does not look correct. Balancing and levels look OK but the cal square wave is not there.
      • Everything configured properly. LED on SSDS is flickering at the correct 25 Hz rate. No dropped data reported.
      • Re-running scan
    • Resubmitting at 10:35 (scan is #12)
      • Looks better now
      • Will go back to Mode 1 to see if we can recreate cal issue
  • Submitting CheckMode1 SB at 10:38
    • Scan is #13
    • Looks normal
  • Going back to i0800x0512
    • Scan is #14
    • SB submitted at 10:40
    • Configure, balance, look good. Scan is running without issue.
    • Data look OK this time
  • Checking Mode 10 (SB submitted at 10:44)
    • Scan is #15
    • All looks good
  • Going back to i0800x0512 (SB submitted at 10:45)
    • Scan is #16
    • All looks good
  • Checking Mode 20 (SB submitted at 10:48)
    • Scan is #17
    • Looks good
  • Going back to i0800x0512 (SB submitted at 10:50)
    • Scan is #18
    • Looks good
  • The next several scans will switch between various spectral line modes and pulsar modes, but this time we will use GUPPI as well as VPM
    • Scan #19 is Mode 1 (Submitted at 10:57)
      • Scan aborted: "BankCMgr is hung in Aborting"
      • Shared memory on Bank A is blank
      • Off/On cycled Bank A manager
      • Resubmitted at 11:10. Scan is now #20
      • Aborted again, same situation as before
      • Joe is troubleshooting
        • It appears that the adapter is not going into a monitor mode correctly when switching.
        • Joe fixed for now and will look into it more.
      • Resubmitted again at 11:33. Scan is now #21
        • Looks OK this time.
    • Scan #22 is i0800x0512 VPM/GUPPI (Submitted at 11:34)
      • Neither GUPPI nor VEGAS seems to have started.
      • Everything seems to have configured OK.
      • Joe restarted turtle.
      • Astrid printed the following warning:
        [15:34:49] Configuring telescope.
        [15:35:31] Warning: nchan and numchan are set differently.Using numchan = 512
        [15:35:31] Warning: (Data generated using this configuration can NOT be reduced using GBTIDL. If GBTIDL is required, change the integration time(s) to a integral number of switch periods.
      • Resubmitting at 11:39
        • Running OK now. All looks good.
    • Scan #23 is Mode 10 (Submitted at 11:44)
      • Looks good
    • Scan #24 is i0800x0512 VPM/GUPPI (Submitted at 11:45)
      • Looks good
    • Scan #25 is Mode 20 (Submitted at 11:48)
      • Looks good
    • Scan #26 is i0800x0512 VPM/GUPPI (Submitted at 11:51)
  • Going to get lunch
  • Back at 12:09
  • Now we'll try the coherent modes. We will try only VPM to see if there is any issue with the MAC addresses. First, let's stay in pulsar mode.
  • Using CoherentModeTests SB, VPM only, c0800x0512
    • SB submitted at 12:11
    • Aborted immediately. Banks B and D hung in aborting. Shared memory did not seem to configure properly on any banks except A.
    • Will cycle VEGAS Off/On and try again.
      • After power cycling, all banks except A issued a warning "Valon frequency 1500 MHz is not equal to that set"
    • Resubmitted SB at 12:17
      • Valon messages cleared
      • Configuration successful this time
      • Bank A is taking a very long time to finish activating. It does eventually.
      • Other banks go into running before Bank A
      • Bank A never seemed to finish activating.
      • Scan aborted. Bank A in fault: "Aborting: failed to start the VEGAS HPC subprocess". All other banks in aborting: "Abort due to scan terminating too early -- Try cycling Vegas Off/On"
      • daq pulse on Bank A is not active
      • Will try to Off/On cycle again
        • Still no daq pulse
        • Joe troubleshooting
    • Resubmitting at 12:35
      • Aborted immediately. "BankBMgr is hung in Aborting" and "Aborting: failed to start the VEGAS HPC subprocess" from Banks B -- H.
      • Joe restarting all managers
    • Resubmitting again at 12:42
      • Scan is #30
      • Running OK this time
      • Data look OK
    • Going to try going back and forth between incoherent and coherent pulsar to see if we can recreate the last few issues.
    • Trying c0800x0512, VPM only again
      • Aborted again. This time "BankEMgr is hung in Aborting" and the typical "failed to start HPC subprocess" faults on banks B -- H
      • The daq pulse stops when configuring for incoherent mode but never comes back on banks B -- H when reconfiguring for coherent mode
    • Will try to configure incoherent mode w/ 8 banks (all at same RF) at Joe's request, and then go to coherent mode.
      • Joe doing a stop/start
      • SB submitted at 12:56
      • Scan is #33
      • Data look good for all 8 banks
    • Now configuring for coherent mode as previously done
      • Scan is #34
      • SB submitted at 12:59
      • All looks good
      • Joe wants to manually turn off Bank G, turn it back on, and then try coherent mode again
      • SB resubmitted at 13:02
        • guppi_daq did not come back on Bank G, aborted as expected.
        • So it seems like guppi_daq is not coming back to life if a manager is turned off and then back on
        • Joe tried to troubleshoot on the fly but needs to do more detailed investigation
  • We will try switching between spectral line and VPM-only coherent modes. Same strategy as above.
    • Scan #35 is Mode 1 (SB submitted at 13:26)
      • Banks B, C, and D aborted
      • Joe doing a restart
      • SB resubmitted at 13:34 (scan #37)
    • Scan #36 is c0800x0512 (SB submitted at 13:35)
      • ROACH2s still in service on banks E -- H. No daq pulse on these banks either. These banks were not used in the previous spectral line mode scan. The shared memory seemed to be configured for coherent pulsar mode. Spectral line data was written on these banks, though.
      • Banks A--D have good data. These were the banks that were configured for spectral line mode in the previous scan.
      • Resubmitting at 13:42. Scan is #39
      • This time it seems to have configured properly for pulsar mode.
      • All looks good in this second scan.
  • Joe needs to investigate issues with the adapter in more detail offline.
  • We will see if we can recreate the issue with the 10-gigE MAC addresses in coherent modes.
  • Running CoherentModeTests with both VPM and GUPPI, no coherent mode data monitoring tools enabled.
    • SB submitted at 13:53
    • Scan is #40
    • Oops! Edited the wrong scheduling block. This was a VPM only scan.
    • Resubmitted at 13:55. Scan is #41
    • Seems to be running just fine
  • Will now enable the daq_server agents on the VEGAS HPCs and run again with VPM and GUPPI
    • SB submitted at 13:58
    • Scan is #42
    • No issues
  • Now enabling coherent mode autoplotter
    • SB submitted at 14:02
    • Scan is #43
    • Fault triggered on Bank C: "Aborting: VEGAS HPC program taking too long to be ready: 2802 mS"
    • Killed VPM autoplotter. Trying again.
    • Resubmitted SB at 14:12
    • Failed again, same as above
  • Tried several scans up through #47. Sometimes Bank C would fail, sometimes it would not. No obvious trigger. Joe says that it isn't always finding NETSTAT, even though NETSTAT is in shared memory.
  • I've noticed that the c0800x0512 scans look kind of crappy. The S/N is low compared to GUPPI even though the levels seem OK. I am trying a C-Band scan to get away from RFI
    • SB submitted at 15:01
    • Scan is #48
    • Cal is nice and bright in GUPPI but completely invisible in VEGAS!!!
    • Trying again with fftshift set to all a's.
    • SB submitted at 15:06
    • Aborts! Aborts everywhere.
    • Trying again. Had to do a RestartTurtle. SB submitted at 15:13
    • The invisible cal may be an IF routing issue. It seems like there is a problem routing C-Band to both VEGAS and GUPPI
    • Stop/start managers
    • Resubmitting for VPM only, C-Band, c0800x0512, fftshift = 0xaaaaaaaa
      • Getting immediate abort again. Managers came up in a weird state. Status is fatal on most banks and manager control via the coordinator, through CLEO, is not working (i.e. the Managers do not respond when issued an "off" command).
  • Time is up. Starting to switch back to 16.3
  • Checked Mode 1 and 20 at 16:07. Look good.
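
A note on the fftshift value used in the C-Band attempts: "all a's" (0xaaaaaaaa) is an alternating bit pattern. The sketch below only inspects that pattern. It assumes the common FPGA FFT convention in which each bit of the shift word enables a divide-by-2 at one FFT stage; the log does not confirm how VEGAS interprets the word. The helper names and the 10-stage example are hypothetical.

```python
# Illustration only: inspect the bit pattern of the fftshift word from the log.
FFTSHIFT = 0xAAAAAAAA  # "all a's"


def bit_pattern(mask: int, width: int = 32) -> str:
    """Return the mask as a zero-padded binary string, MSB first."""
    return format(mask, f"0{width}b")


def shifts_in_stages(mask: int, n_stages: int) -> int:
    """Count set bits in the lowest n_stages bits of the mask.

    ASSUMPTION: one bit per FFT stage, a set bit meaning "shift at that stage".
    """
    low_bits = mask & ((1 << n_stages) - 1)
    return bin(low_bits).count("1")


print(bit_pattern(FFTSHIFT))           # alternating 1/0 pattern
print(shifts_in_stages(FFTSHIFT, 10))  # hypothetical 10-stage FFT
```

Under that assumption, 0xaaaaaaaa would shift on every other stage, which is a middle ground between shifting on every stage (all f's) and never shifting (all 0's).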

Conclusions

  • There are spectral line files with egregiously wrong time stamps (1858_11_17_00:00:00E_8435.fits) that don't seem to have been generated during normal scans. They don't show up in gbtidl. Joe seems to understand why this is happening. Has something to do with the adapter putting the HPC into a "monitor" state, which causes an invalid scan to run.
  • When switching between incoherent and coherent pulsar modes, if a bank is not used in the incoherent mode, then its guppi_daq dies and does not come back up properly in the coherent mode. The trigger seems to be having a bank in a pulsar mode, turning it off, and then running a coherent mode scan. The bank(s) in question will turn back on but their guppi_daq does not come back with them.
  • Bank C failed to find NETSTAT on some scans, but the problem was intermittent. No obvious trigger was found.
  • Coherent mode scans ran successfully when the auto plotting and daq_server scripts were running. The Bank C issues happened both with and without the auto plot script.
  • Cal S/N looked really low in coherent pulsar scans compared to GUPPI. Need to investigate further. Attempts to move to C-Band, where RFI would be less of an issue, were stymied by possible IF routing issues and other problems with VEGAS (see above).
  • Did not get to test raw modes or scales for incoherent modes.
  • The good news is that switching back and forth between spectral line modes and incoherent pulsar modes seems to work well.
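
A note on the bad time stamps: 1858_11_17_00:00:00 is the zero point of the Modified Julian Date (MJD) scale, so those files look as if they were written with a start time of MJD 0, i.e. an unset time. That reading is an inference from the file name, not something confirmed in the session. A minimal sketch of the conversion, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

# MJD 0.0 corresponds to 1858-11-17 00:00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_datetime(mjd: float) -> datetime:
    """Convert a Modified Julian Date to a UTC datetime."""
    return MJD_EPOCH + timedelta(days=mjd)


def filename_stamp(dt: datetime) -> str:
    """Format a datetime in the YYYY_MM_DD_HH:MM:SS style seen in the file name."""
    return (f"{dt.year:04d}_{dt.month:02d}_{dt.day:02d}_"
            f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")


# An unset (zero) MJD reproduces the stamp seen on the stray files.
print(filename_stamp(mjd_to_datetime(0.0)))
```

A check like this could be used to flag files whose start time was never set, for example by testing whether the stamp equals the MJD 0 value.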
-- RyanLynch - 2017-10-24
Topic revision: r3 - 2017-10-26, RyanLynch