TGBT15A_915_133 - Tests over Thanksgiving Shutdown

Goals

Take advantage of the time available during the Thanksgiving shutdown to:
  • Run through the permutation tests to see whether the Managers still crash when switching back to LBW spectral line modes, as they did during the Thanksgiving shutdown tests (although they recovered on their own).
  • If time, at Ray's request, test the sequence coherent mode --> spectral line mode --> coherent mode without a change in BW.
  • Also if time, test FSW in Modes 24 and 29.

Details

  • Joe started version switch at 10:30
  • Ryan took control at about 10:40
  • Session begins at around 10:55
  • Will run the TestModeSwitching SB, which will run through permutations of HBW, LBW, INCO, and CODD modes
    • SB submitted at 10:58
    • First scan is #1
    • Still getting "Test message (please ignore)" from VEGAS in CLEO. Not a big deal, but we'll want to fix that in the release.
    • First scan in Mode 20 seemed to begin and end without issue.
    • Diode scan in i0800x0512 has a bunch of artifacts in it again. It must be RFI related. I wonder if L-Band is in focus.
      • Amanda confirms L-Band is not in focus
    • c0800x0512 mode seemed to configure and Astrid ran a scan, but VEGAS never went into a Running state and did not take any data. There were no CLEO messages or Astrid errors.
    • Got an "LBW Balance failed. Check Digital IF Snaps" during Mode 20 scan (scan #6)
    • Pausing at 11:12 to let Dave rotate L-Band into focus.
    • Unpausing at 11:18
    • Scan #7 is c0800x0512. Seems to be running OK this time, but VEGAS Bank A did not balance well (levels around -25 and -27 dB)
      • Still lots of artifacts in the data, even with L-Band in focus
      • Confirm that it is the right BOF file and that fftshift is set to all f's
    • Balanced better on next scan (i0800x0512)
      • Still lots of artifacts.
    • Not sure where scan #9 went....
    • Scan #10 is i0800x0512, and the data look great! Argh! Why can't it always look like this?
      • Notch filter was still out from previous c0800x0512 scan. Didn't seem to hurt anything.
      • Levels were a little higher than previous scans, but not by much.
    • Got an "IF Balance Failed" on scan #11 (mode 20)
      • Data still look OK
      • Tsys looks much better now that L-Band is in focus
    • Last few scans seem to be running OK.
    • Time is 11:50. Ryan is going to grab lunch and let this run.
      • Back at 11:53
      • Got a message at 11:50 from VEGAS and ScanCoordinator, both similar:
        RPC connection from [ScanCoordinator,Vegas] on [gbtdata.gbt.nrao.edu,vegas-hpc1.gb.nrao.edu] to [VEGAS,BankBMgr] on [vegas-hpc1.gb.nrao.edu,vegas-hpc2.gb.nrao.edu] with progum 0x290004b5
    • IF Balance failed for VEGAS during scan #24 (mode 20)
    • Don't see data for scan #21
    • Scan #22 (c0800x0512) seems to have a few artifacts again, not severe. Previous few scans looked good.
      • Same for scan #23 (i0800x0512)
    • Things running smoothly through 12:34. Currently on 10th permutation
    • Levels were really high on scan #31 (c0800x0512). Did balance fail? Nothing in logs...
    • IF Balance failed on scan #45
    • Just noticed that file names were not correct on all VEGAS banks for scans #22, 28, 31, 39. Problems seem to be on Banks E-H. These are all c0800x0512 scans.
    • VEGAS samplers were about 3-4 dB too high on scan #47
    • Got the RPC errors again at 12:50
    • Scans #41 and #45 are Mode 20. There are nasty spurs all over the spectrum.
    • Bandpass also looks terrible on scan #42 (c0800x0512). Scan #46 looks better.
    • Scan #44 has a bad noise diode signal. It looks like the diode was sometimes firing too soon, so there is an image offset in phase (both time and frequency, so it's not just one subintegration). Artifacts in the phase/freq plot, as well.
    • Scan #47 also has a really bad bandpass, and still has some phase/freq artifacts
    • Scan #49 looks a bit better, but the bandpass is definitely not great. I'm not seeing an obvious issue with the VEGAS balancing
    • Seeing same issues with the bandpass in scan #50 (Mode 1).
    • Scan #51 (Mode 20) has spurs all over the spectrum again
    • Scan #56 (Mode 20) looks much better
    • Looks like RPC errors again around 13:03.
    • Spectral line mode scans 56, 58, 59, 62, and 64 seem much better.
    • HPC monitor crashed with
      Traceback (most recent call last):
        File "vpmHPCStatus.py", line 117, in <module>
          curses.wrapper(do_it)
        File "/opt/local/lib/python2.7/curses/wrapper.py", line 43, in wrapper
          return func(stdscr, *args, **kwds)
        File "vpmHPCStatus.py", line 97, in do_it
          stdscr.addstr(curline, col+12, l.rstrip(), color)
      _curses.error: addstr() returned ERR
    • Last scan is #76
  • Ended TestModeSwitching SB at 13:54, after 19 permutations
  • Testing an i0800x0512 --> c0800x0512 --> i0800x0512 sequence, per Ray's request
  • Going to run through Modes 20 -- 29 with FSW
    • Submitting SpectralLineTestsFS SB at 14:04
    • Scan numbers start with #80
    • Got an abort during Mode 23 scan (#83): "Bank F died during a scan. Aborting scan."
      • Ray and Joe looking at it. Suggested forging ahead.
    • Resubmitting SB at 14:17. Scans will start with #84
      • Bank F is turned off for Mode 20, but scans still seem to be running.
    • Banks B and D crashed during Mode 24 scan. Same CLEO message as above but for Bank B. Note that the messages come from vegas-hpc1-10 even though they reference Bank B (which would be vegas-hpc2-10)
  • Noticed that at some point the ScanCoordinator window stopped displaying all the info normally on the upper right corner of the screen.
    • Checked with Wilson. Doesn't seem to be an issue for her.
    • Looks fine to me when I open it on my own machine and reopen it standalone on titania. I guess my instance just got wonky...
  • Going to wrap up with some scans in c0800x2048, c0800x4096, c1500x2048, and c1500x4096 using CoherentModeTests SB
    • SB submitted at 14:33
    • All looks good in c0800x2048 and c0800x4096. Still getting lots of dropped packets in c1500x2048 and c1500x4096
  • Going to start switching back to release at 14:43
  • Submitting CheckMode1 SB at 15:16
    • Ran OK
  • Submitting CheckMode20 SB at 15:17
    • HPC program taking too long to be ready. Joe is looking into it.
    • Joe fixed, all seems well now.
  • Session ends at 15:30.
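
As a cross-check on the scan bookkeeping above, here is a minimal sketch of how many orderings of the four mode families exist and how many scans 19 completed permutations would imply. It assumes one scan per mode family per permutation; how the TestModeSwitching SB actually orders or generates its permutations is not recorded in these notes, and the name `mode_orderings` is purely illustrative.

```python
from itertools import permutations

# Mode families named in the notes for the TestModeSwitching SB.
MODE_FAMILIES = ("HBW", "LBW", "INCO", "CODD")


def mode_orderings(families=MODE_FAMILIES):
    """Return every ordering of the mode families as a list of tuples."""
    return list(permutations(families))


orderings = mode_orderings()
# Assumption: each permutation runs exactly one scan per mode family.
scans_per_permutation = len(MODE_FAMILIES)

print(len(orderings))              # 24 possible orderings
print(19 * scans_per_permutation)  # 76 scans after 19 permutations
```

Under that assumption, 19 permutations of 4 scans each gives 76 scans, which is consistent with the last TestModeSwitching scan being #76.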

Conclusions

  • Significant issues
    • There still seemed to be some issues balancing VEGAS and the IF rack (though not sure if that is related to any M&C issues)
    • Still getting crashes in some of the FSW LBW modes.
    • Some banks in the coherent mode scans seemed to lose connection to the GBT status database and have "Unknown" in the file name strings.
    • Lots of bad bandpasses today. I know there is filming going on at the telescope, so it is possible that this was locally generated RFI. That it was affecting HBW, LBW, INCO, and CODD modes on multiple IF paths/ROACHs in a very intermittent way seems to support this, though it is very unsatisfying to just blame RFI.
  • Minor issues
    • Still getting "Test message (please ignore)" from VEGAS
  • Issues for Ryan
    • Autoplotter is not reliably updating (but seemed OK after a restart and addition of debug statements)
    • Look into HPC monitor crash
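
For the HPC monitor crash, one possible starting point: `addstr()` in Python's curses module raises `curses.error` when the write does not fit in the window (for example, a row or column past the edge, or text running into the bottom-right cell). Whether that is what happened here is not known, since the source of vpmHPCStatus.py is not shown in these notes. The sketch below is a hypothetical helper, `safe_addstr`, that clips the write to the window and swallows the error instead of crashing the display loop.

```python
import curses


def safe_addstr(win, row, col, text, attr=0):
    """Write text at (row, col), clipping to the window instead of raising.

    Returns True if something was drawn, False if nothing could be drawn.
    """
    max_rows, max_cols = win.getmaxyx()
    # Skip writes that start outside the window entirely.
    if row < 0 or col < 0 or row >= max_rows or col >= max_cols:
        return False
    # Leave the last column free: writing into the bottom-right cell
    # raises curses.error even though the text is drawn.
    clipped = text[: max_cols - col - 1]
    if not clipped:
        return False
    try:
        win.addstr(row, col, clipped, attr)
    except curses.error:
        # Assumption: a dropped status line is preferable to a dead monitor.
        return False
    return True
```

If this turns out to be the cause, the call at line 97 of the traceback could become `safe_addstr(stdscr, curline, col+12, l.rstrip(), color)`. Dropping lines silently hides overflow, so logging the skipped writes may be worth adding while debugging.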
-- RyanLynch - 2017-12-01
Topic revision: r2 - 2017-12-04, RyanLynch