2006/11/01 (Wednesday): On Site = JGM (Arrived late evening.)
Email from Darrel informed me about the following problem: "There's a problem with the DSP in the holography receiver, which I'm hoping that Robert Ridgeway can fix in the morning (=Thursday). I've arranged to meet Robert at the ATF at 6:30 on Thursday morning. The plan is to make some measurements on the relative frequencies of holography rx & tx through dawn, as the ambient temperature changes rapidly. Ralph Marson is going to come out around 9:00, and then we've got the holography telecon at 9:30 a.m. local time."
Comment from Darrel regarding software: "Debra has been trying to instruct me on the software to run things, but it's turning out to be easier to just run scripts than to use the full Schedule Block system."
2006/11/02 (Thursday): On Site = JGM, DTE, RM
Successfully ran a few dummy maps to test the monitor and control plus archive system while Ralph was here. In the evening, JGM and DTE tried to edit a scheduling block and write it back to the archive, but the write process produced an error which halted tests for this day. RM was also not able to extract holography ASDMs from the archive. Decided to allow Scott Rankin to proceed with a software update.
2006/11/03 (Friday): On Site = JGM, DTE, RM, DS
Restart of entire software system seems to have cleared up the problems encountered last night.
Extensive discussion with DS and RM regarding current state of holography integration and lessons learned.
RM creates a simple Python script which will allow DTE and JGM to dump raw correlation data to a file for offline analysis. This is a cornerstone of any hardware/software integration.
After RM and DS leave, DTE and JGM continue testing, but not for long. The control system crashes and the following reboot does not produce sensible DSP data. This problem stopped all work for today.
2006/11/04 (Saturday): On Site = JGM, DTE, JK
Jeff Kern arrives and diagnoses DSP problem. Ultimately finds that DSP has gotten into a troubled state where the most significant byte of the output DSP data is bad. This is apparently not a problem for the LabView monitoring software as the suspicion is that the LabView software just ignores this (sign) bit and reconstructs the sign from the data by other means. Only a power cycle of the 48V supply (via the smart plugstrip) solves this problem.
JGM and DTE have become suspicious about what our crude data dumping script is doing to the receiver, based on goofy results obtained last night (before the DSP failure). Jeff Kern digs into this and finds that the "startPhaseCal" command does a receiver initialization and tuning before starting data acquisition. Since the current tuning algorithm is wrong, this was retuning (and detuning) the receiver and leading to the goofy results. Jeff Kern commented out the retuning part of the startPhaseCal routine so that we could still use it in our Python macros.
Jeff Kern informed us about a "goodness of data" flag in the TowerHolography container. This is an ORed collection of the various monitoring flags for the holography receiver (see ICD for details). We now write this to raw data files as an indicator of data goodness.
While trying to run our data dumping script, encountered several DSP and control container crashes.
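As an illustration of what an ORed flag word looks like and how it can be carried along in a raw data file, here is a minimal Python sketch. The flag names, bit positions, and row layout are hypothetical placeholders, not the ICD's definitions or the format of our dump script; the real bit assignments are in the ICD.

```python
from enum import IntFlag
from functools import reduce
from operator import or_


class RxFlag(IntFlag):
    """Hypothetical monitor flags; the real bit assignments are in the ICD."""
    PLL_UNLOCKED = 0x01
    GUNN_OUT_OF_RANGE = 0x02
    DSP_FAULT = 0x04
    TEMP_ALARM = 0x08


def combine_flags(flags):
    """OR the individual flags into one word (0 means no flag is set)."""
    return reduce(or_, flags, RxFlag(0))


def format_row(timestamp, values, flag_word):
    """Format one raw-data row with the flag word as the last column."""
    fields = [f"{timestamp:.3f}"] + [f"{v:.6e}" for v in values]
    fields.append(str(int(flag_word)))
    return " ".join(fields)
```

In this sketch, offline analysis could test individual conditions with, for example, `RxFlag.DSP_FAULT in RxFlag(word)`, or simply discard any row whose last column is nonzero.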
2006/11/05 (Sunday): On Site = JGM, DTE, RM
Ralph Marson arrives and fixes several problems:
M&C tuning algorithm: Removed synthesizer frequency sweep and inserted proper Gunn voltage setting.
Holography Map Script: Rewrote holography map script in several ways. Most importantly, made it self-contained to remove all reference to SB inputs (so that it can eventually be used in manual mode).
After a few failed attempts (due to an array of M&C failures), made first holography map. Could not get clic to convert ASDM file to clic format. Contacted Robert Lucas.
Attempted to make more maps, with very little success. Problems encountered were:
Map died due to CONTROL/ACC/cppContainer failure.
Map dies due to "other exception". Seemed to be a failure in getSubscanData.
Map dies on row 84 for no apparent reason. When I tried to restart found that Control AMB socket server would not stop.
Map dies on row 175 due to "other exception". As before, seems to be a failure in getSubscanData.
Map corrupted due to "symmetric DSP failure" very early in map. Since I could not stop the map any other way, I had to stop it by putting the antenna into shutdown.
Map dies on row 56 due to "other exception". As before, seems to be a failure in getSubscanData.
Map dies on row 74 due to "other exception". As before, seems to be a failure in getSubscanData.
Map dies on row 81 due to "other exception". As before, seems to be a failure in getSubscanData. I give up!
As all of these failures seem to require a full restart of the system, this extremely inefficient process requires many hours of what is currently wasted effort.
2006/11/06 (Monday): On Site = JGM, DTE, RM
Ralph Marson arrives and fixes at least part of the "other exception" problem. Found that one or more of the notification channel, the mount, or the MountController (previously called antMount) modules lose time samples, which causes the data acquisition system to "hiccup" and bring the M&C system down. This was likely causing the "other exception" crashes noted last night. RM puts in a fix which sets the position for a missing time sample to zero.
Justin Dressel and Scott Rankin install a new video card in golum which allows true three-screen functionality. They also implemented vnc for login sessions, allowing users to use vnc as their window manager. This will allow others to spy-in and help when problems arise. Unfortunately, encountered serious performance issues at various times. At least for now recommend that vnc not be the default login session.
Made several maps, some of which ran through to completion. See Holography Map Log above.
2006/11/07 (Tuesday): On Site = JGM, DTE, JK, BG
RL finds that maps made last night all had a problem. The lost time problem that RM fixed last night was not quite the right fix. All data after one of these lost time events was incorrectly tagged with zero position, which makes all of the data useless.
At this point, we have several "features" to note about how the data is being taken:
Missing positions are now marked with Az/El= (0.0, 0.0). A better long term solution for this will need to come later.
The commanded and measured directions are now sampled every 48 ms.
The Tower direction (reference position) is now written to the ASDM, but has not been checked by RL yet.
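The intended position-tagging behaviour (only the lost sample is marked, later samples keep their real positions) can be sketched as follows. This is an illustrative Python sketch, not the control software's code: the function names and the dictionary of samples keyed by sample index are assumptions made for the example.

```python
SAMPLE_MS = 48          # sampling interval quoted in this log
MISSING = (0.0, 0.0)    # sentinel Az/El written for a lost sample


def tag_positions(n_samples, measured):
    """Return one (az, el) per sample slot; lost slots get the sentinel.

    `measured` maps sample index -> (az, el) for the samples that arrived.
    Each slot is looked up independently, so one lost sample does not
    change the tagging of any later sample.
    """
    return [measured.get(i, MISSING) for i in range(n_samples)]


def valid_samples(positions):
    """Yield (time_ms, az, el) for every slot not marked as missing."""
    for i, (az, el) in enumerate(positions):
        if (az, el) != MISSING:
            yield i * SAMPLE_MS, az, el
```

One limitation of a sentinel like this is that a genuine pointing at exactly (0.0, 0.0) cannot be told apart from a lost sample; a separate validity flag per sample would avoid that ambiguity.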
VLA site lost power for about 10 minutes at 7:00 PM or so, which caused everything to crash. Was able to get everything going again except the holography transmitter. We have only a page or two of what appears to be the documentation for this device, which gives no indication as to what to do about this condition. Will investigate this in the morning.
2006/11/08 (Wednesday): On Site = JGM
RL tells us that he had the wrong version of clic checked into CVS yesterday, so the version installed at the ATF today was foobar. New version was installed today and we can finally analyze holography data locally (partly...see below).
Called Antonio Perfetto and Mike McCarty to get some help on the holography transmitter problem. Eventually was told about a known problem with the PCMCIA card on the laptop running the transmitter LabView GUI. See Trouble Shooting section below on this issue. After about 9 hours of futzing, exacerbated by the lack of any documentation on this system, got the transmitter going.
Power dropout last night caused HVAC in VertexRSI antenna to fail. Jack Meadows got it going again, but this precipitated a few other failures:
With no cooling in Rx cabin, and no ventilation in holography mini rack, CRG failed. See Trouble Shooting issue below for details.
Once CRG was distributing ticks again, found that ABM was not receiving them. Ralph Marson diagnosed the problem and found that there was a bad interface cable, which had been an issue in the past. Replaced cable with a spare that he happened to have, but this spare did not have the proper polarity for the reset line, which caused the ACU to continuously reset. Took out reset swap cable (and taped it to inside of cabin for later use) and everything was well again.
All told, VLA power drop failures caused about 12 hours of lost time. Whom at the VLA do I bill for this?
Still issues with position tagging of data.
2006/11/09 (Thursday): On Site = JGM, RM
Still trying to get right version of clic installed.
Still issues with position tagging of data, but they are not thought to be causing problems in the maps.
2006/11/10 (Friday): On Site = JGM, JK
Pretty smooth day. Continue to run series of checks on holography system.
With Jeff Kern's help, iterated on axial focus to find peak SS power. Found optimum focus at -4 mm.
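One common way to locate a peak from a handful of focus settings is three-point parabolic interpolation. The Python sketch below is an assumption about how such an iteration could be done, not a record of the procedure we used; the function name is made up for the example, and the numbers in the usage note are synthetic, not measurements.

```python
def parabolic_peak(x, y):
    """Return the x of the vertex of the parabola through three points.

    x: three distinct focus settings (e.g. in mm)
    y: power measured at each setting
    Raises ValueError if the three points are collinear.
    """
    (x0, x1, x2), (y0, y1, y2) = x, y
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        raise ValueError("points are collinear; no unique vertex")
    return x1 - 0.5 * num / den
```

For example, synthetic powers of 6, 9, and -6 at focus settings of -6, -3, and 0 mm lie on a parabola whose vertex is at -4 mm. The vertex is a maximum only when the middle point is the highest of the three, so the caller should check that before trusting the result, then re-measure around the estimate and repeat.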
Boresight position checked and found to be spot-on.