What is Yet to be Done on SCU/SRP

1.0 Documentation

1.1 Polish the Test Procedure

1.2 System Configuration Setup Docs

  • CCU configuration
  • SCU configuration
  • How to set up Xenomai
  • How to set up installation directories DONE

1.3 Status-def and ACU

  • Adding messages
  • Interface Versioning

1.4 Config Files

  • General layout DONE
  • Updating StatusDef.yaml on SCU

1.5 Component Docs

  • VME Analog I/O
  • VME Bus Interface
  • VME Digital I/O
  • Arbitrator
  • ACU Interface
  • Subreflector Encoders
  • Subreflector Control Algorithm (AKA PID)
  • Prime Focus Encoders
  • Prime Focus Control Algorithm
  • Turret Encoder
  • Turret Control Algorithm
  • Modbus Interface
  • Auto Stow Controller
  • Fault Analysis

Utility/Support Class Documentation

  • RT log_t Backend
  • Subreflector Delta Limiter
  • RT-Threads

2.0 Auto Stow DONE

During a power failure, the VME chassis loses power. To mitigate this, a monitoring process will periodically check for power loss, and kill the scu_servo process when power is lost. Once VME power is re-established, the monitor will re-run the scu_servo process.
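The monitor behaviour described above can be sketched as follows. This is only an illustration: `power_ok` and `start_servo` are hypothetical callables standing in for the real power probe and process launcher, and `max_polls` exists only so the loop can be exercised in a test.

```python
import time


def supervise(power_ok, start_servo, poll_s=1.0, max_polls=None, sleep=time.sleep):
    """Kill the servo process on power loss; restart it once power returns.

    power_ok:    hypothetical callable, True while VME power is present
    start_servo: hypothetical callable returning a Popen-like object
    max_polls:   stop after this many polls (None means run forever)
    """
    proc = start_servo() if power_ok() else None
    polls = 0
    while max_polls is None or polls < max_polls:
        ok = power_ok()
        if not ok and proc is not None:
            proc.kill()           # VME power is gone: stop the servo process
            proc.wait()
            proc = None
        elif ok and proc is None:
            proc = start_servo()  # power is back: re-run the servo process
        sleep(poll_s)
        polls += 1
    return proc
```

In a real deployment `start_servo` could be something like `lambda: subprocess.Popen(["scu_servo"])`; how VME power is actually sensed is not covered here.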

2.1 Auto Stow FSM DONE

New, needs work.

2.2 Auto Stow Controller DONE

It knows how to take control, but the stow procedure/FSM (2.1) needs to be replaced (i.e. it doesn't exist).

3.0 Bugs

3.1 Fix Config File Paths DONE

We should always be able to find the config files. The key item here is that while the top-level config file can be specified with the -c option, it will in turn look for SCUlogic.yaml. Now fixed to find that file relative to the top-level config file.
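A minimal sketch of the "relative to the top-level config file" lookup, assuming the referenced file is named by a plain or relative path. The function name and the example paths are hypothetical, not taken from the SCU code.

```python
import os


def resolve_relative_to(top_config_path, referenced_name):
    """Return the path of a referenced config file, looked up next to the top-level config."""
    if os.path.isabs(referenced_name):
        return referenced_name  # absolute paths are used as given
    base_dir = os.path.dirname(os.path.abspath(top_config_path))
    return os.path.join(base_dir, referenced_name)
```

With this approach the working directory of the process no longer matters: only the location given with -c does.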

3.2 Figure out Component State Issue DONE

I noticed that the matrix Component states appear to not be changing. Other matrix based systems do not have this problem. Bug somewhere...

Somewhat complicated issue here. In RT threads, the normal clock (CLOCK_REALTIME) [sidenote: although it has 'REALTIME' in the name, it means 'real wall-clock time'] is not adjusted by NTP, as it is in non-RT threads. To overcome this limitation, the Xenomai folks created a new clock (CLOCK_HOST_REALTIME) which is adjusted by NTP and is usable in an RT context.

The problem is that you cannot use a CLOCK_HOST_REALTIME time to wait on semaphores and such, since the rt-kernel will always use CLOCK_REALTIME to compare against.
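One possible way to live with this restriction (not necessarily what was done here) is to translate a deadline expressed on the NTP-adjusted clock into the clock the kernel waits on, using the current readings of both clocks. The function and argument names below are hypothetical; the arithmetic is the whole idea.

```python
def to_wait_clock(deadline_host, now_host, now_wait):
    """Translate a deadline from the host (NTP-adjusted) clock to the wait clock.

    deadline_host: deadline as read on the host clock, in seconds
    now_host:      current host clock reading, in seconds
    now_wait:      current reading of the clock used for waits, in seconds
    """
    remaining = deadline_host - now_host  # time left until the deadline
    return now_wait + remaining           # same instant, on the wait clock
```

The two "now" readings should be taken back to back, since any delay between them shows up directly as an error in the translated deadline.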

3.2 Figure out encoder jumps on startup DONE

I fixed this by adding a charge-up delay for the error reporting on the jump checks, and making the threads RT.
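As an illustration of the charge-up delay, here is a sketch that reads it as "ignore jump errors for the first N samples after startup". The class name, the threshold and the sample count are all hypothetical.

```python
class JumpChecker:
    """Flags encoder jumps, but stays quiet during a startup charge-up period."""

    def __init__(self, max_jump, charge_up_samples):
        self.max_jump = max_jump                    # largest allowed step between samples
        self.charge_up_samples = charge_up_samples  # samples to ignore after startup
        self.samples_seen = 0
        self.last = None

    def update(self, position):
        """Return True if a jump should be reported for this sample."""
        jumped = self.last is not None and abs(position - self.last) > self.max_jump
        self.last = position
        self.samples_seen += 1
        # Suppress reports until the charge-up period has elapsed.
        return jumped and self.samples_seen > self.charge_up_samples
```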

3.3 Fix Clock Source in log_t's DONE

I am still seeing log_t messages which are using CLOCK_REALTIME, which is only good for relative times in Xenomai. The log_t needs to use the CLOCK_HOST_REALTIME clock in Xenomai. Now fixed.

3.4 Fix the Odd Deadlock noted in tsemfifo DONE

I have had several instances of data stream lockups in the tsemfifo code. Doesn't make a lot of sense, but there are a couple of ways this could happen. Now fixed in matrix.

3.5 PF Pol Position Readback Incorrect DONE

There was an issue noted during testing that the PFP readout was incorrect. Looks like the turret positions are fine, but PFP is foobar. Another very odd thing is the large shift using the old system just after boot and just after taking control with the ACU. The loaded encoder offset is relatively small, but the encoder direction flag must get flipped.

After the fix, PFP reads -93.91 on both new and old systems. Turret values were always correct.

What fixed it? There were two items:
  • One was the use of a short write to the control/status register (CSR) at base address + 0x2. It must be a char access; otherwise an access fault/bus error results.
  • After a VME reset, the board seems 'cold', i.e. it doesn't respond/initialize the first time. I added a loop which tries to init; if that fails, it waits 333 ms and tries again, up to 5 times. The CSR is read back and compared with 0x4d, which seems to be the correct setting.
  • It's possible the SBC VME access timing is slow enough that delays are not necessary. On the stout gbtscu, timing is likely to be faster.
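The retry loop from the second item looks roughly like this. It is a sketch only: `init_board` and `read_csr` are hypothetical callables wrapping the real VME accesses, and I am assuming "up to 5 times" means five attempts in total.

```python
import time

EXPECTED_CSR = 0x4D    # readback value that seems to be the correct setting
RETRY_DELAY_S = 0.333  # wait between attempts
MAX_ATTEMPTS = 5       # assumption: five attempts in total


def init_with_retry(init_board, read_csr, sleep=time.sleep):
    """Try to initialise the board; return the attempt number that worked, or 0."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        init_board()
        if read_csr() == EXPECTED_CSR:
            return attempt
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_DELAY_S)  # board may still be 'cold'; give it time
    return 0
```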

3.6 Ethernet Based Watchdog Timing out DONE

For some reason the 'toggle' watchdog which writes a value to the PLC over Ethernet was timing out repeatedly with the new SCU. This prevented us from enabling an axis in last week's tests.

The count of the toggle WD timeouts is contained in a register accessible with mbtool:

```
mbtool -h gbt_scu_plc -p 502 -get S29
```

I looked into the timing of the updates using mbtool in a loop:

```
while true; do
    sleep .1
    mbtool -h gbt_scu_plc -p 502 -get C320
done
```

With the legacy system the update was around 2 Hz. I found with the new system there was a sampling error, as the Arbitrator loop toggled a value at a rate asynchronous to the updater in the ModbusInterface. The difference in rates would cause updates to be at the right rate but of the wrong value.

I fixed this by having the Arbitrator send the ModbusInterface a counter instead of a bit. The ModbusInterface checks to see if the counter is changing, and if so, toggles the bit.
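The counter-to-toggle idea can be sketched as below. The class and method names are hypothetical; the point is that the bit flips once per updater cycle whenever the counter has moved, no matter how many times the Arbitrator incremented it in between.

```python
class ToggleFromCounter:
    """Turns a free-running counter into a watchdog toggle bit."""

    def __init__(self):
        self.last_count = None
        self.bit = 0

    def update(self, count):
        """Call once per Modbus update with the latest counter; returns the bit to write."""
        if self.last_count is not None and count != self.last_count:
            self.bit ^= 1  # the counter moved, so the sender loop is alive
        self.last_count = count
        return self.bit
```

If the counter stops changing, the bit stops toggling and the PLC watchdog times out, which is the desired behaviour.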

3.7 Velocity Fields of Encoder Stream All Zero DONE

We discovered that when running on the real telescope, the velocity fields of the encoder stream were not being filled in.

What fixed it?
  • Added a simple sequential difference calculation on subsequent encoder positions.
  • This should be amended to be a differencing filter/low-pass filter.
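A sequential difference is just the change in position divided by the sample period. A minimal sketch, with a hypothetical function name and `dt` as the assumed sample period in seconds:

```python
def sequential_velocity(positions, dt):
    """Velocity per sample as (p[n] - p[n-1]) / dt; the first sample gets 0.0."""
    velocities = [0.0] * len(positions)
    for n in range(1, len(positions)):
        velocities[n] = (positions[n] - positions[n - 1]) / dt
    return velocities
```

This amplifies encoder noise, which is why a differencing/low-pass filter is the better long-term choice.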

4.0 Status Interlock Tracing DONE

Be able to answer the question: "Why isn't an axis enabling?" The answer on the current system is to call Joe, who uses the cleo message window with filters to view just the antenna info messages. I'd like the system to be able to 'trace' and show what set of status is preventing operation. Therefore there is scope creep here.

Perhaps I'm making this more complicated than it needs to be. Perhaps adding " INTERLOCK ACTIVE" messages is enough? To minimize messages, perhaps mask the irrelevant messages in the antenna manager by mode? (e.g. in PF with boom out, there are interlocks active preventing Y1..Z1 from moving. We don't want those messages displayed, because we expect them to be present.)

I've put logic into the antenna manager to assert the interlock message when (a) the 'ok_to_enable_xx' status is false and (b) the user selects the axis to be enabled.
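The condition amounts to a two-input check. In this sketch the function name and the exact message text are hypothetical; only the (a)-and-(b) logic comes from the description above.

```python
def interlock_message(axis, ok_to_enable, enable_requested):
    """Return an interlock message if the user asks for an axis that is not OK to enable."""
    if enable_requested and not ok_to_enable:
        return f"{axis} INTERLOCK ACTIVE"
    return None  # nothing to report
```

Tying the message to the user's request keeps expected interlocks (e.g. Y1..Z1 in PF with boom out) from generating noise.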

5.0 Abstract Loop Timing DONE

This needs to happen, as we may change the loop rate.
  • Need to abstract timer loops in case we want a different timing rate
  • (e.g. not oneSecCounts=50, should be oneSecCounts=TICKS_PER_SEC)
  • See LoopTiming.h
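The idea, shown here in Python rather than as the actual contents of LoopTiming.h, is to derive every count from one rate constant. `ticks_for` is a hypothetical helper; the value 50 is the literal quoted above.

```python
TICKS_PER_SEC = 50  # loop rate; change it here and every derived count follows


def ticks_for(seconds, ticks_per_sec=TICKS_PER_SEC):
    """Convert a duration in seconds into a whole number of loop ticks."""
    return round(seconds * ticks_per_sec)


one_sec_counts = ticks_for(1.0)  # replaces the hard-coded 50
```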

6.0 Increase Error/Status Visibility in SCU DONE

We need to have logs which can be analysed with time-stamped measurements (log_t), and some way to query the internal state of the SCU (keymaster keys and ACU messages).

We're using log_t now, so that's taken care of, but what we want is simply to see the values stored in the keymaster. We should be able to do this by leveraging vegas_matrix_status. DONE

We have added keyhole-scu/python/status_monitor

6.1 Convert rt_printf's and Messages to log_t's DONE

  • Keyhole-scu has log_t's which use an rt_printf backend. We should migrate to using that. DONE
  • Can we make a log_t which maintains state? DONE - new type Message_t's

6.2 What should we make available via keymaster?

  • We will need to make subreflector encoder offsets adjustable via keymaster (DONE)
  • Perhaps a configuration item to enable/disable the offset adjustments?

6.3 Slogging - SCU Samplers DONE

There are several currently for encoders, position loops, ACU commands. Are these enough?
  • added closure time stream, which shows the measurements used to determine if the watchdog should be reset. DONE
  • added flashing the 'fail' LED on the D/A board if the watchdog is happy. DONE

6.4 View SCU Samplers with Matrix Oscilloscope DONE

We need to resurrect Matrix Oscilloscope (the Qt 3 version doesn't build; the new version uses Qt5/Qwt 6.1.6) so we can do debugging in real time. (DONE)

6.5 Provide convenient way of viewing data sinks in real time DONE

This functionality is identical to the Matrix Oscilloscope, but we implemented it in matrix/python/python3 using zmq and pyqtgraph.

7.0 Antenna Manager Modifications

7.1 New RPC version for SRP DONE

  • DONE Already done. Version 2 is for the new SCU, version 1 is for the legacy version. Selected in antenna.conf.
  • DONE I don't think this is on master or release_19.4 (just a dev branch in my area). Need to find a way to integrate into a released version. (Now on the antenna_dev branch, derived from release_19.4.)
  • DONE In a new version I also have code to read the yaml format of statusDef.yaml, which also exists on the SCU.
  • Need to make sure both are same version, and that messages make sense. - probably need new messages Paul added
  • Rebase the changes onto release 19.4, with VxWorks backward compatibility.

7.2 Special Version for Testing DONE

  • DONE Need to disable safemode when one axis e.g. Y1 drops out
  • DONE Need to disable SW limits check
  • Need to disable ACU delta limit checks and reactions to the delta limit status

8.0 Better Turret Rotation DONE

The old system requires the ACU to perform low-level sequencing of SCU hardware. Make the SCU do this. There is a start on this, with the turret state machine in the ACU interface component. That should be moved into the Arbitrator/Other component.

9.0 Better Boom Operation DONE

The old system requires the ACU to perform low-level sequencing of SCU hardware. Make the SCU do this. There is a start on this, with the boom state machine in the ACU interface component. That should be moved into the Arbitrator/Other component, to support use by AutoStow (and eventually an OCU interface).

10.0 Position Loop Taps DONE

  • Test taps with matrix-scope
  • 9/3/2021 added fields in subreflector tap to allow debugging of delta limiter.
  • Also see 6.3

11.0 Delta Limits DONE

  • This has proven to be much more difficult than planned, and the full algorithm will not be included in the release.
  • A 'last chance' algorithm will be released, which, while not optimal, should prevent hardware delta limits.
  • The real way to address this is to use a MIMO (multiple input, multiple output) loop control algorithm.

12.0 Subsystem Tests

What needs to be tested and debugged prior to system/acceptance testing.

12.1a Testing Infrastructure DONE

  • DONE find/configure a laptop for use at the RX and PF teepee areas
  • DONE find/verify a 100/1000 base T network connection

12.1 Check Subr and PFF/PFX Encoder Operation DONE

  • DONE Tested on 11 May 2021 using dio_encoder_test.yaml
  • Positions matched exactly.
  • I seem to be able to run both old and new versions simultaneously to confirm exact readings. (Cool!)

12.2 Check PFP and Turret Encoder Operation DONE

  • Make sure offsets are applied correctly.
  • (I seem to be able to run both old and new versions simultaneously to confirm exact readings.)

12.3 Check HW Watchdog Signal DONE

  • DONE tested to be present and at the correct frequency 5/19/2021
  • Work with Jason/John to integrate/test PLC changes (week of 6/7/2021)

12.4 Check/Test PLC Mods DONE

Some ideas to test the PLC modifications. Note we will need to update the statusDef file to see the message in antenna manager.

12.4.1 HW Watchdog DONE

  • Enable HW Watchdog by running old system, and enabling axes
  • write 1 to C322 (HW watchdog enable).
  • Since the old system doesn't update the WD, this should cause axes to disable and we should see the triggered indication.
  • Need to add monitoring of HW-WD fired status bit ALERT!

13.0 Code Cleanup

13.1 Remove unused elements

  • Unused sinks/sources in Arbitrator component

-- JoeBrandt - 2021-05-11
Topic revision: r22 - 2021-09-16, JoeBrandt