Joe's Outline:

CCU Hardware

EtherCAT card
Condition/Test: The EtherCAT card indicates error or link-loss

Detected by: CCU Fault-analysis

Expected Response: PLC commanded to disable axis (brake sequenced stop)

See section 8.1.1.2 PLC-PEI Link Failures.

IRIG card
Condition/Test: The IRIG card stops generating interrupts because of an internal error.

Detected by: CCU task watchdog, PLC watchdog

Expected Response: PLC watchdog updates halted, PLC disables axes with brake sequenced stop

This test requires the use of a test program, which will purposely change the IRIG card configuration to simulate a failure.

  1. Enable the azimuth axis through the OCU or M&C.
  2. Using the test program, disable the IRIG card interrupts by running the command:
    • test_irig -den (NEW: can this be run while the CCU is executing?)
  3. Verify the PLC disables the azimuth axis. _______(Check)
  4. Repeat steps 1-3 for elevation. _______(Check)

Processor Fault
Condition/Test: Except for an overtemp condition, a processor fault generally results in a system halt or system freeze.

Detected by: PLC watchdog

Expected Response: PLC disables axes with brake sequenced stop

  1. Enable the azimuth axis through the OCU or M&C.
  2. Simulate a processor failure by disconnecting AC power to the CCU host computer.
  3. Verify the PLC disables the azimuth axis. _______(Check)
  4. Repeat steps 1-3 for elevation. _______(Check)

Processor Cooling Failure/Overtemp
Condition/Test: A loss of processor cooling activates the processor's thermal control circuitry, which attempts to cool the processor by either slowing the clock speed or reducing the clock duty cycle. Either case results in reduced system performance.

Detected by: CCU thermal monitor

Expected Response: CCU disables axes with servoed stop. Once disabled, axes cannot be re-enabled until the high temperature condition clears.
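
The warning, fault, and re-enable lockout described above can be sketched as a small classifier. This is an illustrative sketch only: the class, its method names, and the two threshold constants are hypothetical, chosen so that the headroom values used in the steps below (14, 5, 25) fall into the warning, fault, and normal bands. The actual CCU thermal monitor thresholds are not stated in this outline.

```python
from dataclasses import dataclass

# Hypothetical thresholds (units of thermal headroom); the real CCU values
# are not given in this outline.
WARN_BELOW = 15
FAULT_BELOW = 10


@dataclass
class ThermalMonitor:
    """Toy model of the expected response: warn, fault, and block re-enable."""
    fault: bool = False
    warning: bool = False
    axis_enabled: bool = False

    def update(self, headroom: int) -> None:
        # Low headroom means the processor is close to its thermal limit.
        self.fault = headroom < FAULT_BELOW
        self.warning = headroom < WARN_BELOW
        if self.fault:
            # Fault response: the axis is disabled (servoed stop on the real system).
            self.axis_enabled = False

    def enable_axis(self) -> bool:
        # Re-enable is refused while the high-temperature condition persists.
        if self.fault:
            return False
        self.axis_enabled = True
        return True
```

Feeding this model 14, then 5, then 25 reproduces the sequence the procedure checks: a warning with the axis still enabled, a fault that disables the axis and blocks re-enable, and a return to normal.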

  1. Edit the Config.xml file, changing the line:
    • <Device name="hothead" type="CPU_ThermalMonitor" poll_id="1000"> to
    • <Device name="hothead" type="CPU_ThermalMonitor" disable_for_testing="1" poll_id="1000">
  2. Disable axes and restart the CCU host software.
  3. Enable the azimuth axis through the OCU or M&C.
  4. Start the failTweaker program.
  5. Run the command in failTweaker:
    • set_node CPU_ThermalHeadroom 14
  6. Verify a thermal warning is presented on the OCU and Cleo screen, but the axis remains enabled. _______(Check)
  7. Run the command in failTweaker:
    • set_node CPU_ThermalHeadroom 5
  8. Verify the axis is disabled and a thermal fault message appears. _______(Check)
  9. Verify the axis cannot be re-enabled from the OCU/ACU. _______(Check)
  10. Clear the simulated thermal fault by running the command in failTweaker:
    • set_node CPU_ThermalHeadroom 25
  11. Verify thermal fault and warning messages clear. _______(Check)
  12. Repeat steps 1-11 for elevation. _______(Check)
  13. Return the Config.xml file to its original condition and restart the CCU host software.
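
The Config.xml change in step 1 and its reversal in step 13 could be scripted rather than edited by hand. The sketch below is a hedged example using Python's standard xml.etree.ElementTree; the function name and the sample document are hypothetical, and the real Config.xml layout (root element, nesting of Device entries) is an assumption. Only the Device attributes come from the steps above.

```python
import xml.etree.ElementTree as ET


def set_thermal_test_flag(xml_text: str, enabled: bool) -> str:
    """Add or remove disable_for_testing="1" on the thermal monitor Device."""
    root = ET.fromstring(xml_text)
    for dev in root.iter("Device"):
        if dev.get("type") == "CPU_ThermalMonitor":
            if enabled:
                dev.set("disable_for_testing", "1")
            else:
                # Restore the original condition by dropping the attribute.
                dev.attrib.pop("disable_for_testing", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal file; the real Config.xml structure is an assumption.
SAMPLE = (
    '<Config>'
    '<Device name="hothead" type="CPU_ThermalMonitor" poll_id="1000"></Device>'
    '</Config>'
)
```

ElementTree discards comments and may change whitespace and attribute layout when it rewrites a file, so keeping a backup copy of the original Config.xml is the safer way to satisfy step 13.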

Condition/Test: System communication error with MCIs (EtherCAT down)

Detected by: CCU, MCI

Expected Response: CCU commands PLC to disable axes with brake sequenced stop; MCIs assert inhibit line

Condition/Test: Modbus link down

Detected by: PLC watchdog

Expected Response: PLC disables axes with brake sequenced stop

MCI rate/auto bit
Condition/Test: MCI rate-loop/MCI mode bit disconnected

Detected by: MCI

Expected Response: MCI asserts inhibit signal

PEI Encoder Failure
Condition/Test: Encoder power loss or failure

Detected by: PEI?

Expected Response: PEI indicates fault in data stream

Communication Errors

Condition/Test: Already covered in { 5.3.4.1-4 } System communication loss with ACU, while ACU is actively in control

Detected by: CCU

Expected Response: CCU commands servoed stop.

Condition/Test: {Sections 8.5 & 8.6 and 5.3.4.1-4 } System communication loss with OCU, while OCU is actively in control

Detected by: CCU

Expected Response: CCU commands servoed stop.

IRIG Signal Loss
Condition/Test: IRIG signal loss while active

Detected by: CCU

Expected Response: Message indicating signal loss, otherwise normal but unsynchronized operation.

  1. Enable the elevation axis through the OCU or M&C.
  2. Run the following command on the CCU host computer:
    • ntpq -p
           remote           refid      st t when poll reach   delay   offset  jitter
      ==============================================================================
        ...
      *SHM(0)          .SHM.            0 l   42   64  377    0.000   -0.040   0.006
      
  3. Verify there is an asterisk next to SHM, and 'reach' is 377 (similar to the example above). _______(Check)
  4. Disconnect the IRIG signal from the CCU host computer.
  5. Verify the message IRIG signal loss appears. Note: the message may take a few minutes to appear; this is normal. _______(Check)
  6. Verify the axes do not disable. _______(Check)
  7. Re-run the command from step 2 several times, a few minutes apart.
  8. Verify that the reach field changes to a value other than 377. _______(Check)
  9. Reconnect the IRIG signal to the CCU host computer.
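
The asterisk and reach checks in this procedure can be automated by parsing the ntpq -p output. The sketch below is a minimal example; the function names are hypothetical, and it assumes the standard ten-column peers layout shown in the example output. The reach column is an 8-bit shift register printed in octal, so 377 means the last eight polls all succeeded.

```python
from typing import Optional, Tuple


def shm_status(ntpq_output: str) -> Optional[Tuple[bool, str]]:
    """Return (selected, reach) for the SHM(0) peer, or None if not listed.

    selected is True when the line carries the '*' tally code (current
    synchronization source); reach is the octal string printed by ntpq.
    """
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Columns: remote refid st t when poll reach delay offset jitter
        if len(fields) >= 10 and "SHM(0)" in fields[0]:
            return fields[0].startswith("*"), fields[6]
    return None


def all_polls_ok(reach: str) -> bool:
    # 0o377 == 0b11111111: every one of the last eight polls was answered.
    return int(reach, 8) == 0o377
```

On the CCU host, the input text could be captured with subprocess.run(["ntpq", "-p"], capture_output=True, text=True).stdout. After the IRIG signal is disconnected, all_polls_ok is expected to become False as failed polls shift into the register.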

Kernel Rate-loop bit wrong
Condition/Test: The Kernel sets the incorrect rate-loop/MCI setting in EtherCAT status.

Expected Response: MCI detects the difference between the hardware signal and the EtherCAT output status bit, and asserts the error bit in the ECAT input status.

ACU command Error
Condition/Test: The ACU issues a command stream which violates system limits, or fails to issue commands.

Expected Response: Command is ignored. If a command time-stamp is more than 2 seconds old, the axes are disabled and a message is displayed.
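
The two outcomes in this expected response can be sketched as a small decision function. This is an illustration only: the function name, the return labels, and the order of the checks are assumptions, and the case where the ACU stops issuing commands altogether is not modeled. Only the 2-second age limit comes from the text above.

```python
MAX_COMMAND_AGE_S = 2.0  # age limit taken from the expected response above


def classify_command(cmd_time: float, now: float, within_limits: bool) -> str:
    """Return 'disable', 'ignore', or 'accept' for a single ACU command."""
    if now - cmd_time > MAX_COMMAND_AGE_S:
        # Stale time-stamp: axes are disabled and a message is displayed.
        return "disable"
    if not within_limits:
        # Limit violation: the command is dropped, axes stay as they are.
        return "ignore"
    return "accept"
```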

OCU command Error
Condition/Test: The OCU issues a command which violates system limits.

Expected Response: Command is ignored, message on OCU display.

Task Sequencer Halts
Condition/Test: System sequencer halts (due to scheduler starvation or IRIG error).

Detected by: CCU task watchdog

Expected Response: PLC watchdog updates halted, PLC disables axes with brake sequenced stop

  1. Using the ACU or OCU, enable the azimuth axis.
  2. Using the failTweaker program, simulate a sequencer lock-up by running the command in failTweaker: halt_sequencer 1000 NEW
  3. Verify the Az axis disables. _______(Check)
  4. Restart the CCU software and repeat steps 2-3 for the elevation axis.
  5. Verify the El axis disables. _______(Check)
  6. Restart the CCU software.
  7. Enable the azimuth axis.
  8. Using the failTweaker program, simulate a severe sequencer error by halting the base rate sequencer with the command:
    • halt_sequencer 1 NEW
  9. Verify the azimuth axis disables. _______(Check)
  10. Repeat steps 6-9 with the elevation axis.
  11. Verify the elevation axis disables. _______(Check)

Task stuck
Condition/Test: A task locks up due to resource conflict or deadlock condition

Detected by: CCU task watchdog, if enabled for the respective task.

Expected Response: PLC watchdog updates halted, PLC disables axes with brake sequenced stop

  1. Verify the Config.xml file includes the TestGenerator module, and includes the watchdog attribute as shown below:
    • <Core name="fred" type="TestGenerator" watchdog="5" poll_id="1">
  2. Using the OCU or ACU, enable the azimuth axis.
  3. Using the failTweaker program, cause a task overrun which is shorter than the 5 (millisecond) trigger level:
    • wd_poll_delay 2000000
  4. Verify the axis does not disable. _______(Check)
  5. Using the failTweaker program, cause a task overrun which is longer than the 5 (millisecond) trigger level:
    • wd_poll_delay 6000000
  6. Verify the azimuth axis disables. _______(Check)
  7. Reset the watchdog using the failTweaker command: reset_watchdog NEW
  8. Repeat steps 2-7 for the elevation axis.
  9. Verify the elevation axis disables. _______(Check)

System Error/Freeze
Condition/Test: Total system hang.

Detected by: PLC watchdog

Expected Response: PLC watchdog expires and axes are disabled with brake sequenced stop

  1. Using the OCU or ACU, enable the azimuth axis.
  2. Simulate a system hang or crash by killing the CCU process.
  3. Verify the azimuth axis disables. _______(Check)
  4. Restart the CCU software and repeat steps 1-3 for the elevation axis.
  5. Verify the elevation axis disables. _______(Check)
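
The PLC watchdog behavior this test relies on can be pictured as a heartbeat timeout: the CCU keeps updating a value, and the PLC acts when the updates stop. The sketch below is a generic illustration, not the PLC's logic; the class name, method names, and the timeout value used in any example are hypothetical.

```python
class HeartbeatMonitor:
    """Expires when no heartbeat has arrived within timeout_s seconds."""

    def __init__(self, timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now: float) -> None:
        # Called each time the CCU updates the watchdog value.
        self.last_kick = now

    def expired(self, now: float) -> bool:
        # True once updates have stopped for longer than the timeout; the
        # real PLC would then disable the axes (brake sequenced stop).
        return now - self.last_kick > self.timeout_s
```

Killing the CCU process corresponds to kick no longer being called, so expired becomes True once the timeout has elapsed.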

Task overrun
Condition/Test: A task does not complete within its allotted time, or the entire cycle time exceeds the period of the process cycle for N consecutive cycles, where N is the watchdog 'tolerance' for the respective module.

Detected by: CCU task watchdog

Expected Response: PLC watchdog updates halted, PLC disables axes with brake sequenced stop

  1. Verify the Config.xml file includes the TestGenerator module, and includes the watchdog attribute as shown below:
    • <Core name="fred" type="TestGenerator" watchdog="5" poll_id="1">
  2. Using the OCU or ACU, enable the azimuth axis.
  3. Using the failTweaker program, cause a task overrun which is shorter than the 5 (millisecond) trigger level:
    • wd_poll_delay 2000000
  4. Verify the axis does not disable. _______(Check)
  5. Using the failTweaker program, cause a task overrun which is longer than the 5 (millisecond) trigger level:
    • wd_poll_delay 6000000
  6. Verify the azimuth axis disables. _______(Check)
  7. Reset the watchdog using the failTweaker command: reset_watchdog NEW
  8. Repeat steps 2-7 for the elevation axis.
  9. Verify the elevation axis disables. _______(Check)
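
The overrun condition, including the 'tolerance' of N consecutive cycles, can be sketched as a latched counter. This is a minimal illustration with hypothetical names, not the CCU task watchdog itself. It assumes the wd_poll_delay argument is in nanoseconds (so 2000000 is 2 ms and 6000000 is 6 ms against the 5 ms trigger level), which this outline does not state.

```python
class TaskWatchdog:
    """Trips after `tolerance` consecutive cycles longer than `limit_ns`."""

    def __init__(self, limit_ns: int, tolerance: int = 1) -> None:
        self.limit_ns = limit_ns
        self.tolerance = tolerance
        self.consecutive = 0
        self.tripped = False

    def report_cycle(self, elapsed_ns: int) -> bool:
        # Count back-to-back overruns; any on-time cycle clears the count.
        if elapsed_ns > self.limit_ns:
            self.consecutive += 1
        else:
            self.consecutive = 0
        if self.consecutive >= self.tolerance:
            # Latched: on the real system PLC watchdog updates would halt here.
            self.tripped = True
        return self.tripped

    def reset(self) -> None:
        # Counterpart of the reset_watchdog command in the steps above.
        self.consecutive = 0
        self.tripped = False
```

With limit_ns set to 5_000_000 and a tolerance of 1, a 2_000_000 cycle leaves the watchdog quiet and a 6_000_000 cycle trips it, matching the two checks in the procedure.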

Task deadlock
Condition/Test: A task deadlocks waiting for a resource.

Detected by: CCU task watchdog

Expected Response: PLC watchdog updates halted, PLC disables axes with brake sequenced stop

(Same as 'Task stuck' above, delete this section.)

Core dump/Fatal Error
Condition/Test: Fatal program error resulting in program exit/core dump.

Detected by: PLC, MCI

Expected Response: PLC watchdog times out, and axes disable (brake sequenced stop). MCI's ramp currents to zero.

(Same test as 'System Error/Freeze' above, delete this section.)

Mis-configuration Tests

Az/El ECAT cables swapped

Condition/Test: Hardware cables swapped between AZ & EL.

Detected by:

Expected Response: Links should fail to initialize

Duplicate CCU process

Condition/Test: A second CCU process is started, while another is already running.

Detected by: CCU software on initialization

Expected Response: No effect on already running CCU process, second process exits

  1. While the CCU software is running, attempt to start a second CCU process.
  2. Verify the second process exits, and that the first process remains running.
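
One common way to get the behavior this test checks is an exclusive, non-blocking lock on a well-known file, taken at start-up. The sketch below shows that technique with Python's fcntl module on a POSIX host; it is not a description of how the CCU software detects a duplicate, and the function name and lock path are hypothetical.

```python
import fcntl


def acquire_instance_lock(path: str):
    """Return an open lock file, or None if another instance holds the lock."""
    fh = open(path, "a")
    try:
        # LOCK_NB makes the call fail at once instead of waiting for the holder.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fh.close()
        return None
    # The caller must keep this object open for the life of the process;
    # the lock is released when the file is closed or the process exits.
    return fh
```

A second process that receives None would log a message and exit, leaving the first process untouched. Because the kernel drops the lock when the holder dies, a crashed instance does not leave a stale lock behind.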

Incorrect PLC host

Condition/Test: The IP address of the PLC is incorrectly configured.

Detected by: Fault-Analysis sees inconsistent status?

Expected Response: Axes disable?
Topic revision: r5 - 2011-12-12, JoeBrandt