Fixed the following bug: the buffers used by the slave node pipeline processes as the templates for the initial values of all data written to the BDF were not being properly initialized. This bug was introduced with wcbe-20110811.0, and is not in the "perftrials" version used immediately before wcbe-20110811.0 was deployed. The effect of this bug could vary with each configuration on each pipeline process; in the worst case, data that were "missing" (i.e., never received by the writing process before the data were written) would have unpredictable values in the BDF files. In many cases the missing data values recorded in the BDF would nevertheless be correct, but that outcome could not be guaranteed.
20110811.0
A new version of the CBE s/w is available, and a new version of the MPICH2-1.4 library has been installed on the CBE. N.B.: Because the CBE code is dynamically linked to the MPICH2-1.4 library code, every CBE version that was built against MPICH2-1.4 will use the currently installed version of that library.
Implemented an O(1) (modulo process contention), lock-free method for managing shared memory buffers (used for passing lag data between CBE processes).
Ensure that the slave node processes used by a configuration are spread across nodes maximally to allow the maximum achievable number of MPI-IO aggregation processes for each configuration.
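The placement idea can be illustrated with a minimal sketch. It assumes a simple round-robin over nodes that still have free slots; the function name, the node names and the slot-count input are hypothetical, and the actual CBE placement logic may differ.

```python
def spread_processes(free_slots, count):
    """Assign `count` processes to nodes, using as many distinct nodes as possible.

    free_slots maps a (hypothetical) node name to the number of processes
    that node can still host. Returns a list of node names, one per process.
    """
    remaining = dict(free_slots)
    if count > sum(remaining.values()):
        raise ValueError("not enough free slots for the requested processes")
    assignment = []
    while len(assignment) < count:
        # Each pass places at most one process on every node that has room,
        # visiting the nodes with the most free slots first.
        for node in sorted(remaining, key=lambda n: -remaining[n]):
            if len(assignment) == count:
                break
            if remaining[node] > 0:
                remaining[node] -= 1
                assignment.append(node)
    return assignment
```

Because a node receives a second process only after every other node with room has received one, the number of distinct nodes in use is as large as the free slots allow.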
Changed details of timing in slave node pipeline processes, including better clock resolution to support faster dump times.
Reduced lock contention and lock granularity among threads in slave node pipeline processes.
Added "backlog" field, which measures the number of integrations for which a CBE slave node process has unwritten data, to "wcbetool status" report.
MPICH2-1.4 changes: Merge blocks before calling system "write" function in MPI collective write routines. This change results in a significant performance improvement at high output data rates.
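As an illustration of why merging helps, the sketch below coalesces blocks that are contiguous in the file so that each merged run needs only one write call. It is not the MPICH2 code (which is written in C); the function name and the (offset, data) block representation are assumptions made for the example.

```python
def merge_blocks(blocks):
    """Coalesce (offset, data) blocks that are contiguous in the file.

    Returns a list of (offset, data) pairs sorted by offset, in which
    adjacent blocks have been joined so each needs a single write.
    """
    merged = []
    for offset, data in sorted(blocks, key=lambda b: b[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            # This block starts exactly where the previous one ends: join them.
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged
```

With many small contiguous blocks, the number of system calls drops from one per block to one per contiguous run.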
20110414.0
A new version of the CBE s/w is available. This version is identical to the one that has been in use over the last two weeks. With this version a change has been made to the CBE system configuration in that the "mpd" MPI process manager is being used rather than the newer "hydra" process manager; this change has been in effect for over two weeks.
Tag: wcbe-20110414.0 (installed 16 March as wcbe_connect)
Alignment of CBE integration to dumptrig epoch. (This is available, but I suspect that the capability has not been in use due to the CM version currently installed.)
wcbe_services adapts to the MPI process manager that is in use on the system.
Reduced the time that it takes to configure the CBE (from receipt of the configuration document from the CM to the configuration of all slave node processes).
Implementation of proper signal masking in MPI-calling threads.
20110201.0
A new version of the CBE s/w is available. This version uses a new MPI library, which has been installed on the CBE nodes. It appears that older versions of binary executables in the stow directory do find the correct (old) library, but no testing has been done to verify that older versions are actually usable after a simple "stow" command.
Note: This was installed 26Jan11 (prior to being tagged).
Fix a regression in wcbe-20101214.0, which became apparent when running an "OSRO2" observing script.
Fix a failure to configure the CBE in some observing scripts due to a defect in XML processing (which has been in existence for some time). The failure was reproducible using the "C_2GHz" test observing script with 20 sec scans, but could unpredictably affect other scripts.
20101214.0
A new version of the CBE s/w is available. This version has not yet been stowed in the production environment.
Fix a race condition encountered during the setup of configurations, which results in an apparent loss of pipeline processes. This bug has affected scripts only very rarely.
Some build/install procedure fixes.
20101110.0
A new version of the CBE s/w is available. This version has not yet been stowed in the production environment.
20101014.0
A new version of the CBE s/w is available. This version has not yet been stowed in the production environment.
Fix a possible invalid memory reference to buffers used for accumulating data prior to writing to BDF file. This bug only rarely affected the shutdown of certain processes, and should not have had an impact on the creation of BDF files.
20101012.0
A new version of the CBE s/w is available. This version has not yet been stowed in the production environment, although the development version named "wcbe_events2", which should be identical to this new version, has been stowed since noon of 10/12.
Changes to prevent loss of the last scan in a configuration that uses integration in the CBE.
20100927
A new version of the CBE s/w is available; this version has not yet been stowed in the production environment, although the development version named "wcbe_shutdown", which should be identical to this new version, is stowed.
Better handling of pipeline states during configuration changes when the CBE is doing integration, with the effect that the final BDF file of a configuration should not be "lost". Integration times longer than 4 sec should now be functional.
New options for "wcbetool status"; use "wcbetool status -h" for details.
20100920
A new version of the CBE s/w is available. This version has not yet been stowed in the production environment.
Time averaging in the CBE has been implemented. Support for CBE time averaging already exists in the CM, and averaging will occur automatically when the VCI document requires it.
Table needed to map baseline board CMIB hostnames to baseline board serial numbers is now retrieved from the MCCC only when the "http" service is started.
Reduced quantity of log messages from the d10 server.
Allow running http service with an alternative port number (useful for software development testing).
20100823
New version of CBE s/w is available in the WIDAR production environment.
(The currently stowed version, as of 23 Aug at noon, is named "wcbe_queue", which is identical to "wcbe-20100823.0" in all but name.)
Summary of changes:
Fixed some bugs that tend to appear for short or late-to-initialize configurations and sub-scans:
- Corrected bugs in buffering of data that arrives at the output element of a slave node processing pipeline before the file for a new sub-scan has been initialized.
- To avoid concurrent file initialization, further guard access to MPI functions from multiple threads during output file initialization.
20100819
New version of CBE s/w is available in WIDAR production environment.
Waiting for output file initialization is now done before, rather than after, a new output file is initialized. Waiting will thus introduce a delay only when a following sub-scan is requested while the preceding sub-scan is still in the initialization process.
Implement re-tries in starting the master node BDF writing processes due to apparent unreliability in starting processes via mpiexec.
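A retry wrapper of this kind might look like the following minimal sketch. The function name, the `launch` callable and the default attempt count are assumptions for illustration; the source does not describe how the CBE detects a failed start or how many times it retries.

```python
import time


def start_with_retries(launch, attempts=3, delay=1.0):
    """Call `launch()` until it succeeds or `attempts` tries have failed.

    `launch` is any callable that starts the process and raises OSError
    on failure (for example, a wrapper around subprocess.Popen).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return launch()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                # Pause briefly before trying again.
                time.sleep(delay)
    raise RuntimeError(f"failed to start after {attempts} attempts") from last_error
```

In use, `launch` could be something like `lambda: subprocess.Popen(argv)`, where `argv` holds the mpiexec command line appropriate to the installation.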
Table needed to map baseline board CMIB hostnames to baseline board serial numbers is now updated automatically (instead of being maintained manually) by retrieving an XML document from the MCCC.
20100817
New version of CBE s/w is available in WIDAR production environment.
Fixed some issues related to data structures within the slave node pipeline processes shared by multiple threads (e.g., more mutexes, avoiding deadlock conditions, etc.).
Mapping algorithm that assigns WIDAR products to CBE addresses has been re-implemented as a C extension to Python.
Configuration files are now passed to slave node processes by reference (i.e., file path) rather than by value.
Fixed improper implementation of open/release in the FUSE-based pipelinefs filesystem. This bug in the master/slave communication channel was the cause of unreliable signaling of activation times and new subscans from master to slave node processes.
Fixed management of shared memory buffers to avoid invalid memory references after a shared memory buffer has been freed.
Queue visibilities that arrive at the pipeline output element prior to the opening of the output file for a sub-scan.
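The queuing behavior can be sketched as follows. This is a single-threaded illustration only: the class and method names are hypothetical, and the real pipeline output element is multi-threaded and would need locking that is omitted here.

```python
import io
from collections import deque


class OutputElement:
    """Holds records that arrive before the output file has been opened."""

    def __init__(self):
        self._file = None
        self._pending = deque()

    def write(self, record):
        if self._file is None:
            # No file yet for this sub-scan: keep the record for later.
            self._pending.append(record)
        else:
            self._file.write(record)

    def open(self, fileobj):
        # Flush queued records in arrival order, then write through directly.
        self._file = fileobj
        while self._pending:
            fileobj.write(self._pending.popleft())


# Usage: two records arrive early, one arrives after the file is opened.
element = OutputElement()
element.write(b"vis1")
element.write(b"vis2")
sink = io.BytesIO()
element.open(sink)
element.write(b"vis3")
```

Flushing the queue before accepting direct writes keeps the records in the file in the order in which they arrived.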
Properly synchronize and serialize the initialization of output files across slave node processes.
20100730
New version of CBE s/w is available in WIDAR production environment. This version should be "stowed" before 1700 today.
Activation times and start of subscan times are rounded to the nearest 10 ms after conversion to "WIDAR time" values (i.e., epoch/seconds/cycles).
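The rounding step can be sketched with integer arithmetic. The example assumes a time held as an integer count of microseconds, which is a simplification: the actual "WIDAR time" representation (epoch/seconds/cycles) is not detailed here, and the function name is hypothetical.

```python
def round_to_10ms(t_us):
    """Round a non-negative time in integer microseconds to the nearest 10 ms.

    Ties (exactly 5 ms past a multiple of 10 ms) round up.
    """
    step = 10_000  # 10 ms expressed in microseconds
    return ((t_us + step // 2) // step) * step
```

Using integers avoids the floating-point error that could otherwise push a value just below or above a 10 ms boundary.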
Parallelized additional "wcbetool" commands (including blf-on/off), start_subscan and stop_subscan commands, and d10server's interaction with slave node processes.
Changes to ensure that "pending" start/stop subscan commands (that is, those that arrive before the corresponding configuration is ready) are processed in the correct order after the configuration is ready.
Improvements to shared memory management by slave node processes.
Improvements to process restart mechanisms for slave nodes.
Improvements in internal state machine models of configurations, activations, subscans, pipelines and processes in the CBE, and their implementation.
Environment changes:
Installation of /etc/romio-hints on all CBE nodes. This file is for tuning certain MPI-IO calls, and, in this instance, ensures that collective buffering is not done by the master node (cbe-control).
20100702
New version of CBE s/w installed in WIDAR production environment (in the late afternoon of Friday, July 2).