VEGAS Hardware Tests - 2015 Sep 23

Session Goals

Ray suggested new bitmasks for guppi_threads.conf to bind certain guppi_daq processes to particular cores. The new masks are:
net_thread_mask=0x08 
net_thread_codd_mask=0x08 
null_thread_mask=0x01 
dedisp_thread_mask=0x01 
dedisp_ds_thread_mask=0x01 
psrfits_thread_mask=0x03 
rawdisk_thread_mask=0x03 
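
For reference, the masks above can be decoded into core numbers with a short sketch like the one below. It assumes the usual Linux affinity convention in which bit n of the mask selects core n (so 0x01 is core 0, consistent with the dedisp threads being nominally pinned to core 0). The helper name cores_from_mask is hypothetical and is not part of guppi_daq.

```python
def cores_from_mask(mask):
    """Return the zero-based core indices whose bits are set in an affinity mask.

    Assumption: bit n of the mask corresponds to core n (standard Linux convention).
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Mask values copied from the list above.
masks = {
    "net_thread_mask": 0x08,
    "net_thread_codd_mask": 0x08,
    "null_thread_mask": 0x01,
    "dedisp_thread_mask": 0x01,
    "dedisp_ds_thread_mask": 0x01,
    "psrfits_thread_mask": 0x03,
    "rawdisk_thread_mask": 0x03,
}

for name, mask in masks.items():
    print(f"{name}=0x{mask:02x} -> cores {cores_from_mask(mask)}")
```

Under that assumption, the net threads are bound to core 3, the null and dedisp threads to core 0, and the psrfits and rawdisk threads to cores 0 and 1.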

We will investigate whether these new masks alleviate dropped packets in coherent modes.

Summary

We systematically tested 800 MHz x 64, 128, 256, and 512 channel coherent modes. We looked for dropped packets and correct subintegration lengths, and also kept an eye on core and memory usage via htop on Banks A and E. Multiple scans were run in each mode to look for consistent core loads, which would indicate that processes are being pinned as expected. We found that core loads were not always consistent from scan to scan, or from bank to bank. For example, on some scans core 3 would carry a 100% load on Bank A, while core 5 would carry a 100% load on Bank E. Sometimes heavy loads would switch from one core to the next.

Packet loss was alleviated in most modes, however. The 64 channel mode still wrote anomalously short subints of 20.48e-6 us on Banks A, G, and H before behaving as expected. The other banks in the 64 channel mode behaved as expected throughout, and all banks performed well in the 128 and 512 channel modes. The 256 channel mode still experienced significant packet loss.

Next Steps

Investigate why core loads are not more consistent. Are processes being pinned properly?

Addendum

Ryan and Ray investigated the process pinning in more detail in the simulator. Ray modified guppi_daq_server to name its threads to make it easier to identify them via their PID. We realized that net_thread was pinned properly, but that the dedisp threads were still allowed to float between cores. We believe this is because the dedisp threads were nominally pinned to core 0, which may be reserved by the system. When pinned to core 7, the dedisp thread had the proper affinity. We also noticed that the Manager was using 100% CPU and was not pinned to any core. We believe that this is because the Manager is writing the FITS files (a psrfits_thread was not seen).
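
A check of this kind can be sketched on Linux by listing each thread of a process together with its name and allowed cores. The sketch below is illustrative only and is not the procedure used in the simulator: the helper name thread_affinities is hypothetical, and it relies on /proc/<pid>/task and os.sched_getaffinity, both of which are Linux-specific.

```python
import os


def thread_affinities(pid):
    """Map each thread ID of `pid` to (thread name, sorted list of allowed cores).

    Linux only: thread IDs come from /proc/<pid>/task and names from the
    per-thread comm file.
    """
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                name = f.read().strip()
            # sched_getaffinity accepts a thread ID as well as a process ID.
            cores = sorted(os.sched_getaffinity(int(tid)))
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listing and inspection
        result[int(tid)] = (name, cores)
    return result


if __name__ == "__main__":
    # Demonstration on the current process; substitute the PID of interest.
    for tid, (name, cores) in sorted(thread_affinities(os.getpid()).items()):
        print(tid, name, cores)
```

A thread that is pinned as intended should report only the cores selected by its mask, while a floating thread reports every core available to the process.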
Topic revision: r1 - 2015-09-28, RyanLynch