Testing Round 2

Began: 01/13/2022 1:00 PM
Ended: 01/13/2022 3:00 PM

Project: TINT_20220113

JIRA Links: Kasey

Objective: Evaluate new GPU configuration on vegas-hpc21 using dspsr and guppi daq



Pre-Setup Notes

Notes from Slack:

vegas-hpc21 needs a reboot since it now has two GPU cards.


Other issue

Not seeing the machine in system.conf:
  • substitute hpc18 with hpc21
  • put in the MAC address for hpc21
  • use Task Master (TM) to start up
  • need to stop and start the coordinator




Steps to setup

Sequence for moving the manager from hpc18 to hpc21:

  1. edit config files: in system.conf, change the hostname for Bank H; in vegas.conf, add the MAC address for hpc21
  2. use TM to stop the manager and vegas_matrix_server on hpc18
  3. edit vegas-hpc21Proc.conf to run the Bank H manager instead of the Bank I manager
  4. ssh hpc21; source /home/gbt/gbt.bash; TM vegas-hpc21 systemstart
  5. restart vegasCoordinator on hpc11 via TM
  6. restart any CLEO screens related to VEGAS
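
The hostname substitution in step 1 can be sketched in Python. This is only an illustration: the layout of system.conf is not shown in these notes, so the sample text and the helper name swap_host are hypothetical, and the real files were edited by hand during the test.

```python
import re


def swap_host(conf_text, old, new, only_lines_containing=None):
    """Replace whole-token occurrences of `old` with `new` in config text.

    `only_lines_containing` optionally restricts the change to lines that
    mention a marker such as a bank name (hypothetical layout).
    """
    # The lookarounds stop a match inside a longer name such as hpc180,
    # while still matching the hpc18 part of vegas-hpc18.
    pattern = re.compile(rf"(?<!\w){re.escape(old)}(?!\w)")
    out = []
    for line in conf_text.splitlines(keepends=True):
        if only_lines_containing is None or only_lines_containing in line:
            line = pattern.sub(new, line)
        out.append(line)
    return "".join(out)


# Made-up sample lines, NOT the real system.conf syntax.
sample = "BankH host vegas-hpc18\nBankZ host vegas-hpc18\n"
switched = swap_host(sample, "hpc18", "hpc21", only_lines_containing="BankH")
```

Calling the same helper with the old and new names exchanged gives the edit needed when switching back to hpc18.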

Can see that BankH is getting set up correctly

cb6a12c3cfd94d7cb600ecd6595b882d.png



Notes from Testing

example of Joe's Profiler (more notes below)

7dba189f761e4a809f7d192bd9b750e9.png

Timing and Testing Notes

6980325c84644b9388cf2382303fb5e3.jpg

75a6ece18885449893670461f476e744.jpg

Switching Back Steps

01/13/2022 2:37 PM

Same process as the setup steps but in reverse:
  1. remove the hpc21 MAC address from vegas.conf
  2. edit host back to hpc18 in system.conf
  3. system stop on vegas-hpc21
  4. systemstart mgr on vegas-hpc18
  5. edit vegas-hpc21Proc.conf to run the original Bank I manager again
  6. restart VEGASCoordinator (stop 10, start 10)
  7. restart CLEO screens
  8. check devExp can see bankHMgr running
*check timing notes for more info on the sanity check scans*



Data from Testing

Power Monitoring

measpwr for VEGASBankA

http://grafana.gb.nrao.edu/dashboard/snapshot/uyZXEZNujcVhshxKE7pOKrU5I3QaMiMe

b23319580db948f5bce3859597a62b04.png

node_exporter

memory utilization for hpc21 GPUs

http://grafana.gb.nrao.edu/dashboard/snapshot/zh8KFG90OD3YYzV0R1Aav9XCK42T9R1g

e58f0cf6d5b34e5c97a29bb9e5139cbf.png

memory utilization for all hpc GPUs

http://grafana.gb.nrao.edu/dashboard/snapshot/7mRIcNE1lBHh4geSVcz2470GNzk7K1Je

a86c38607a7342a5b9215765abab819b.png

GPU

percent memory used for hpc21 GPUs

http://grafana.gb.nrao.edu/dashboard/snapshot/k1HUxoFx24fHkFeaanpylhZC0Opa7p4m

629a50244ad54f31a8700f115c703e7f.png

gpu temp for hpc21

http://grafana.gb.nrao.edu/dashboard/snapshot/VIhphvcPic9havcivp1EGw5AAbiIZSHs

b5d8965904264f6d88ed723b10385fc3.png
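
For a quick spot check of the same quantities from the command line (percent memory used and temperature per GPU), something like the following Python sketch could be run on the host. It assumes the cards are NVIDIA GPUs with nvidia-smi installed, which these notes do not state explicitly, and it is not how the Grafana panels above were produced. The function names are illustrative.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = "index,memory.used,memory.total,temperature.gpu"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output lines."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total, temp = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            # Percent of GPU memory in use: used / total * 100.
            "mem_used_pct": 100.0 * float(used) / float(total),
            "temp_c": float(temp),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the local host and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_csv(result.stdout)
```

After the reboot, query_gpus() returning two entries would be one way to confirm that both cards are visible.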

Joe's Profiling Notes

gpu_profiling.pdf

-- KathlynPurcell - 2022-01-19
Topic revision: r1 - 2022-01-19, KathlynPurcell