Mark6 SMART Data Utility

The m6modsmartdata.py utility displays the SMART data for the disks in a selected module. It is deployed in /usr/bin on all Mark6 units at the sites and at the correlator, and it must be run locally on the unit. Run it as the vlbamon user on site Mark6 units and as the difx user on correlator Mark6 units.

Usage:

usage : m6modsmartdata.py [options] <slot number>

options: -s - short output, show only serial number and error log

<slot number> must be 1, 2, 3 or 4
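
When the utility is called from a script, the argument rules above can be checked before anything is executed. The following is a minimal sketch, assuming Python 3 and a script running locally on the Mark6 unit; build_command and run_smart_report are hypothetical helper names and are not part of m6modsmartdata.py.

```python
import subprocess

VALID_SLOTS = (1, 2, 3, 4)


def build_command(slot, short=False):
    """Return the argv list for m6modsmartdata.py (hypothetical helper)."""
    slot = int(slot)
    if slot not in VALID_SLOTS:
        raise ValueError("slot number must be 1, 2, 3 or 4, got %r" % (slot,))
    cmd = ["m6modsmartdata.py"]
    if short:
        cmd.append("-s")  # short output: serial number and error log only
    cmd.append(str(slot))
    return cmd


def run_smart_report(slot, short=False):
    """Run the utility locally and return its combined text output."""
    result = subprocess.run(
        build_command(slot, short),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

For example, build_command(3, short=True) produces the same argument list as typing "m6modsmartdata.py -s 3" at the prompt.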

Example output, normal form, first disk only:

difx@mark6fx01 VLBADIFX7-2.5.2 ~> m6modsmartdata.py 1
=====================================================================
SMART data for module LBO%0086
=====================================================================
=====================================================================
DISK 0 Information Section
=====================================================================
Model Family:     HGST Ultrastar He10
Device Model:     HGST HUH721010ALE604
Serial Number:    2YKH788D
LU WWN Device Id: 5 000cca 273f1335b
Firmware Version: LHGNW384
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 16 14:36:39 2020 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=====================================================================
DISK 0 Attributes Section
=====================================================================
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       443 (Average 447)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1083
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       114
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       114
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 15/36)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0


=====================================================================
DISK 0 Error Log Section
=====================================================================
No Errors Logged
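
The Attributes Section is a fixed-column table, so it can be checked by a script. The sketch below is an illustration only and is not part of the utility: parse_attributes and failing_attributes are hypothetical names, and the code assumes the smartctl-style row layout shown above, where an attribute counts as failed when its normalized VALUE is at or below THRESH.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, RAW_VALUE (the raw value may contain spaces).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return a list of dicts, one per attribute row found in text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "type": m.group(7),
                "raw": m.group(10).strip(),
            })
    return rows


def failing_attributes(rows):
    """Names of attributes whose normalized value has reached the threshold."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


# Two rows copied from the example output above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 15/36)
"""
rows = parse_attributes(sample)
```

The header line is skipped because it does not start with a numeric ID. Raw values are kept as strings, since some of them (temperature, spin-up time) carry extra text in parentheses.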

Example of a disk with an error:

=====================================================================
DISK 6 Information Section
=====================================================================
Device Model:     WDC WD8003FRYZ-01JPDB1
Serial Number:    7SJ4ULAW
LU WWN Device Id: 5 000cca 252de608f
Firmware Version: 01.01H02
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 16 14:40:05 2020 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=====================================================================
DISK 6 Attributes Section
=====================================================================
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       437 (Average 439)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       37
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3162
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       328
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       328
194 Temperature_Celsius     0x0002   230   230   000    Old_age   Always       -       26 (Min/Max 14/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1


=====================================================================
DISK 6 Error Log Section
=====================================================================
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 2343 hours (97 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 78 e3 b7 40 00   4d+23:56:00.061  READ FPDMA QUEUED
  60 00 08 78 e4 b7 40 00   4d+23:56:00.060  READ FPDMA QUEUED
  60 00 08 78 e2 b7 40 00   4d+23:56:00.059  READ FPDMA QUEUED
  60 00 00 78 e1 b7 40 00   4d+23:56:00.059  READ FPDMA QUEUED
  60 00 08 78 e0 b7 40 00   4d+23:56:00.058  READ FPDMA QUEUED
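
The two time fields in the error log can be converted with simple arithmetic: the power-on lifetime is a count of whole hours (2343 hours = 97 days + 15 hours), and Powered_Up_Time uses the DDd+hh:mm:SS.sss layout described in the log header. The following is a small sketch with hypothetical function names; it assumes the day prefix may be absent when the drive has been powered up for less than a day.

```python
import re
from datetime import timedelta


def split_lifetime(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Optional "DDd+" prefix, then hh:mm:SS.sss
_UPTIME = re.compile(r"^(?:(\d+)d\+)?(\d+):(\d+):(\d+)\.(\d+)$")


def parse_powered_up_time(field):
    """Convert a Powered_Up_Time field into a timedelta."""
    m = _UPTIME.match(field.strip())
    if m is None:
        raise ValueError("unrecognized Powered_Up_Time: %r" % (field,))
    days, hh, mm, ss, ms = m.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hh),
        minutes=int(mm),
        seconds=int(ss),
        milliseconds=int(ms),  # assumes the three-digit sss form
    )
```

Because the counter wraps after 49.710 days, a Powered_Up_Time value only orders events within one power-up session; it is not an absolute timestamp.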

Example of short form output for a 5-disk module:

difx@mark6fx01 VLBADIFX7-2.5.2 ~> m6modsmartdata.py -s 3
=====================================================================
SMART data for module JPLK%006
=====================================================================
=====================================================================
DISK 0 Information Section
=====================================================================
Serial Number:    7SJ7YAVW 

=====================================================================
DISK 0 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 1 Information Section
=====================================================================
Serial Number:    7SJ3573W 

=====================================================================
DISK 1 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 2 Information Section
=====================================================================
Serial Number:    7SJ56TLW 

=====================================================================
DISK 2 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 4 Information Section
=====================================================================
Serial Number:    7SJ2RA3W 

=====================================================================
DISK 4 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 6 Information Section
=====================================================================
Serial Number:    7SJ4ULAW 

=====================================================================
DISK 6 Error Log Section
=====================================================================
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 2343 hours (97 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 78 e3 b7 40 00   4d+23:56:00.061  READ FPDMA QUEUED
  60 00 08 78 e4 b7 40 00   4d+23:56:00.060  READ FPDMA QUEUED
  60 00 08 78 e2 b7 40 00   4d+23:56:00.059  READ FPDMA QUEUED
  60 00 00 78 e1 b7 40 00   4d+23:56:00.059  READ FPDMA QUEUED
  60 00 08 78 e0 b7 40 00   4d+23:56:00.058  READ FPDMA QUEUED
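
Because the short form repeats the same two sections for each disk, it is easy to reduce to a per-disk summary. The sketch below is an illustration under stated assumptions (the function name summarize_short_output is hypothetical): it keys on the section titles and on the "Serial Number:", "No Errors Logged" and "ATA Error Count:" strings exactly as they appear in the output above.

```python
import re


def summarize_short_output(text):
    """Map disk number -> {'serial': str or None, 'errors': int or None}."""
    disks = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"DISK (\d+) (Information|Error Log) Section", line)
        if m:
            current = int(m.group(1))
            disks.setdefault(current, {"serial": None, "errors": None})
            continue
        if current is None:
            continue  # still in the module header
        if line.startswith("Serial Number:"):
            disks[current]["serial"] = line.split(":", 1)[1].strip()
        elif line.startswith("No Errors Logged"):
            disks[current]["errors"] = 0
        else:
            m = re.match(r"ATA Error Count: (\d+)", line)
            if m:
                disks[current]["errors"] = int(m.group(1))
    return disks


# Abbreviated from the example above (separator lines omitted).
sample = """\
DISK 4 Information Section
Serial Number:    7SJ2RA3W
DISK 4 Error Log Section
No Errors Logged
DISK 6 Information Section
Serial Number:    7SJ4ULAW
DISK 6 Error Log Section
ATA Error Count: 1
"""
summary = summarize_short_output(sample)
disks_with_errors = sorted(d for d, info in summary.items() if info["errors"])
```

Separator lines and the detailed register dumps are ignored, so the same function can be fed the full captured output of a "-s" run.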

Remote Mark6 SMART Data Utility

The remotem6modsmartdata.py utility displays the SMART data for the disks in a selected module of a selected Mark6 unit. It is deployed in /usr/difx/bin and can be run as the difx user on any machine that has ssh access to the site and correlator Mark6 units, such as gooey and swc000. The output, in both normal and short forms, is the same as that of the locally run m6modsmartdata.py.

Usage:

Usage: remotem6modsmartdata.py [options] <unit code> <slot>

A program to show SMART data for the disks in a module
  in a given slot a given site or playback mark6 unit
options: -s - short output, show only serial number and error log
<unit code> is two letter vlba site code for site mark6 units or
            01 to 08 for playback mark6 units
<slot number> must be 1, 2, 3 or 4
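
The internals of remotem6modsmartdata.py are not shown on this page. The following is only a sketch of how a unit code could be turned into an ssh call to the local utility. It assumes Python 3 and that host names follow the patterns seen in the examples on this page (ov-mark6-1 for a site unit, mark6fx03 for a playback unit); unit_host and remote_command are hypothetical names.

```python
import re


def unit_host(unit_code):
    """Guess the host name for a unit code (the naming pattern is an assumption)."""
    code = unit_code.lower()
    if re.fullmatch(r"0[1-8]", code):
        return "mark6fx" + code      # playback unit, e.g. mark6fx03
    if re.fullmatch(r"[a-z]{2}", code):
        return code + "-mark6-1"     # site unit, e.g. ov-mark6-1
    raise ValueError("unit code must be a two letter site code or 01 to 08")


def remote_command(unit_code, slot, short=False):
    """Build an ssh argv list that runs m6modsmartdata.py on the remote unit."""
    slot = int(slot)
    if slot not in (1, 2, 3, 4):
        raise ValueError("slot number must be 1, 2, 3 or 4")
    remote = ["m6modsmartdata.py"]
    if short:
        remote.append("-s")
    remote.append(str(slot))
    return ["ssh", unit_host(unit_code)] + remote
```

The resulting list can be passed to subprocess.run; which remote user to log in as (vlbamon on site units, difx on correlator units) is left to the ssh configuration in this sketch.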

Example of short form output from ov-mark6-1 slot 2:

difx@swc000 VLBADIFX7-2.5.2 ~> remotem6modsmartdata.py -s ov 2
***************************************************************************
 National Radio Astronomy Observatory computing facilities are exclusively
  for the use of authorized personnel, who are expected to abide by the
    terms of the NRAO Computing Security and Computing Use Policies.
***************************************************************************
=====================================================================
SMART data for module LBO%0087
=====================================================================
=====================================================================
DISK 0 Information Section
=====================================================================
Serial Number:    2YKHBBED 

=====================================================================
DISK 0 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 1 Information Section
=====================================================================
Serial Number:    2YKH6BHD 

=====================================================================
DISK 1 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 2 Information Section
=====================================================================
Serial Number:    2YKGL7DD 

=====================================================================
DISK 2 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 4 Information Section
=====================================================================
Serial Number:    2YKH3PZD 

=====================================================================
DISK 4 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 6 Information Section
=====================================================================
Serial Number:    2YKGL6JD 

=====================================================================
DISK 6 Error Log Section
=====================================================================
No Errors Logged

Example of short form output from mark6fx03 slot 2 (it looks like disk 4 of LBO%0019 could use some attention):

difx@swc000 VLBADIFX7-2.5.2 ~> remotem6modsmartdata.py -s 03 2
***************************************************************************
 National Radio Astronomy Observatory computing facilities are exclusively
  for the use of authorized personnel, who are expected to abide by the
    terms of the NRAO Computing Security and Computing Use Policies.
***************************************************************************
=====================================================================
SMART data for module LBO%0019
=====================================================================
=====================================================================
DISK 0 Information Section
=====================================================================
Serial Number:    ZA15A643 

=====================================================================
DISK 0 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 1 Information Section
=====================================================================
Serial Number:    ZA17ZG7E 

=====================================================================
DISK 1 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 2 Information Section
=====================================================================
Serial Number:    ZA17ZMW2 

=====================================================================
DISK 2 Error Log Section
=====================================================================
No Errors Logged


=====================================================================
DISK 4 Information Section
=====================================================================
Serial Number:    ZA150DWL 

=====================================================================
DISK 4 Error Log Section
=====================================================================
ATA Error Count: 845 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 845 occurred at disk power-on lifetime: 3296 hours (137 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      14:51:08.425  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:51:08.425  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:51:06.083  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:51:03.750  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:51:02.667  READ FPDMA QUEUED

Error 844 occurred at disk power-on lifetime: 3296 hours (137 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      14:50:09.277  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:50:09.276  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:50:09.275  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      14:50:09.275  READ LOG EXT
  60 00 00 ff ff ff 4f 00      14:50:03.003  READ FPDMA QUEUED

Error 843 occurred at disk power-on lifetime: 3296 hours (137 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      14:50:03.003  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:50:02.876  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:50:02.759  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:50:02.641  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:50:02.533  READ FPDMA QUEUED

Error 842 occurred at disk power-on lifetime: 3296 hours (137 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      14:47:54.370  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:47:54.370  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:47:53.270  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:47:50.920  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:47:49.845  READ FPDMA QUEUED

Error 841 occurred at disk power-on lifetime: 3296 hours (137 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      14:46:56.101  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      14:46:56.100  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:46:53.784  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:46:52.717  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      14:46:50.393  READ FPDMA QUEUED


=====================================================================
DISK 6 Information Section
=====================================================================
Serial Number:    ZA17ZMT8 

=====================================================================
DISK 6 Error Log Section
=====================================================================
No Errors Logged
Topic revision: r1 - 2020-01-16, MarkWainright