How Are Leap Seconds Handled?
This document attempts to outline some of the issues and details of handling leap-second events.
Green Bank time distribution is performed in a number of ways: a 1 pulse-per-second (1PPS) signal, IRIG-B time code, and the Network Time Protocol (NTP) over Ethernet. NTP is by far the most common method, since it requires no additional cabling. To understand why leap-seconds are problematic, a few words about how NTP works are required. (Please note this is a simplified description; refer to ntp.org for the reams of technical documents which describe the full details.)
NTP time distribution has a concept of clock hierarchy, referred to as the stratum. A computer directly connected to a traceable time source (like a synchronized atomic clock or GPS receiver) is defined to be the highest, or 'stratum 1'. Computers which synchronize with a stratum 1 server (but have no local time source) are stratum 2, and so on.
This hierarchy is intended as a rough indication of the quality of the time being distributed: each additional hop adds network delay and jitter, so a higher stratum number generally means less accurate time.
Here in Green Bank, our stratum-1 server is connected to both a 1 PPS signal and an IRIG signal (both originating from a hardware clock). Since a 1PPS signal only represents a 'tick', but not an absolute time, a second source is always required to resolve the ambiguity. Servers are usually configured with multiple sources, so that cross-checking and statistical methods may be used.
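A minimal Python sketch of how the second source resolves the PPS ambiguity, assuming the coarse time (IRIG or a network server) is sampled at the instant of the PPS edge and is accurate to well under half a second. The function name and the 0.4 s tolerance are hypothetical, not ntpd settings.

```python
import math

# Hypothetical tolerance: the coarse source must be well inside +/-0.5 s,
# otherwise we cannot tell which whole second the PPS edge belongs to.
MAX_COARSE_ERROR = 0.4

def label_pps_edge(coarse_seconds: float) -> int:
    """Return the whole-second label for a PPS edge.

    `coarse_seconds` is the time reported by the coarse source (e.g. IRIG)
    when the PPS edge arrived. The edge itself is assumed to fall exactly on
    a second boundary, so we round to the nearest whole second.
    """
    label = math.floor(coarse_seconds + 0.5)
    if abs(coarse_seconds - label) > MAX_COARSE_ERROR:
        raise ValueError("coarse time too far from a second boundary; label is ambiguous")
    return label
```

For example, a coarse reading of 1230767999.97 at the edge labels it 1230768000. If the coarse reading lands too close to the midpoint between two seconds, the sketch refuses to guess rather than pick the wrong second.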
An example of the 'peers' output from our stratum-1 server:
```
 remote          refid           st t when poll reach    delay   offset  jitter
+IRIG_AUDIO(0)   .IRIG.           0 l   41   64   377    0.000   -1.688   0.261
oPPS(0)          .PPS.            0 l    3   16   377    0.000   -0.003   0.031
-ntp2.usno.navy. .USNO.           1 u   11   64   377   40.287    2.348   0.707
+navobs1.wustl.e .GPS.            1 u   43   64   377   64.953   -2.208   0.031
```
In this example the 'o' in front of PPS means the time is being derived from the PPS ticks, and the two lines beginning with a '+' are the servers being considered for the absolute time reference. Note that one '+' is the IRIG device; the other is an external server. The '-' on the USNO line marks a source that was discarded as an outlier.
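The first character of each line is ntpq's tally code. A small Python sketch that decodes it along with a few columns; the code-to-meaning table follows the tally codes documented for ntpq, while the function name and dictionary keys are illustrative only. It assumes the first column is always the tally character (a space when the peer is not selected).

```python
# Tally codes printed by ntpq in the first column of the peers listing.
TALLY = {
    " ": "rejected",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "system peer",
    "o": "PPS peer",
}

def parse_peer_line(line: str) -> dict:
    """Split one ntpq peers line into tally status and a few main columns.

    Columns after the tally: remote refid st t when poll reach delay offset jitter
    """
    tally, fields = line[0], line[1:].split()
    return {
        "status": TALLY.get(tally, "unknown"),
        "remote": fields[0],
        "refid": fields[1],
        "stratum": int(fields[2]),
        "offset_ms": float(fields[8]),
    }
```

Applied to the PPS line above, this reports a "PPS peer" at stratum 0; applied to the USNO line, an "outlier" with a 2.348 ms offset.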
To complete the example, here is the output from a stratum-3 workstation:
```
 remote          refid           st t when poll reach    delay   offset  jitter
+ntp2a.gb.nrao.  192.33.xxx.xxx   2 u  363  512   377    0.449    0.084   0.296
+ntp2b.gb.nra    192.33.xxx.xxx   2 u  270  512   377    0.320   -0.613   0.164
*ntp2c.gb.nrao   192.33.xxx.xxx   2 u  384  512   377    0.556   -0.237   0.046
 LOCAL(0)        .LOCL.          10 l   57   64   377    0.000    0.000   0.001
```
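This indicates the workstation is querying three stratum-2 servers and has selected ntp2c (marked with '*') as the most stable. LOCAL(0) is the workstation's own undisciplined clock, listed at stratum 10 as a last-resort fallback.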
Our hierarchy looks something like:
- ntp1a (from IRIG and PPS, with backup servers to NIST and/or USNO)
- ntp2a (from ntp1a and peered with other ntp-2's)
- workstations, and telescope devices (stratum 3's)
- time sensitive telescope devices (from PPS and/or IRIG) [making them stratum-1, but not usually used as servers]
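The numbering rule can be sketched in a few lines of Python. This is a simplification, assuming a host advertises one more than the stratum of the source it has selected (reference clocks show up as stratum 0 in ntpq, as in the listing above), with 16 meaning unsynchronized; the function name is hypothetical.

```python
from typing import Optional

UNSYNCHRONIZED = 16  # NTP uses stratum 16 to mean "not synchronized"

def stratum_of(selected_source_stratum: Optional[int]) -> int:
    """Stratum a host advertises, given the stratum of its selected source.

    A reference clock driver (IRIG, PPS) is stratum 0, so a server
    disciplined by one becomes stratum 1. No usable source means 16.
    """
    if selected_source_stratum is None:
        return UNSYNCHRONIZED
    return min(selected_source_stratum + 1, UNSYNCHRONIZED)

# Walking down the hierarchy listed above.
ntp1a = stratum_of(0)             # IRIG/PPS refclock -> 1
ntp2a = stratum_of(ntp1a)         # -> 2
workstation = stratum_of(ntp2a)   # -> 3
```

The same rule explains the 2008/2009 side-effect described below: with IRIG disconnected, ntp1a selected an outside stratum-1 server, and `stratum_of(1)` gives 2.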
After poring over more ntp documentation, I found that ntpd is capable of using a leap-second file, and distributing the leap information even if the source clock does not implement the indication. This may or may not work, because it is unclear how the daemon handles its input. Anyway, the documentation is available here.
I should note that this requires the autokey feature to be configured on each and every system. I don't know if it is worth the trouble (but autokey has many other benefits).
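For reference, a Python sketch of reading a leap-second file in the NIST leap-seconds.list layout (a 1900-based NTP timestamp and the TAI-UTC offset per line, '#' comments) and deciding whether a leap is due at the end of the current UTC day. That yes/no answer is essentially what a leap-aware SHM clock needs to report in its leap field. The function names are hypothetical, and the one-day announcement window is an assumption; ntpd's own handling of the file may differ.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01
SECONDS_PER_DAY = 86400

def parse_leap_file(text: str) -> list[tuple[int, int]]:
    """Return (ntp_seconds, tai_minus_utc) pairs, skipping '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # also drops '#@', '#$', '#h' lines
        if line:
            ntp_seconds, offset = line.split()[:2]
            entries.append((int(ntp_seconds), int(offset)))
    return entries

def leap_pending(entries: list[tuple[int, int]], unix_now: int) -> bool:
    """True if a positive leap second is inserted at the end of unix_now's UTC day.

    ASSUMPTION: the leap is announced only during the final UTC day before it.
    """
    for (_, prev_offset), (ntp_seconds, offset) in zip(entries, entries[1:]):
        leap_unix = ntp_seconds - NTP_UNIX_OFFSET
        if offset > prev_offset and leap_unix - SECONDS_PER_DAY <= unix_now < leap_unix:
            return True
    return False

# Excerpt in the leap-seconds.list layout.
SAMPLE = """\
# sample excerpt
3345062400  33  # 1 Jan 2006
3439756800  34  # 1 Jan 2009
3550089600  35  # 1 Jul 2012
"""
entries = parse_leap_file(SAMPLE)
# 1230724800 is 2008-12-31 12:00:00 UTC, inside the final day before the 2009 leap.
leap_indicator = 1 if leap_pending(entries, 1230724800) else 0
```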
So now back to leap-seconds ...
The way this is all supposed to work is that a server 'announces' the coming of a leap-second by setting the leap indicator (LI) flag in its NTP time packets, which indicates there will be a leap-second at the end of the current UTC day. At the transition (for 2008, the new year), servers and clients will see the flag and handle the change without losing synchronization.
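The flag lives in the top two bits of every NTP packet header, next to the version and mode fields. A Python sketch of decoding that first octet; the bit layout and LI meanings follow the NTP specification, while the helper name is made up.

```python
# Meaning of the 2-bit leap indicator (LI) field.
LEAP_MEANING = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "alarm: clock not synchronized",
}

def decode_first_octet(first_byte: int) -> tuple[int, int, int]:
    """Split the first octet of an NTP packet into (LI, version, mode)."""
    leap_indicator = (first_byte >> 6) & 0x3  # bits 7-6
    version = (first_byte >> 3) & 0x7         # bits 5-3
    mode = first_byte & 0x7                   # bits 2-0 (4 = server)
    return leap_indicator, version, mode
```

For example, a version-3 server reply with no warning starts with 0x1c; the same reply announcing a leap second starts with 0x5c, which decodes to LI=1.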
There are two problems here:
- The first is that the source from which our stratum-1 server gets time is IRIG-B, which has no mechanism to indicate a leap-second.
- The second is that we have no control over outside servers. I would hope that external upstream servers would be configured correctly with the leap-second. (In 2008/2009 the NIST and USNO servers had it correct.)
Some possible approaches:
- We could try disconnecting IRIG from the stratum-1 server near the leap-second. This would force it to switch to another server which, if we are lucky, does the right thing. Since we know the leap-second will foobar our time anyway, it might be worth a try. (I think the flag will be set all day on the 31st, so we might search for a correctly configured server on that date.)
- Configure our stratum-1 ntp server with leap-second files. It may also be necessary to set the clock's leap indicator somehow. (Most systems use the ntp 'shared memory clock'; I have custom versions which set the leap-second field based on the ntp leap-second configuration file.)
- Use another local stratum-1 server with an 'enhanced' SHM clock which understands leap-seconds. Then peer this with our primary server. This should get the primary server to then set the leap-second flag in time distribution messages.
- Do nothing: the GPS clock (and IRIG) will simply step, producing a 1-second offset. It typically takes 30-60 minutes for clocks to 'slew' to the new time.
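Some back-of-envelope arithmetic behind the 'do nothing' option: ntpd documents a maximum slew rate of 500 parts per million, so even at full rate a 1-second offset takes at least 2000 s (about 33 minutes) to remove, consistent with the 30-60 minute figure. A Python sketch; the function name is hypothetical. Note that ntpd normally steps rather than slews when the offset exceeds its step threshold (128 ms by default) unless that is changed, e.g. with `ntpd -x`.

```python
MAX_SLEW_PPM = 500  # ntpd's documented maximum slew rate

def slew_duration_seconds(offset_s: float, rate_ppm: float = MAX_SLEW_PPM) -> float:
    """Minimum time to slew away `offset_s` seconds at `rate_ppm` parts per million."""
    return abs(offset_s) * 1e6 / rate_ppm

# Lower bound for the 1-second leap offset, in minutes (about 33).
minutes = slew_duration_seconds(1.0) / 60
```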
What We Did in 6/2012:
Due to a storm which hit a large portion of the North-East US, commercial power was unavailable during the leap-second. So in essence, we did nothing. However, we did have a plan.
- Just prior to the leap-second, several of our servers were updated with a leap-second file. Documentation led me to believe that this would be sufficient to get the servers to set the leap flag field. However, it appeared that the reference clock itself needed to provide the indication.
- Servers were updated with new versions of shared memory clocks, which included a mechanism to read the leap-second file and set the leap field appropriately.
- One server (yed, a sparc-solaris system), normally a stratum-1 site server, was too old to update. The plan was to slave it to another stratum-1 server (auriga) during the leap-second.
What We Did in 2008/2009:
- Since the systems group was out, semi-major changes to the site ntp configuration were not practical. Instead, I unplugged the IRIG input feeding our stratum-1 server. This forced it to resync with an outside server which was correctly indicating a leap-second, and the change propagated to all secondary servers and clients. (A side-effect was that we lost our PPS discipline, dropping the server to stratum 2.) Even so, it remained the site reference.
- Frank was able to use the 'do a leap second at' feature of the clock, so even IRIG stepped correctly.
- Once IRIG had been stepped, we reconnected it to the stratum-1 server.
- The CCU and SCU startup scripts need to be updated to change the year to 2009.
- By disconnecting IRIG, all systems were able to receive the leap notification. Workstations counted the 59th second twice.
- None of the servers lost sync. One oddity was that the USNO server from which we were receiving the leap indication did not tick correctly, resulting in a temporary one-second offset. Our server simply switched to another server (a NIST server) which did the step correctly.
- Most of the resulting issues were due to IRIG-based vxWorks SBCs, which required a reboot to correct their time.
- I ended up restarting a bunch of the M&C processes while chasing a non-issue bug with the antenna monitor screen. (The CCU time was blank. As soon as the operator took control, the field was displayed correctly.)
- 31 Dec 2008
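Why the workstations 'counted the 59th second twice': POSIX time treats every day as exactly 86400 seconds, so 23:59:60 has no timestamp of its own (calendar.timegm maps it onto the following midnight). A Python sketch, assuming the kernel inserts the leap by repeating the day's last second; the helper name is hypothetical.

```python
import calendar
import time

# POSIX timestamp of 2009-01-01 00:00:00 UTC, the midnight after the leap second.
LEAP_END = calendar.timegm((2009, 1, 1, 0, 0, 0))

# 23:59:60 has no distinct POSIX value; it collides with LEAP_END.
LEAP_SECOND_POSIX = calendar.timegm((2008, 12, 31, 23, 59, 60))

def displayed_posix(start: int, elapsed: int, leap_end: int = LEAP_END) -> int:
    """POSIX timestamp shown `elapsed` real seconds after `start`.

    Models a kernel that repeats the last second of the day: once real time
    reaches `leap_end`, the clock reads one second behind.
    Assumes `start` is before `leap_end`.
    """
    t = start + elapsed
    return t - 1 if t >= leap_end else t

# Five real seconds starting at 23:59:58 UTC on 31 Dec 2008.
labels = [time.strftime("%H:%M:%S", time.gmtime(displayed_posix(LEAP_END - 2, n)))
          for n in range(5)]
```

The resulting labels read 23:59:58, 23:59:59, 23:59:59, 00:00:00, 00:00:01: one second of real time is absorbed by repeating 23:59:59.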