The goal of ntpclient is not only to set your computer's clock
right once, but to keep it there.

First, a note on typical 1990s and 2000s computer crystals. They
are truly pathetic. A "real" crystal oscillator (TCXO) usually has
an initial set error of less than 5 ppm, and variation over time, voltage,
and temperature measured in tenths of a ppm (and an OCXO can reach ±0.3 ppm
stability over ten years and 85°C temperature swing). The devices used
in conventional PC motherboards and single board computers, however,
often have initial set errors up to 150 ppm, and will vary 5 ppm over
the course of a day-night cycle in a pseudo-air-conditioned space.

[Operating system software can sometimes exacerbate the problem. I
have seen some i686 Red Hat 7.3 systems run the clock at 512 Hz, or 1953
microseconds per tick, giving a built-in 64 ppm error. Even the normally
exemplary DEC Alpha has, when run with Linux, a truly awful calibration
scheme; Linux runs it with a nominal ticks per second of 1024, which
gives a tick value of 977, theoretical additional error -448 ppm, actual
frequency observed -443.7 ppm.]
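The tick arithmetic in that bracketed note can be checked with a few lines of shell and awk. This is only a sketch: the function name tick_error_ppm is made up for illustration, and the sign convention (positive when the integer tick is shorter than the true period) is an assumption chosen so that the Alpha case comes out as -448 ppm.

```shell
# tick_error_ppm HZ TICK_USEC
# Hypothetical helper: relative error, in ppm, caused by representing the
# true tick period (1e6/HZ microseconds) by the integer TICK_USEC.
tick_error_ppm() {
    awk -v hz="$1" -v tick="$2" 'BEGIN {
        exact = 1e6 / hz    # true tick period in microseconds
        printf "%.1f\n", (exact - tick) / exact * 1e6
    }'
}

tick_error_ppm 512 1953     # prints 64.0
tick_error_ppm 1024 977     # prints -448.0
```

At 512 Hz the true period is 1953.125 microseconds, so a tick of 1953 is off by 0.125 parts in 1953.125, which is 64 ppm.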

Still, the pattern is clear: the first and largest error of a crystal
is its initial set error. I strongly urge the calibration of each computer,
and storing its frequency error in a non-volatile medium, before you
do anything else with time setting and locking. While you could do it
in a few seconds using an accurate frequency counter, below I show a
software-only method using ntpclient and a high quality NTP server.

To perform the activities described, you need a way to control and monitor
your system's clock -- both its frequency and value. On Linux, the
kernel API is described in adjtimex(2). There are two programs that
I know of that provide shell-level access to this interface, both called
adjtimex(1).

One is written by Steven Dick and Jim Van Zandt, see the adjtimex* files in
    http://metalab.unc.edu/pub/Linux/system/admin/time/
It uses long options, and includes some interesting functionality beyond
the basic exposure of adjtimex(2).

I (Larry Doolittle) wrote the other; it uses short options, and has no
bloat^H^H^H^H^Hextra features. I include the code here for a standalone
version; it is also incorporated into busybox (http://www.busybox.net),
although you may have to select it at compile time, like any other component.

Fortunately (and not coincidentally) the core functions of the two adjtimex
programs can be used interchangeably, as long as you only use the short option
variant of the Dick/Van Zandt adjtimex. The options discussed here are:
    -f    frequency (integer kernel units)
    -o    time offset in microseconds
    -t    kernel tick (microseconds per jiffy)

First, set the time approximately right, as root:
    ntpclient -s -h $NTPHOST
You should see a single line printed like
    36765 4980.373 1341.0 39.7 956761.4 839.2 0
Get used to this line. The columns are:
    1. days since 1900
    2. seconds since midnight
    3. elapsed time for NTP transaction (microseconds)
    4. internal server delay (microseconds)
    5. clock difference between your computer and the NTP server (microseconds)
    6. dispersion reported by server (microseconds)
    7. your computer's adjtimex frequency (ppm * 65536)
So in the example above, your computer's clock was a bit more than
0.95 seconds fast, compared to the clock on $NTPHOST.
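When reading these lines by eye, it can help to restate columns 5 and 7 in more familiar units. The sketch below does that; describe_ntp_line is a made-up name, not part of ntpclient, and it relies only on the column meanings listed above.

```shell
# describe_ntp_line: reads ntpclient log lines on stdin.
# Hypothetical helper: prints column 5 in seconds and column 7 in ppm.
describe_ntp_line() {
    awk '{
        printf "offset %.6f s, frequency %.4f ppm\n", $5 / 1e6, $7 / 65536
    }'
}

echo "36765 4980.373 1341.0 39.7 956761.4 839.2 0" | describe_ntp_line
# prints: offset 0.956761 s, frequency 0.0000 ppm
```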

Now check that the clock setting worked.
    ntpclient -c 1 -h $NTPHOST
    36765 4993.512 1345.0 40.9 3615.3 839.2 0
So now the time difference is only a few milliseconds.

Next, measure the frequency calibration for your system.
If you're in a hurry, it's OK to only spend 20 minutes on this step.
    ntpclient -i 60 -c 20 -h $NTPHOST >$(hostname).ntp.log &
Otherwise, you will learn much more about your system and its communication
with the NTP server by letting the log run for 24 hours.
    ntpclient -i 300 -c 288 -h $NTPHOST >$(hostname).ntp.log &

Things to watch for in the above log:

If the last column (kernel frequency fine tune) ever changes, you haven't
turned off other time adjustment programs. AFAIK the only programs around
that would move this number are ntpclient and xntpd. On most out-of-the-box
systems, that last column should start at zero and stay zero.
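Checking the last column by eye gets tedious on a 288-line log. A minimal sketch of an automated check follows; freq_column_steady is a made-up name, and it assumes the seven-column log format described earlier.

```shell
# freq_column_steady: reads an ntpclient log on stdin.
# Hypothetical helper: exit 0 when column 7 never changes,
# exit 1 when some line differs from the first one.
freq_column_steady() {
    awk 'NR == 1 { first = $7 }
         $7 != first { changed = 1 }
         END { exit changed }'
}

printf '%s\n' "36765 4980.373 1341.0 39.7 956761.4 839.2 0" \
              "36765 4993.512 1345.0 40.9 3615.3 839.2 0" |
    freq_column_steady && echo "frequency column steady"
```

On a real log, something like freq_column_steady <$(hostname).ntp.log should do the same job.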

Use gnuplot to plot the resulting file as follows:
    plot "HOSTNAME.ntp.log" using (($1-36765)*86400+$2):5:($3+$6) with yerrorbars
This shows time error (microseconds) as a function of elapsed time (seconds).
The error bars show the uncertainty in the measurement. Ideally, it would
be a smooth, straight line, where the slope represents the frequency error
of your crystal.
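The gnuplot expression hard-codes the day number 36765 from the sample log. If you would rather not edit it for each log, the same three quantities can be computed beforehand. The sketch below assumes the first log line defines day zero; to_plot_columns is a made-up name.

```shell
# to_plot_columns: reads an ntpclient log on stdin, writes three columns:
#   elapsed seconds, offset (microseconds), error bar (microseconds)
# Hypothetical helper: the day number of the first line is used as day zero.
to_plot_columns() {
    awk 'NR == 1 { day0 = $1 }
         { printf "%.3f %.1f %.1f\n", ($1 - day0) * 86400 + $2, $5, $3 + $6 }'
}

echo "36765 4980.373 1341.0 39.7 956761.4 839.2 0" | to_plot_columns
# prints: 4980.373 956761.4 2180.2
```

The resulting file can then be plotted with "using 1:2:3 with yerrorbars".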

If an occasional point is both off-center and has a large error bar, it shows
a transaction got delayed somewhere in the process, either inside the server,
or one of the two UDP packet propagation steps. This is normal, and ntpclient
can deal with those quite well. If points are not evenly spaced on the
horizontal axis, packets were actually lost; this is less common, but still OK.

If the error bar becomes suddenly large, and takes a few minutes to slowly
recover, your NTP host (presumably xntpd) had problems communicating with
_its_ server, and reported that problem to you by increasing its "dispersion"
(this is a hack, required by xntpd's core incorrect assumption that errors
in network delays have Gaussian statistics; ntpclient does not have this flaw).

If there are sudden large, persistent steps in error, some other program is
making step changes to time. Check for, e.g., ntpdate run as a cron job.
If your client machine is OK, check for problems on the _host_ machine.

Assuming the graph above is clean, and has non-garbled data for the first
and last points, you can run it through the enclosed awk script (rate.awk)
to determine the appropriate frequency value.
    $ awk -f rate.awk <test.dat
    delta-t 119400 seconds
    delta-o -142308 useconds
    slope -1.19186 ppm
    old frequency -1240000 ( -18.9209 ppm)
    new frequency -1318109 ( -20.1127 ppm)
    $
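The figures in that output are related by simple arithmetic, which the sketch below reproduces. It is not the enclosed rate.awk: the function name new_frequency is made up, and truncating toward zero is an assumption that happens to reproduce the sample numbers.

```shell
# new_frequency DELTA_T_SEC DELTA_O_USEC OLD_FREQ
# Hypothetical helper: slope of offset versus time, and the kernel
# frequency value that results from adding that slope to the old one.
new_frequency() {
    awk -v dt="$1" -v doff="$2" -v old="$3" 'BEGIN {
        slope = doff / dt                 # microseconds per second = ppm
        new = int(old + slope * 65536)    # kernel units are ppm * 65536
        printf "slope %.5f ppm\n", slope
        printf "new frequency %d (%.4f ppm)\n", new, new / 65536
    }'
}

new_frequency 119400 -142308 -1240000
# prints:
#   slope -1.19186 ppm
#   new frequency -1318109 (-20.1127 ppm)
```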

For now, you should plug in the new frequency value
    adjtimex -f -1318109
Then reset the clock
    ntpclient -s -h $NTPHOST
and ponder how it makes sense in _your_ (possibly embedded) environment
to have the number -1318109 applied via adjtimex every time your machine
boots. Or, simpler still, combine these two steps using a post-2005 version:
    ntpclient -f -1318109 -s -h $NTPHOST

If the frequency offset (absolute value) is greater than about 230 ppm
(15073280), you have a problem: you may be able to fix it with the -t
option to adjtimex, or you need to hack phaselock.c, which has a
maximum adjustment extent of +/- 250 ppm built in (change
the #define MAX_CORRECT and rebuild ntpclient). I'd like to suggest that
you replace the defective crystal instead, but I understand that is rarely
practical.
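The 230 ppm threshold is easy to test in a script before the value is handed to adjtimex. A minimal sketch, in which freq_within_range is a made-up name and the limit is simply the 230 ppm quoted above:

```shell
# freq_within_range FREQ_KERNEL_UNITS
# Hypothetical helper: exit 0 when |FREQ| is at most 230 ppm
# (230 * 65536 = 15073280 kernel units), exit 1 otherwise.
freq_within_range() {
    awk -v f="$1" 'BEGIN {
        limit = 230 * 65536
        if (f < 0) f = -f
        exit (f > limit)
    }'
}

freq_within_range -1318109 && echo "within range"
freq_within_range 20000000 || echo "too large: look at adjtimex -t"
```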

On to ntpclient -l. This is actually easy, if you performed and understood
the previous steps. Run
    ntpclient -l -h $NTPHOST
in the background. It will make small (probably less than 3 ppm) adjustments
to the system frequency to keep the clocks locked. Typical performance over
Ethernet (even through a few routers) is a worst case error of +/- 10 ms.

I won't try to tell you _where_ to put the boot time commands. They should
boil down to:
    adjtimex -f $NONVOLATILE_MEMORY_VALUE
    ntpclient -s -i 15 -g 10000 -h $NTPHOST
    ntpclient -l -h $NTPHOST >some_log_file
The second line makes explicit the retries that may be required for this
UDP-based time protocol. If the first time request takes longer than 10000
microseconds to resolve, or the packets get lost, it instructs ntpclient to
try again 15 seconds later (the minimum retry period mandated by RFC-4330),
and it won't exit until it gets such a suitable response.

As of 2006, ntpclient can in theory combine the three lines above into one:
    ntpclient -f $NONVOLATILE_MEMORY_VALUE -s -l -i 600 -g 10000 -h $NTPHOST >some_log_file
This can streamline the startup process, since you may be able to avoid a
layer of shell scripting. On the other hand, it is less tested, and there
is no (current) means to independently set the packet interval for the
set and lock phases.
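For those who do keep the layer of shell scripting, the three-command boot sequence given earlier could be wrapped roughly as follows. This is a sketch under assumptions: the function name start_timekeeping and the idea of keeping the calibration value in a plain file are mine, and the frequency step is skipped when that file is unreadable.

```shell
# start_timekeeping FREQ_FILE NTPHOST LOGFILE
# Hypothetical wrapper around the three boot-time commands.
# FREQ_FILE is assumed to hold the calibrated adjtimex -f value.
start_timekeeping() {
    freq_file=$1
    host=$2
    log=$3
    # Apply the stored frequency correction, if there is one.
    if [ -r "$freq_file" ]; then
        adjtimex -f "$(cat "$freq_file")"
    fi
    # Set the clock once, retrying every 15 seconds until a reply
    # meets the 10000 microsecond goal.
    ntpclient -s -i 15 -g 10000 -h "$host"
    # Then keep it locked, in the background.
    ntpclient -l -h "$host" >"$log" &
}

# Example call (the path and log name are placeholders):
#   start_timekeeping /etc/ntpclient.freq "$NTPHOST" some_log_file
```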

It's an interesting question how sensitive the boot process should be
to the time set process. If you have a battery backed hardware clock,
there's not much problem running for a while without a network-accurate
system clock. In that case you could put both ntpclient commands into a
background script, and the only possible issue is the sudden (but probably
small) warp of the clock at the indefinite time in the boot sequence when
ntpclient gets its acceptable answer. On the other hand, some embedded
computers have no clue what time it is until the network responds. Any
files created will be marked Jan 1 1970, and other application-dependent
issues may arise if there is a nonsense time on the system during later
parts of the boot sequence. Then you may well want to enforce completion
of the first ntpclient before starting your application. If this is too
drastic for you, and you want a fallback mode when the time server is dead,
add a "-c 5" switch to the end of that ntpclient command, giving at most 5
retries, if something goes wrong with the time set. For that approach to be
useful, consider patching the source to lower the minimum packet send
interval from the RFC-4330-mandated 15 seconds.