02/14/2012
“I saw a 1.2us peak offset today on the server running TimeKeeper.” We’re working on it. But this is on a heavily used network, and there appears to be a problem with the network card. It is also a peak offset, and a pessimistic estimate of the error at that.
11/02/2011
TimeKeeper’s advanced algorithms can compensate for virtual machine difficulties in keeping accurate time.
The problem with a virtual machine, as far as time synchronization goes, is that the VM can be suspended for seconds at a time while the underlying operating system works on something else. As it becomes more common to migrate virtual machines within a cluster or even a larger data center in order to load balance, this problem gets worse. A VM waking up after, say, 5 seconds have passed will detect a huge gap between its current time and the time provided by the clock. After a day or two, it’s not unusual to see a VM that is seconds out of sync. But a VM running TimeKeeper will recover its footing rapidly, because the smoothing algorithms TimeKeeper uses to compensate for network packet delays and oscillator variation produce a quick convergence with a time source. You can hook a VM running TimeKeeper up to an NTP source and expect pretty good time tracking. This is not something we were initially targeting - for financial trading software the random delays of VMs are not acceptable - but it’s an interesting side effect. If VMs are being used for large scale database processing or map-reduce, the stability of a TimeKeeper-validated timestamp may improve data reliability and also reduce locking overhead.
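To make the "huge gap" concrete, here is a minimal sketch of how a suspension shows up in data. It assumes a hypothetical list of reference-clock readings taken each time a once-per-second local timer fires; the function name and thresholds are illustrative and are not part of TimeKeeper.

```python
def find_suspend_gaps(tick_times_s, period_s=1.0, tolerance_s=0.5):
    """Return (index, extra_seconds) for each tick that arrived far later than expected.

    tick_times_s: reference-clock readings, one per local timer tick (hypothetical data).
    period_s:     the interval the local timer was supposed to fire at.
    tolerance_s:  how much lateness we accept before calling it a suspension.
    """
    gaps = []
    for i in range(1, len(tick_times_s)):
        # Time that passed on the reference clock beyond the expected period.
        extra = (tick_times_s[i] - tick_times_s[i - 1]) - period_s
        if extra > tolerance_s:
            gaps.append((i, extra))
    return gaps


# The guest was frozen for about 5 seconds between the third and fourth tick.
ticks = [0.0, 1.0, 2.0, 8.0, 9.0]
print(find_suspend_gaps(ticks))  # [(3, 5.0)]
```

Detecting the gap is the easy part. Recovering from it without making time jump is what the smoothing is for.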
10/30/2011
Victor Yodaiken will be speaking at the STAC London performance summit and participating in a panel on Time Synchronization.
08/24/2011
FSMLabs will have representatives at The Linux in Finance Conference
Come talk with us about time, timestamping, TimeKeeper 5.0, and other upcoming FSMLabs innovations. Send us an email for an appointment.
08/15/2011
Time synchronization software is often too ready to believe whatever it is told. TimeKeeper is more skeptical—for good reason.
Most time synchronization software is, well, gullible, designed on the assumption that time sources and local clocks are reliable. TimeKeeper is more skeptical and can often compensate for bad data, but it also goes to lengths to document any problems it sees. In the upcoming 5.0 release TimeKeeper will be able to use sophisticated cross-checking to detect failed clocks, network congestion, or even deliberate hacking of GPS time, and even in the current version it produces a log that serves as an audit trail.
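As a rough illustration of what cross-checking can mean, the sketch below compares the offsets reported against several time sources and flags any source that disagrees strongly with the median. This is a generic outlier test, not TimeKeeper's algorithm; the source names, the sample values, and the 50 microsecond tolerance are all made up for the example.

```python
from statistics import median


def flag_suspect_sources(offsets_us, tolerance_us=50.0):
    """Return the names of sources whose offset is far from the median of all sources.

    offsets_us:   mapping of source name -> measured offset in microseconds (hypothetical).
    tolerance_us: how far from the median a source may be before it is flagged.
    """
    mid = median(offsets_us.values())
    return sorted(name for name, off in offsets_us.items() if abs(off - mid) > tolerance_us)


readings = {
    "gps": 1.2,
    "ntp_a": -3.0,
    "ntp_b": 4.1,
    "lab_clock": 250000.0,  # a quarter of a second off
}
print(flag_suspect_sources(readings))  # ['lab_clock']
```

A gullible client with only one source has nothing to compare against, which is why it tends to pass bad time along without complaint.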
The engineering team evaluating TimeKeeper Client for a prospective customer called us up to say that TimeKeeper was reporting trouble locking on to a reference time and that other client software was not reporting errors. After examining the log, we asked them to check whether something was wrong with the configuration or operation of the network clock. The first discovery was that the time clock was on the other side of the ocean “in a test lab”. After some discussion, we got someone to check on it in person. What he found was sauna-like heat in a room with no air-conditioning and a pile of machines balanced on one another with no air-flow. Once the poor abused network clock was moved into a rack in an air-conditioned room, TimeKeeper was happy. It turned out that the other client software, the software that had not reported errors, had just been passing on bad time values without any warning.
Another one of our customers has a need for keeping timing error down in the submicrosecond range - using the NTP protocol. The computer racks they use for a critical trading application rely on an advanced feature of TimeKeeper Server to synthesize time from a “pulse per second” that is distributed from a GPS satellite radio “clock” combined with NTP time distributed over the network. The idea is that the NTP time should fix the second and the “pulse per second” can be used to get accuracy down to nanoseconds. The system has been rock steady in one installation, but when they added a second installation TimeKeeper would sometimes refuse to use the pulse-per-second, relying on a synthetic time, and would complain about bad syncs. It turned out that the NTP time was wobbling by a large fraction of a second from true time because of a failure in the time clock. Replacing the time clock fixed the error.
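The reason a wobbling NTP feed breaks this scheme is easy to see in a small sketch. A pulse only says "a second boundary is now"; NTP has to say which second. If the NTP estimate at the instant of the pulse is too far from a whole second, the label becomes ambiguous and the safe choice is to refuse it. The function name and the 0.25 second limit below are assumptions for illustration, not TimeKeeper's actual rule.

```python
def label_pulse(ntp_time_at_pulse_s, max_ntp_error_s=0.25):
    """Decide which whole second a PPS edge marks, using an NTP reading taken at the edge.

    Returns (second, residual): `second` is the whole second assigned to the pulse,
    or None when the NTP reading is too far from any second boundary to be trusted.
    `residual` is how far the NTP reading was from the nearest whole second.
    """
    nearest = round(ntp_time_at_pulse_s)
    residual = ntp_time_at_pulse_s - nearest
    if abs(residual) > max_ntp_error_s:
        return None, residual  # NTP is wobbling too much to name the second
    return nearest, residual


print(label_pulse(1322784000.0004))  # healthy NTP: the pulse is labeled 1322784000
print(label_pulse(1322784000.42))    # NTP off by a large fraction of a second: rejected
```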
Silent failure and undetected problems with time feeds or local oscillators pose a problem that is not yet widely appreciated in the automatic trading world. Imagine if the sauna had been on for a production system and client software had failed to detect the problem. Days or even weeks of trading could have been carried out on a seriously unstable time base without any indication. It’s possible that traders could have been tweaking algorithms to try to solve a problem that did not come from their algorithms at all. Or suppose that time was actually being correctly supplied to systems and something went wrong – how would a dispute over the cause of the problem be resolved if time synchronization software did not provide a reliable audit trail?
06/16/2011
FSMLabs has a rigorous, highly automated, test and regression system that is absolutely necessary to deliver high performance.

Time Synchronization solutions are hard to test and often are provided with “self-test” components that report numbers which border on wishful thinking. Usually, the time synchronization solution test component reports the “offset” from the correct time - or at least an estimate of that offset. But how accurate is that estimate? If you think about it, if the synchronizer really “knew” the offset, it could correct its time and reduce the error to zero. Actually, our tests show that those self-reported numbers are often wildly off. What we do is build various test systems where the “synchronized time” can be compared in some way to an external reference - we want to compare actual time to computed time. For example, when TimeKeeper is trying to lock client system time to a reference time coming from some network time server that collects GPS time, we can run the “pulse per second” generated by that time server into a cable that connects to the client computer directly and then run special software that waits for a pulse and reads the “synchronized time” as the pulse shows up in real-time. TimeKeeper uses data that comes over the network connection, but our test software compares that to the hardware generated pulse-per-second. What you want to see is that the synchronized time is nearly exactly on a second boundary or perhaps a little past depending on how long the signal takes to propagate down the wire. These measurements allow us to evaluate both our algorithms for time synchronization and our test tool.
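The arithmetic behind that cross-check is simple enough to sketch. Assuming the test software hands us the synchronized time, in nanoseconds, read at the instant a pulse arrived, the error is just the signed distance to the nearest second boundary, less whatever delay the cable adds. The function and parameter names are illustrative, not those of our test tool.

```python
NS_PER_S = 1_000_000_000


def error_from_pulse_ns(sync_time_ns, cable_delay_ns=0):
    """Signed error of the synchronized clock, measured against a PPS edge.

    sync_time_ns:   synchronized time read when the pulse arrived (ns since some epoch).
    cable_delay_ns: known propagation delay of the pulse down the wire (assumed value).
    Positive means the clock reads ahead of true time, negative means it reads behind.
    """
    frac = (sync_time_ns - cable_delay_ns) % NS_PER_S
    # Fold the second half of the interval back so 999,999,800 ns reads as -200 ns.
    return frac - NS_PER_S if frac >= NS_PER_S // 2 else frac


print(error_from_pulse_ns(5 * NS_PER_S + 350))      # 350: clock is 350 ns ahead
print(error_from_pulse_ns(5 * NS_PER_S - 200))      # -200: clock is 200 ns behind
print(error_from_pulse_ns(5 * NS_PER_S + 350, 50))  # 300: 50 ns of that was the cable
```

Because the pulse never touches the network, this number does not share any of the errors that the synchronizer's own estimate might be hiding.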
There are three quantities tracked on this graph: “raw”, “offset”, and “localtest”. “Raw” and “offset” are our estimates of error, where the second one is smoothed out by the algorithm. “Localtest” is actual error - using the pulse-per-second hardware. As you can see, these times converge rapidly. “Raw” is what TimeKeeper computes to be its instantaneous variance (error) from the reference time. The IEEE PTP protocol is designed for no-traffic networks that do not introduce much if any variation in transfer time, but in the real world we cannot rely on having such networks. So TimeKeeper has some sophisticated algorithms to synthesize a correct time from both sources - and we keep a running calculation of what we think the worst case error may be. “Offset” is the error we compute in a further smoothed time that TimeKeeper reports to users - because we want to avoid rapid “corrections” whenever possible. And “localtest” is a time error computed from a GPS “pulse per second” signal run directly into the client for cross-check purposes. That is, “localtest” crosschecks the time TimeKeeper computes against the pulses directly generated by the GPS clock hardware. It’s important to note that not only do we converge on correct time, but we do so very quickly and then lock onto it.
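For readers who want a feel for the difference between “raw” and “offset”, here is a deliberately simple stand-in: an exponential moving average over raw offset samples, plus a running worst-case figure. TimeKeeper's real filters are more involved; the weight and the sample values here are assumptions chosen only to show the shape of the idea.

```python
def smooth_offsets(raw_offsets_ns, weight=0.2):
    """Smooth a series of raw offset estimates and track the worst raw value seen.

    raw_offsets_ns: instantaneous offset estimates in nanoseconds (hypothetical samples).
    weight:         how strongly each new sample pulls the smoothed estimate (0..1).
    Returns (smoothed_series, worst_abs_raw).
    """
    smoothed = []
    estimate = None
    worst = 0.0
    for raw in raw_offsets_ns:
        # The first sample seeds the estimate; later samples nudge it by `weight`.
        estimate = raw if estimate is None else estimate + weight * (raw - estimate)
        worst = max(worst, abs(raw))
        smoothed.append(estimate)
    return smoothed, worst


series, worst = smooth_offsets([100, 0, 0], weight=0.5)
print(series, worst)  # [100, 50.0, 25.0] 100
```

A single noisy sample moves the smoothed value only part of the way, which is what keeps the reported time from lurching around on a busy network.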
06/14/2011
TimeKeeper on Infiniband.
FSMLabs has just validated TimeKeeper performance on two high frequency trading systems that are based on Infiniband - the super low-latency networking technology. Performance was superb. To get Infiniband working in the test systems took about 15 minutes. At both sites, TimeKeeper was inserted into a working NTP system as both server and client and just worked. In one system the TimeKeeper Server was itself accepting time from a PTP master clock on an Ethernet network and acted as a bridge. In a second system, the TimeKeeper server ran on a device that contained a GPS time-clock PCI card.
The high performance and “no muss, no fuss” operation validates FSMLabs’ “drill down” approach to reconciling the conflicting requirements of standards and high performance. Our design approach relies on drilling down through layers of general purpose software to get raw hardware performance on highly optimized purpose built software for critical functions. Essentially we take on a big part of the effort of balancing standards against performance in our software design/implementation project - so it’s not a problem for over-worked IT staffs. IT departments want standardized hardware and software that is widely compatible with a large range of devices, applications, and programs. So a special purpose operating system, or even one that has been modified to support a special API or special functionality, rapidly becomes a huge expense as it is adapted to support rapidly changing compute servers, drivers, devices, and software. On the other hand, standard platforms have to be all things to all people and necessarily sacrifice some performance/reliability/security. You can’t get “general purpose” and “finely honed for purpose” in the same box. Well, you can, if you can bypass or override generic functions in just those places where you need specific performance. Doing that drill down while keeping the rest of the system safe is a pretty difficult technical play, but it pays big dividends. That’s why we get microsecond timing accuracy over Ethernet - and that’s why our TimeKeeper software can leverage the Linux general purpose networking support and the Infiniband drivers to get submicrosecond accuracy over the rock steady Infiniband interconnect.
(photo from Chris Dag)
06/12/2011
TimeKeeper is being exhibited at SIFMA by several resellers. Drop by the Symmetricom or Spectracom booths and ask them about how TimeKeeper takes time “the last mile” to the application program.
Or drop us an email!
03/02/2011
Matt Sherer, engineer on FSMLabs’ TimeKeeper product, writes:
Having highly accurate time available on the network is great. Getting that time to the application before it’s stale is even better.
Even highly accurate time can get pretty stale by the time it gets to the applications that need it. Lost syncs, network latencies, and other factors can accumulate. Even having local hardware that corrects perfectly for these factors is not a complete solution.
The trouble is, applications don’t run on that card with the accurate time - they’re running under a host operating system. Even if that card provides perfect time, the OS may be delayed in processing it. Or it could mangle the value in an attempt to satisfy conflicting clock requirements.
Within the OS, an application asking for the time on different processors may get different results. Just requesting time from the OS can add significant additional overhead in the application. The result here is that when your application returns with a time sample from the OS, that time may be stale or just wrong.
FSMLabs’ TimeKeeper can remove these ambiguities. TimeKeeper can deliver or consume time over a network (NTP or PTP), but let’s just look at how it can avoid timing ambiguities locally.
TimeKeeper converges on an accurate time and provides it throughout the entire system - not just to specialty applications or to the OS (which may again skew time). Any application or OS component on the system, when it asks for time, will be getting it from TimeKeeper, undiluted and undelayed.
Having a single source of trustworthy time provides a number of benefits:
* The OS is no longer providing different time values to different users depending on their clock.
* The OS has consistent timestamping - all internal state, from network sockets to filesystem updates, is on a common time base.
* Time drift between processors is gone - every processor has the same time.
* Without per-processor skew, there’s no chance that an application migrating between processors will see time move backwards.
So, TimeKeeper gives the system accurate time directly, whether it’s serving the OS or applications on the OS. Applications can trust the time they get is accurate and not stale.
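If you want to check whether your own system ever hands an application a time that moves backwards, a quick test is to sample the clock in a tight loop and count reversals. The helper below is a hypothetical check of ours, not a TimeKeeper tool, and a clean run on one machine proves little about another.

```python
import time


def count_backward_steps(samples):
    """Count how many times a reading is earlier than the reading just before it."""
    return sum(1 for earlier, later in zip(samples, samples[1:]) if later < earlier)


# Sample the wall clock as fast as the interpreter allows.
readings = [time.time_ns() for _ in range(100_000)]
print("backward steps seen:", count_backward_steps(readings))
```

On a healthy system the count should be zero. Anything else means two consecutive requests for time came back in the wrong order.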
This is all well and good - but TimeKeeper actually speeds up the process of delivering time too.
It used to be that getting time from the OS involved a system call. That’s expensive - state has to be saved, the application may be switched out, housekeeping overhead may take time, more OS code has to be run, and the application just isn’t getting real work done. Modern Linux versions have reduced that overhead, and some, like Red Hat Enterprise Linux 6, avoid the system call entirely. Kernel support for a feature called VDSO allows specially designed services to provide data - like time - without system call overhead.
TimeKeeper leverages this support to further improve performance. Not only is TimeKeeper getting a more accurate and unified time to the entire system, it can actually speed the process of getting time to the caller. Before we show TimeKeeper’s numbers, here are the improvements in using VDSO over system calls in stock Red Hat Enterprise Linux 6:
| Function | Overhead improvement | Speedup |
|---|---|---|
| gettimeofday | 48ns | 54% |
| clock_gettime | 49ns | 60% |
If VDSO is supported, it is selected transparently over a system call. There are no source changes to be made in applications. As you can see, skipping that system call is very worthwhile - it cuts the time spent by more than 50%.
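You can get a rough feel for per-call cost on your own machine with a timing loop like the one below. This is a sketch, not the benchmark behind the table: it runs on Unix-like systems only, and in Python the interpreter's own overhead is a large part of each call, so the result is an upper bound rather than a measurement of the VDSO path itself.

```python
import time


def mean_call_cost_ns(calls=200_000):
    """Average wall-clock cost, in nanoseconds, of one clock_gettime call from Python."""
    read = time.clock_gettime_ns   # bound locally to keep lookup cost out of the loop
    clock_id = time.CLOCK_REALTIME
    start = time.perf_counter_ns()
    for _ in range(calls):
        read(clock_id)
    return (time.perf_counter_ns() - start) / calls


print(f"about {mean_call_cost_ns():.0f} ns per call, interpreter overhead included")
```

A C version of the same loop would get much closer to the figures above, since almost nothing else would stand between the application and the call.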
When TimeKeeper starts, if VDSO or vsyscalls are supported, it transparently provides the more accurate time in place of Linux’s data. Again, there are no source changes needed. Applications don’t even need to be restarted. Once enabled, these numbers get even better:
| Function | TimeKeeper improvement | Additional speedup on RHEL6’s VDSO |
|---|---|---|
| gettimeofday | 9ns | 25% |
| clock_gettime | 10ns | 30% |
TimeKeeper never leaves the processor for data, and it doesn’t enter the OS - so it can take Red Hat 6’s improved performance, and still improve upon that by 30%. (In fact, TimeKeeper also provides an optional direct access function that can reduce overhead by another 15%.) Applications asking for time can have it, accurately, in less than 20 nanoseconds.
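The two tables give savings rather than absolute call times, but the absolute figures can be backed out. The sketch below assumes that each percentage is the fraction of the previous call time that was removed, which is how we read “cuts the time spent by more than 50%”. The published numbers are rounded, so the two tables agree only approximately.

```python
def before_after(saved_ns, fraction_saved):
    """Back out the call time before and after an improvement.

    saved_ns:       nanoseconds removed per call (from the tables).
    fraction_saved: the same improvement as a fraction of the original call time.
    """
    before = saved_ns / fraction_saved
    return before, before - saved_ns


# Stock RHEL6: system call -> VDSO
print(before_after(48, 0.54))  # gettimeofday:  about 88.9 ns -> 40.9 ns
print(before_after(49, 0.60))  # clock_gettime: about 81.7 ns -> 32.7 ns

# RHEL6 VDSO -> TimeKeeper
print(before_after(9, 0.25))   # gettimeofday:  36.0 ns -> 27.0 ns
_, tk_clock_gettime = before_after(10, 0.30)  # clock_gettime: about 33.3 ns -> 23.3 ns

# Optional direct access function: another 15% off
print(tk_clock_gettime * (1 - 0.15))  # about 19.8 ns
```

That last figure is where “less than 20 nanoseconds” comes from under this reading.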
What does all this mean? Well, a number of things:
* Applications can trust the time they get from the OS isn’t stale, and doesn’t have built in inaccuracies.
* The whole system - OS and applications - can operate on the same time base without fear of internal drift or time going backwards. Knowing that data from different systems are tagged with the same time base is a huge benefit.
* Having faster access to time means that the application can spend fewer cycles accessing time, and more cycles doing real work. Or, it makes more cycles available to timestamp events that couldn’t be tagged before.
As a software client, TimeKeeper can get accurate time across your network, even where additional hardware is not an option. It can serve PTP (version 1 or 2) or NTP (versions 1-4) or act as a client, to get accurate time to systems that need it. We saw above that TimeKeeper solidly improves on how that time is driven all the way out to the application.
Is your application’s time data stale? Contact us and we can quantify how TimeKeeper could improve your system.
01/20/2011
FSM’s Matt Sherer asks:
How important is time to your application? To many people, not very much - as long as they can reasonably trust that their system time is accurate to a few seconds, that’s good enough.
For them, a stock NTP client and server is sufficient - more than sufficient, actually, if it can get accuracy to under a second, or even to within a few milliseconds.
There are those of us, though, who obsess over time. We (and our applications) need to know that when we ask the operating system for time, the time we get is the actual time right now. It can’t be 500 microseconds old, or even 100 microseconds old. 1 microsecond accuracy is getting a bit stale, actually. Sub-100 nanosecond accuracy is really what’s needed - whether it’s to satisfy regulatory requirements or to make sure that decisions being made are based on current reality and not the past.
Modern operating systems make it hard to get this kind of accuracy on their own. If you need accurate time, make sure to ask yourself these questions, and be very sure of your answers.
Proper testing methods can answer many of these questions, but unfortunately there’s rarely time to perform good tests. Given too much to do in too little time, it’s easy to think “We have PTP (or NTP) and that should solve our problem” or “We’ve got local timing hardware in the box, that means our application’s time is correct.”
The trouble is, even if you have PTP or NTP infrastructure, or if you have the space to put local timing hardware directly in the system, many of the above questions haven’t been answered.
Remember, though, that some of us are obsessed with time, both its distribution and accuracy. This obsession with correct time is the reason FSMLabs’ TimeKeeper software exists - and it answers all of the above questions, while also reducing overhead. It gets more accurate time directly to the application, faster than was possible before.
TimeKeeper can act as a client or server for NTP or PTP. It can take in a GPS or CDMA feed and redistribute it over the network, taking advantage of the latest in hardware timestamping features found in common network hardware. Time delivery is a topic for another article, though. For now, let’s assume the time data that gets to your system is perfectly accurate, whether it was delivered locally via GPS, PTP, etc. How does TimeKeeper help alleviate all of those issues raised above?
Let’s step through the questions one by one.
These are important questions to answer - all are real dangers in time management that can affect system performance and reportability. Testing can quantify how rampant these problems are in a given environment, if you’re given the time to do so. TimeKeeper can avoid the dangers and give your applications the correct time they can use to make the right decisions.