Storage Informer

VM Time Drift

by on Jun.25, 2009, under Storage

Virtualization and Performance: VM Time Drift

An important issue to be aware of when measuring application performance on a virtualized system is that of time drift.

A key responsibility of every VMM (hypervisor) is distributing clock ticks generated by the hardware to each VM (guest OS) running on the system. Likewise, it is the responsibility of each VM to process each clock tick when it arrives so that system time is maintained in the expected fashion.

A problem arises when a VM, due to system load, isn't able to process clock ticks quickly enough. When this happens, the VM may lose ticks without noticing.

This problem is significant in the performance analysis sphere. Imagine yourself measuring the time taken to process a constant workload in native and virtualized contexts and comparing the results. Since the work done is constant, the key metric to be measured is response time (completion time). Suppose response time in the native environment is 180 seconds. Then, you proceed to measure the response time of the same workload in the virtualized environment. The result turns out to be 150 seconds.

Huh? Could performance actually be BETTER in the virtualized environment?

Hold the phone, things are not what they appear. In a heavily loaded virtualized system, you need to watch for lost clock ticks. If a guest OS misses clock ticks for whatever reason, it will perceive time as passing more slowly than it actually does. For example, a guest OS with a 1 ms clock tick expects 1000 clock ticks per second. If 100 of those ticks are lost, then the hardware must generate 1100 clock ticks (1.1 seconds of real time) before the VM considers 1 second to have elapsed. This is time drift. The result for our performance testing is that the compute job seems to complete sooner than it did in the native environment. This is because the guest's clock has fallen behind while the hardware continues to run at its normal rate.
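The tick arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the assumption that the virtualized job really ran for 180 seconds while the guest lost 30,000 ticks is made up to show how a 150-second reading could come about.

```python
def real_elapsed_seconds(ticks_generated, tick_hz=1000):
    """Wall-clock seconds covered by the ticks the hardware generated."""
    return ticks_generated / tick_hz


def guest_elapsed_seconds(ticks_generated, ticks_lost, tick_hz=1000):
    """Seconds the guest believes have passed, counting only processed ticks."""
    ticks_processed = ticks_generated - ticks_lost
    return ticks_processed / tick_hz


# The example from the text: 1100 ticks generated, 100 of them lost.
print(real_elapsed_seconds(1100))          # 1.1 seconds really passed
print(guest_elapsed_seconds(1100, 100))    # 1.0 second as seen by the guest

# Hypothetical workload: 180 s of real time with 30,000 lost ticks.
print(guest_elapsed_seconds(180_000, 30_000))  # the guest reports 150.0
```

Under these assumed numbers, the guest's stopwatch reads 150 seconds even though the job took no less real time than the native run.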

A nice solution to this problem is to run NTP (Network Time Protocol) on each VM. Using NTP, an OS periodically queries an authoritative NTP host to check the current time. It then adjusts system time to stay closely synchronized with that host, regardless of its tick count. Thus, while time drift from lost clock ticks may still affect time at a very fine-grained level, time will remain accurate at a coarse-grained level. Since our performance experiment uses coarse-grained timing (a run of a few minutes), the mechanism does quite well at solving our performance measurement problem.
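To see why periodic correction keeps coarse-grained timing usable, here is a deliberately simplified model. It assumes the guest clock is reset to the reference at every synchronization, which is not how a real NTP daemon behaves (it usually adjusts the clock gradually), and the function name, loss rate, and sync intervals are all hypothetical.

```python
def max_clock_error(real_duration_s, lost_fraction, sync_interval_s):
    """Largest gap (in seconds) between real time and guest time,
    assuming the guest clock is reset to the reference clock every
    sync_interval_s seconds of real time (a simplifying assumption)."""
    # Between two syncs the guest falls behind by lost_fraction
    # seconds for every second of real time.
    longest_unsynced_s = min(real_duration_s, sync_interval_s)
    return longest_unsynced_s * lost_fraction


# A 180 s run in which the guest loses one tick in six.
no_sync = max_clock_error(180, 1 / 6, sync_interval_s=10_000)
with_sync = max_clock_error(180, 1 / 6, sync_interval_s=6)
print(f"no correction during the run: about {no_sync:.0f} s of drift")
print(f"corrected every 6 s: about {with_sync:.0f} s of drift")
```

In this model the error stops growing with the length of the run and is bounded by what accumulates between two corrections, which is why a measurement lasting minutes can tolerate drift that would ruin a millisecond-scale one.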

As a teaser for discussion, there are some other strategies for managing this problem. Your thoughts on the matter?

David Ott

