REVISE: Time Synchronization in Windows Hyper-V

Below links give a detailed clarification on how time clock of guest operating systems in windows hyper-v works and how to solve its associated problems.

Question: Time Synchronization in Hyper-V

There is a lot of confusion about how time synchronization works in Hyper-V – so I wanted to take the time to sit down and write up all the details. 

There are actually multiple problems that exist around keeping time inside of virtual machines – and Hyper-V tackles these problems in different ways.

Problem #1 – Running virtual machines lose track of time.

While all computers contain a hardware clock (called the RTC – or real-time clock) most operating systems do not rely on this clock.  Instead they read the time from this clock once (when they boot) and then they use their own internal routines to calculate how much time has passed.

The problem is that these internal routines make assumptions about how the underlying hardware behaves (how frequently interrupts are delivered, etc…) and these assumptions do not account for the fact that things are different inside a virtual machine.  The fact that multiple virtual machines need to be scheduled to run on the same physical hardware invariably results in minor differences in these underlying systems.  The net result of this is that time appears to drift inside of virtual machines.

UPDATE 11/22: One thing that you should be aware of here: the rate at which the time in a virtual machine drifts is affected by the total system load of the Hyper-V server.  More virtual machines doing more stuff means time drifts faster.

In order to deal with time drift in a virtual machine – you need to have some process that regularly gets the real time from a trusted source and updates the time in a virtual machine.

Hyper-V provides the time synchronization integration services to do this for you.  The way it does this is by getting time readings from the management operating system and sending them over to the guest operating system.  Once inside the guest operating system – these time readings are then delivered to the Windows time keeping infrastructure in the form of an Windows time provider (you can read more about this here: http://msdn.microsoft.com/en-us/library/bb608215.aspx).   These time samples are correctly adjusted for any time zone difference between the management operating system and the guest operating system.

Problem #2 – Saved virtual machines / snapshots have the wrong time when they are restored.

When we restore a virtual machines from a saved state or from a snapshot we put back together the memory and run state of the guest operating system to exactly match what it was when the saved state / snapshot was taken.  This includes the time calculated by the guest operating system.  So if the snapshot was taken one month ago – the time and date will report that it is still one month ago.

Interestingly enough, at this point in time we will be reporting the correct (with some caveats) time in the systems RTC.  But unfortunately the guest operating system has no idea that anything significant has happened – so it does not know to go and check the RTC and instead continues with its own internally calculated time.

To deal with this the Hyper-V time synchronization integration service detects whenever it has come back from a saved state or snapshot, and corrects the time.  It does this by issuing a time change request through the normal user mode interfaces provided by Windows.  The effect of this is that it looks just like the user sat down and changed the time manually.  This method also correctly adjusts for time zone differences between the management operating system and the guest operating system.

Problem #3 – There is no correct “RTC value” when a virtual machine is started

As I have mentioned – physical computers have a RTC that operating systems look at when they first boot to get the time.  This real-time clock is backed by a small battery (you have probably seen the battery yourself if you have ever pulled apart a computer).  Unfortunately virtual machines do not have any “batteries”.  When a virtual machine is turned off there is no component that keeps track of time for it.  Instead – whenever you start a virtual machine we take the time from the management operating system and put this into the real-time clock of the virtual machine.

This is done without the use of the Hyper-V time synchronization integration servers (it happens long before the integration services have loaded). 

The downside of this approach is that this does not take into account any potential time zone differences between the management operating system and the guest operating system.  The reason for this is that “time zones” are a construct of the software that runs in a virtual machine – and is not communicated to the virtual hardware in any way.  So – in short – when we start a virtual machine there is no way for us to know what time zone the guest operating system believes it is in.

One partial mitigation we have for this issue is that when the Hyper-V time synchronization component loads for the first time – it does an initial user mode set of the time to ensure that the time gets corrected as quickly as possible (using the same technique as discussed in problem #2).

So now that you understand how this all works – let’s discuss some common issues and questions around virtual machines and time synchronization.

Question #1 – I have a virtual machine that is configured for a different time zone to the management operating system.  Should I disable the time synchronization component of Hyper-V?

No, no, no, no, no, no, no.  And I say again – no.  As I have mentioned above – all time synchronization that is done by the Hyper-V time synchronization integration service is time zone aware.  If you disable the Hyper-V time synchronization integration service you will disable all the time synchronization aspects of Hyper-V that are time zone aware – and only leave the initial RTC synchronization active – which is not time zone aware.

This means that your virtual machines will go from booting in the wrong time zone, and then being corrected as soon as the Hyper-V time synchronization integration service loads to booting in the wrong time zone and staying in the wrong time zone.

Question #2 – Is there any way that I can stop Hyper-V from putting the wrong time in the RTC at boot?

In short; no.  We need to put something in there – and that is the best thing that we have to work with.

Question #3 – Can’t you use UTC time in the RTC so that the correct time is established when the virtual machine boots?

UTC (which is the computer techy version of saying GMT) time would solve this problem nicely with only one problem.  Windows does not support UTC time in the BIOS (Linux does).  So while this would solve the problem for our Linux running user base – the fact of the matter is that most of our users run Windows – and this would not work for them.

Question #4 – What about if I am using a different time synchronization source (e.g. domain time or a remote time server)?

Hyper-V time synchronization was designed to “get along well” with other time synchronization sources.  You should not need to disable Hyper-V time synchronization in order to use a different time synchronization source – as long as it goes through the Windows time synchronization infrastructure.

In fact – if you are running a Domain Controller inside a virtual machine I would recommend that you leave Hyper-V time synchronization enabled but that you also setup an external time source.  You can do this by going to this KB article: http://support.microsoft.com/kb/816042and following the steps outlined in the “Configuring the Windows Time service to use an external time source” section.

UPDATE 11/22: I should have mentioned: since virtual machines tend to lose time much faster than physical computer, you need to configure any external time source to be checked frequently.  Once every 15 minutes is a good place to start.

Question #5 – How can I check what time source is being used by Windows inside of a virtual machine?

This is easy to do.  Just open an administrative command prompt and run “w32tm /query /source”.  If you are synchronizing with a remote computer – its name should be listed.  If you are using the Hyper-V time synchronization integration service you should see the following output:

image

If you see this output:

image

It means that there is no time synchronization going on for this virtual machine.  This is a very bad thing – as time will drift inside of the virtual machine.

Question #6 – Wait a minute!  My virtual machine should be synchronizing to the domain (or an external server) – but when I run that command it tells me that the Hyper-V time synchronization provider is being used!  How do I fix this!

I do not know why this happens – but sometimes it happens.  The first thing that you should do is to check that your domain does have a correctly configured authoritative time source.  There have been a small number of times when I have seen this problem being caused by the lack of an authoritative time source.

Alternatively – you can “partially disable” Hyper-V time synchronization.  The reason why I say “partially disable” is that you do not want to turn off the aspects of Hyper-V time synchronization that fix the time after a virtual machine has booted for the first time, or after the virtual machine comes back from a saved state.  No other time synchronization source can address these scenarios elegantly.

Luckily – there is a way to leave this functionality intact but still ensure that the day to day time synchronization is conducted by an external time source.  The key thing trick here is that it is possible to disable the Hyper-V time synchronization provider in the Windows time synchronization infrastructure – while still leaving the service running and enabled under Hyper-V.

To do this you will need to log into the virtual machine, open an administrative command prompt and run the following commands:

reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider /v Enabled /t reg_dword /d 0

This command stops W32Time from using the Hyper-V time synchronization integration service for moment-to-moment synchronization.  Remember from earlier in this post that we do not go through the Windows time synchronization infrastructure to correct the time in the event of virtual machine boot / restore from saved state or snapshot.  So those operations are unaffected.

w32tm /config /syncfromflags:DOMHIER /update

This command tells Windows to go and look for the best time source in the domain hierarchy.  If you want to use an external time server instead you can use the commands found here: http://technet.microsoft.com/en-us/library/cc784553(WS.10).aspx

net stop w32time & net start w32time

w32tm /resync /force

These two commands just “kick the Windows time service” to make sure the settings changes take effect immediately.

w32tm /query /source

This final command should confirm that everything is working as expected.

When you run these commands you should see something like this:

image

Question #7 – I have a virtual machine that has gotten ahead of time, and it never gets corrected back to the correct time.  What is going on here?

As a general rule of thumb, when time drifts inside a virtual machine it runs slower than in the real world, and the time falls behind.  We will always detect and correct this.

However, in the past, we have had reports of software problems caused when the Hyper-V time synchronization integration service decides to adjust the time back – because it believes the virtual machine is ahead of time.  To deal with this (rare) issue – we put logic in our integration service that will not change the time if the virtual machine is more than 5 seconds ahead of the physical computer.

UPDATE 11/22: I was asked how having the virtual machine in a different time zone to the Hyper-V server would affect this.  The short answer is that it does not.  The 5 second check is done after we have done the necessary time zone translation.

Question #8 – When should I disable the Hyper-V time synchronization service (either in the virtual machine settings, or inside the guest operating system)?

Never.

There are definitely times when you will want to augment the functionality of the Hyper-V time integration services with a remote time source (be it a domain source or an external time server) but the only way to get the best experience around virtual machine boot / restore operations is to leave the Hyper-V time integration services enabled.

Question: Hyper-V, CPU Load and System Clock Drift

Using Hyper-V Server, you may find that the time is drifting a lot from the actual time, especially when Guest Virtual Machines are using CPUs heavily. The host OS is also virtualized, which means that the load of the host is also making the clock drift.

How to prevent the clock from drifting

  1. Disable the Time Synchronization in the Integration Services. (Warning, this setting is defined per snapshot)
  2. Import the following registry file : 

    Windows Registry Editor Version 5.00[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\W32Time\Config] 
    “MaxAllowedPhaseOffset”=dword:00000001 
    “SpecialPollInterval”=dword:00000005 
    “SpecialInterval”=dword:00000001
    Note: If you are using notepad, make sure to save the file using an Unicode encoding.
  3. If the guest OS (and the server OS) is not on a domain, type in the following to set the time source : 

    w32tm /config /manualpeerlist:”time.windows.com,0x01 1.ca.pool.ntp.org,0x012.ca.pool.ntp.org,0x01″ /syncfromflags:MANUAL /update Note: Hosts FQDN are separated by spaces. 
  4. Run the following command to force a time synchronization 

    w32tm /resync
  5. Check that the clock is not drifting anymore by using this command : 

    w32tm /monitor /computer:time.windows.com

A bit of background …

The system I’m currently working on is heavily based on time. It relies a lot on timestamps taken from various steps of the process. If for some reason the system clock is unstable, that means the data generated by the system is unreliable. It sometimes generates corrupt data, and this is not good for the business.

I was investigating a sequence of events stored in the database in order that could not have happened, because the code cannot generate it this way.

After loads of investigation looking for code issues, I stumbled upon something rather odd in my application logs, considering that each line from the same thread should be time stamped later than the previous :

2009-10-13T17:15:26.541T [INFO][][7] … 
2009-10-13T17:15:26.556T [INFO][][7] … 
2009-10-13T17:15:24.203T [INFO][][7] … 
2009-10-13T17:15:24.219T [INFO][][7] … 
2009-10-13T17:15:24.234T [INFO][][7] …

All the lines above were generated from the same thread, which means that the system time changed radically between the second and the third line. From the application point of view, the time went backward of about two seconds and that also means that during that two seconds, there were data generated in the future. This is not very good…

The Investigation

Looking at the Log4net source code, I confirmed that the time is grabbed using System.DateTime.Now call, which excludes any code issues.

Then I looked at the Windows Time Service utility, and by running the following command :

w32tm /stripchart /computer:time.windows.com

I found out that the time difference from the NTP source was very different, something like 10 seconds. But the most disturbing was not the time difference itself, but the evolution of that time difference.

Depending on the load of the virtual machine, the difference would grow very large, up to a second behind in less than a minute. Both the host and the guest machines were exposing this behavior. Since Hyper-V Integration Services are by default synchronizing the clock of all the virtual machines on the guest OS, that means that the load of a single virtual machine can influence the clock of all other virtual machines. The host machine CPU load can also influence the overall clock rate, because it is also virtualized.

Trying to explain this behavior

To try and make an educated guess, the time source used by windows seems to be the TSC of the processor (by the use of the RDTSC opcode), which is virtualized. The preemption of the CPU by other virtual machines seems to have an negative effect on the counter used as a reference by windows.

The more the CPU is preempted, the more the counter drifts.

Correcting the drift

By default, the Time Service has a “phase adjustment” process that slows down or speeds up the system clock rate to match a reliable time source. The TSC counter on the physical CPU is clocked by the system Quartz (If it is still like this). The “normal” drift of that kind of component is generally not very important, and may be related to external factors like the temperature of the room. The time service can deal with that kind of slow drift.

But the default configuration does not seem to be a good fit for a time source that drifts this quickly and is rather unpredictable. We need to shorten the process of phase adjustment.

Fixing this drift is rather simple, the Time Service needs to correct the clock rate more frequently, to cope with the load of the virtual machines that slow down the clock of the host.

Unfortunately, the default parameters on Hyper-V Server R2 are those of the default member of a domain, which are defined here. The default polling period from a reliable time source is way too long, 3600 seconds, considering the drift faced by the host clock.

A few parameters need to be adjusted in the registry for the clock to stay synchronized :

  • Set the SpecialInterval value to 0x1 to force the use of SpecialPollInterval.
  • Set SpecialPollInterval to 10, to force the source NTP to be polled every 10 seconds.
  • Set the MaxAllowedPhaseOffset to 1, to force the maximum drift to 1 second before the clock is set directly, if adjusting the clock rate failed.

Using these parameters will not mean that the clock will stay perfectly stable, but at the very least it will correct itself very quickly.

It seems that there is a hidden boot.ini parameter for Windows 2003, /USEPMTIMER, which forces windows to use the ACPI timer and avoid that kind of drift. I have not been able to confirm this has any effect at all, and I cannot confirm if the OS is actually using the PM Timer or the TSC.