Windows 2000 Server on vSphere 5 (Update)

A while ago we upgraded our servers from vSphere 4.1 to vSphere 5.0. Before upgrading our infrastructure we did a lot of testing in a lab environment, of course. Still, we ran into a problem when upgrading the production environment:

I know it's 2012! But my company is still running a lot of Windows 2000 Server VMs with Terminal Services (~40 VMs).

After I started migrating the first Win2k VMs to vSphere 5 hosts, I noticed that the CPU load of the VMs went up to 100% and didn't come down again. While I was trying to figure out what the problem could be, a colleague came to me quite upset, asking whether something was wrong with the terminal server, because the first customers had started complaining.

At this point, of course, we stopped migrating the Win2k VMs. Fortunately I had taken a snapshot, which I needed to get back to hardware version 7 before a vMotion back to a vSphere 4.1 host was possible without much effort.

Troubleshooting wasn't easy at first, because there were so many new "things": ESXi 5.0, hardware version 8, new VMware Tools, etc. I also couldn't see any difference from our lab, where we had of course run Win2k on vSphere 5.0 before doing this with the production VMs.

Going back to the lab didn't produce a result at first glance. The Win2k VM started fine, CPU load stayed quite low, a handful of parallel RDP connections succeeded, etc. "Simulating a real customer" (starting applications on the TS, etc.) couldn't reproduce the error at this point either. After thinking about it for a while, my only idea was to extend the testing with a lot of RDP connections. Some colleagues got involved to "play customer", connecting to the terminal server and starting some applications. And here we go…

…after around 10 simultaneous active connections the VM started going crazy. The CPU load increased to 100% and the network response got so bad that some of the sessions were disconnected. Step by step I tested a lot of things (hardware version, Windows HAL settings, etc.), all without success.

Meanwhile, searching the net, I came across a new thread in the VMware Community where someone described exactly the same problem. Within a few days the thread grew quite long, with more and more people reporting the same issue, but without a resolution.

Some weeks and over 1000 views later a VMware employee joined the discussion:

…I cannot say with certainty whether the cases reported in this
discussion are related, but they might be. If they are, one
possible way to overcome most of the slowdown might be to switch
from BT to either HWMMU or HV monitor mode.

[General background. We have three ways of running guests. These
so-called “monitor modes” can be selected in the UI. We generally
refer to them as BT (for binary translation with software MMU),
HV (for hardware virtualization, i.e., VT-x or AMD-V with software
MMU) and HWMMU (i.e., VT-x with EPT or AMD-V with RVI). You can
find more details here:
http://www.vmware.com/resources/techresources/10036].
So my suggestion/ask is: if you can confirm that the cases where
this problem is seen are using BT mode to run the guest, please
try either HWMMU or HV mode and let us know if things improve?
This, of course, assumes that you are running on a sufficiently
new physical CPU for these modes to be available….

After reading this post I immediately switched a VM in the lab to HWMMU and started testing again. And wow… that's it! After some more testing I dared to move a production VM to a vSphere 5.0 host, and it worked.

In the meantime, of course, we had continued with the vSphere 5.0 upgrade but kept some 4.1 hosts and some VMFS-3 datastores (just for the Win2k VMs), so I was quite happy when I finally upgraded the last 4.1 host to 5.0.

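The setting you need to adjust is "CPU/MMU Virtualization" in the VM's settings: choose the option that uses Intel VT-x/AMD-V for instruction set virtualization and Intel EPT/AMD RVI for MMU virtualization (HWMMU).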
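As far as I know, the same choice is also stored in the VM's .vmx file as two options, monitor.virtual_exec and monitor.virtual_mmu, each accepting "automatic", "software" or "hardware". The small Python sketch below maps the three monitor modes from the quote (BT, HV, HWMMU) to those options. Treat the option names and the mapping as an assumption to verify against VMware's documentation for your version, and only edit a .vmx file while the VM is powered off.

```python
# Sketch (assumption): how the three monitor modes described by VMware
# map onto the .vmx options monitor.virtual_exec / monitor.virtual_mmu.
MONITOR_MODES = {
    # BT: binary translation with software MMU
    "BT": {"monitor.virtual_exec": "software", "monitor.virtual_mmu": "software"},
    # HV: VT-x/AMD-V for instructions, software MMU
    "HV": {"monitor.virtual_exec": "hardware", "monitor.virtual_mmu": "software"},
    # HWMMU: VT-x with EPT or AMD-V with RVI
    "HWMMU": {"monitor.virtual_exec": "hardware", "monitor.virtual_mmu": "hardware"},
}


def vmx_lines(mode: str) -> list[str]:
    """Return the .vmx lines for a monitor mode (BT, HV or HWMMU)."""
    try:
        settings = MONITOR_MODES[mode.upper()]
    except KeyError:
        raise ValueError(f"unknown monitor mode: {mode!r}") from None
    # Format each option the way .vmx files store it: key = "value"
    return [f'{key} = "{value}"' for key, value in settings.items()]


if __name__ == "__main__":
    for line in vmx_lines("HWMMU"):
        print(line)
```

Running the script prints the two lines for HWMMU, both set to "hardware".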

A few weeks ago VMware published a KB article on this issue: http://kb.vmware.com/kb/2012205

Even though this change lets Win2k VMs run fine on a vSphere 5.0 host, one disadvantage remains: switching from software MMU to HWMMU changes the memory page size. If I remember correctly (I just can't find the article where I read this…), the page size grows from 4 KB to 2 MB. If you have a lot of (nearly) identical VMs, TPS (Transparent Page Sharing) is a really cool feature. In my case TPS saved around 50 GB of RAM in total across these Win2k VMs. The much larger page size makes it really hard for TPS to find identical pages (2 MB!), so the savings dropped from ~50 GB to only ~2 GB.
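A bit of arithmetic shows why large pages hurt TPS so much: a 2 MB page spans 512 pages of 4 KB, and it can only be shared if all 512 of them are identical. The Python sketch below uses a deliberately simplified, hypothetical model in which each 4 KB sub-page matches independently with probability p_small; real memory contents are not independent, so the numbers are only illustrative.

```python
# Illustrative model (hypothetical): why sharing 2 MB pages is much
# harder than sharing 4 KB pages.
SMALL_PAGE = 4 * 1024          # x86 small page: 4 KB
LARGE_PAGE = 2 * 1024 * 1024   # x86 large page: 2 MB


def small_pages_per_large() -> int:
    """Number of 4 KB pages contained in one 2 MB page."""
    return LARGE_PAGE // SMALL_PAGE


def large_page_match_probability(p_small: float) -> float:
    """Chance that a whole 2 MB page is identical, assuming each of its
    4 KB sub-pages matches independently with probability p_small."""
    return p_small ** small_pages_per_large()


if __name__ == "__main__":
    print("4 KB pages per 2 MB page:", small_pages_per_large())
    for p in (0.5, 0.9, 0.99):
        print(f"p_small={p}: 2 MB match probability {large_page_match_probability(p):.2e}")
```

Even with a 99% chance per 4 KB page, the chance of a whole 2 MB page matching in this model is below 1%.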

I am looking forward to all these Win2k VMs being replaced by Win2k8, but that's not really my responsibility…

Update 09.04.2012:

It seems this bug was fixed in the new vSphere 5.0 Update 1, which was released a few weeks ago. Unfortunately I haven't found the time to verify this myself.

Comments

  1. Poul Grønlund says:

    Thank you very much for this article!

  2. How about VMware Tools 9 on Windows 2000 Server? Were you able to install it?
    I can't; I'm still searching for a solution.

  3. I'm happy I haven't seen a Win2000 VM in my last customer projects 🙂

    Thanks for the feedback! I guess some day I will have to deal with a Win2000 VM again 🙂
