Never ending story: EMC CX4 & VAAI on vSphere 5

If you are using one of EMC's previous-generation CX4 storage systems, you have certainly noticed that no CX4 model is listed with VAAI support on the VMware HCL in combination with vSphere 5.

After many people had spent a long time searching for a "Why?" in the VMware & EMC forums, Chad Sakac (SVP EMC Global Systems Engineering, aka Virtual Geek) hosted a webinar a few weeks back to clarify this topic for EMC's customers.

A short summary: between vSphere 4.1 and 5.0, VMware raised the requirements for the VAAI tests a storage system needs to pass to be supported. And as you can probably guess by now, the EMC CX4 series didn't pass the tests. This also means you can't expect the CX4 ever to be supported for VAAI again (on vSphere 5 or later). If you are interested in the full webinar, you can find it here.

So far so good… After I noticed the issue with the CX4 and VAAI last year, I ran some tests in our lab and couldn't see any errors or problems with VAAI enabled on vSphere 5. After that setup had been running for a while, we left VAAI enabled when we upgraded our production environment from vSphere 4.1 to vSphere 5.0. I never saw any problems there either, and VAAI worked as expected for more than half a year… until now.
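For reference, this is roughly how I checked the VAAI state on the hosts back then; just a quick sketch from the ESXi 5.x shell (a value of 1 means the primitive is enabled):

    # Check whether the block VAAI primitives are enabled on the host (1 = enabled, 0 = disabled)
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
    esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking

    # Show the VAAI support status (ATS, Clone, Zero, Delete) per device
    esxcli storage core device vaai status get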

I provisioned some new datastores to one of our vSphere clusters and everything went as smoothly as always. The new datastores were mounted and I moved several VMs onto them. Suddenly a storage vMotion failed because there was supposedly no more space left on the datastore. I immediately suspected something was seriously wrong, because vSphere validates whether there is enough free space on a datastore before a storage vMotion starts (if there isn't, you can't even start it). As a cross-check I created a new VM on this datastore with an eager-zeroed vmdk. That job also failed, with an unknown error. After checking all the sizes (datastore size, total size of the VMs already on the datastore, LUN size on the CX4, etc.), the conclusion was that there should have been (much) more free space on the datastore.
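By the way, the same cross-check also works from the ESXi shell; a rough sketch with placeholder paths (with VAAI enabled, the eager zeroing would normally be offloaded to the array via the Block Zero / WRITE SAME primitive):

    # Compare the free space the host reports for the datastores
    esxcli storage filesystem list

    # Create a small eager-zeroed test disk on the affected datastore
    # (placeholder path; pick a datastore/folder that exists in your environment)
    vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/<datastore>/test/test.vmdk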

I checked the vmkernel.log file on the ESXi host on which the task had been started and found several entries containing "Possible sense data" errors. The associated SCSI sense codes were related to ATS / VAAI. After several more tests I always ran into the same scenario. As the next step I contacted VMware support, and after some long calls (3+ hours!) with several support engineers and a lot of tests, the suggestion was to turn off VAAI (which is what I had been afraid of all along), because it isn't supported for the storage system I'm using (what a surprise).
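If you run into the same symptoms, this is roughly how you can correlate the errors with the affected LUN; again only a sketch, and the naa ID is a placeholder for the LUN backing the datastore:

    # Look for the ATS/VAAI-related SCSI errors in the host log
    grep -i "sense data" /var/log/vmkernel.log

    # Map the affected datastore to its backing device (naa ID)
    esxcli storage vmfs extent list

    # Check which VAAI capabilities the host thinks this device supports
    esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx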

The upshot was that turning off VAAI solved my problems. Turning off VAAI on an ESXi host isn't a big deal (just a few advanced options via GUI or CLI, see the sketch below), but in my case I also needed to modify each VMFS volume, because they were all running in "public ATS-only" mode, which is enabled by default at datastore creation if VAAI is enabled and the storage system claims to support ATS. To downgrade a volume to "public" mode, no ESXi host may have an active connection to it, which means no powered-on VMs. Since we are talking about a production environment in my case, I had to move countless VMs via svMotion to a temporary datastore and back.
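For completeness, this is roughly the procedure, based on the VMware documentation at the time; a sketch with placeholder names, assuming ESXi 5.x (the ATS-only downgrade only works while no host is using the volume):

    # Disable the three block VAAI primitives on each host (the GUI advanced settings work as well)
    esxcli system settings advanced set -i 0 -o /DataMover/HardwareAcceleratedMove
    esxcli system settings advanced set -i 0 -o /DataMover/HardwareAcceleratedInit
    esxcli system settings advanced set -i 0 -o /VMFS3/HardwareAcceleratedLocking

    # Check which locking mode a VMFS volume is using ("public" vs. "public ATS-only")
    vmkfstools -Ph /vmfs/volumes/<datastore>

    # Downgrade an ATS-only VMFS5 volume back to "public"
    # (volume must not be in use by any host; the partition number is usually :1)
    vmkfstools --configATSOnly 0 /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1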

A few days after the support case was closed at VMware, I found this new KB article:

When an ESX server discovers and registers new and existing devices the VAAI plugin module is loaded if the device is deemed to be VAAI capable during PSA device registration, in this particular case the ESX may fail to register device correctly with VAAI plugin module as we are not reading the capacity of the volume correctly to apply the equivalent VAAI plugin module. As a result the VMFS5 volume may fail to mount.

….

Note: EMC Clariion CX4 series arrays are not currently supported for VAAI Block Hardware Acceleration Support on vSphere ESXi 5.0. For more/related information, see EMC CX and VNX Firmware and ESX requirements for vStorage APIs for Array Integration (VAAI) support (2008822).

As written above, in my case I was able to mount the volumes, but the capacity was evidently handled incorrectly, even though it was displayed correctly in both the GUI and the CLI.
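If you want to double-check this yourself, comparing the sizes the host reports against the LUN size on the array is a quick sanity check; again just a sketch with placeholder names:

    # VMFS capacity and locking mode as seen by the host
    vmkfstools -Ph /vmfs/volumes/<datastore>

    # Raw device size the host has registered for the backing LUN
    esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx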

If you are reading this, I really hope this case doesn't apply to you, because it wasn't fun at all…
