Those who are considering virtualizing production Exchange 2010 servers should read the fine print contained in the TechNet article “Exchange 2010 System Requirements”. In particular, this text is crucial:
“Some hypervisors include features for taking snapshots of virtual machines. Virtual machine snapshots capture the state of a virtual machine while it’s running. This feature enables you to take multiple snapshots of a virtual machine and then revert the virtual machine to any of the previous states by applying a snapshot to the virtual machine. However, virtual machine snapshots aren’t application aware, and using them can have unintended and unexpected consequences for a server application that maintains state data, such as Exchange. As a result, making virtual machine snapshots of an Exchange guest virtual machine isn’t supported.”
Eeek! In a nutshell, this means that all support bets are off if you take a snapshot of a running Exchange 2010 server with VMware or Hyper-V and then attempt to revert to the state of the server contained in the snapshot. Don’t expect sympathy from Microsoft support if you ring up to report that things don’t work so well after you’ve used a snapshot to go back to a known system configuration.
In practice, snapshots are fantastic in a lab environment as they allow you to deploy Exchange servers quickly and to go back to a known state if the need arises (the assumption being that a lab environment sees more of the errors that can make a server unusable). In production, snapshots can work pretty well for Exchange 2010 servers that are largely stateless. If you have dedicated CAS or Hub Transport servers, you’ll probably not run into many difficulties if you need to revert to a snapshot of a previous configuration. You might screw up the transport dumpster a tad, but you won’t notice this unless you run into a more serious problem and require Exchange to replay some messages that should be in the dumpster… if the messages aren’t there, you might lose them unless they can be found in another dumpster.
Things are far more problematic with mailbox servers, especially those that operate within a Database Availability Group (DAG). These servers are super-stateful and may be communicating in all manner of mysterious ways, including block-mode replication. Because this is the case, it’s extremely likely that reverting to a previous snapshot of a running and loaded mailbox server will be a sorrowful event. You might run into problems such as the database copies on the server being unrecognized within the DAG, being forced to reseed database copies, or even having the server fail to rejoin the domain or cluster for one reason or another (expired computer password, etc.). All in all, it’s a messy place to be.
Because of the potential for problems it’s best to avoid taking snapshots of running Exchange 2010 mailbox servers. For sure, you can take snapshots of inactive servers (for example, shut the computer down after installing a new service pack and then take a snapshot) but even so, don’t assume that these snapshots can be used to bring a reconstituted server back into production without encountering some glitches along the way.
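To make the rules of thumb above concrete, here is a minimal Python sketch that sorts servers into rough risk bands for a snapshot revert. Everything in it is my own illustration rather than anything Microsoft or a hypervisor vendor ships: the `ExchangeServer` record, the `Risk` bands, and the `snapshot_revert_risk` function are hypothetical names, and the bands simply encode the opinions expressed in this post. None of the bands means "supported"; Microsoft's statement covers every Exchange guest.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"            # largely stateless roles
    MODERATE = "moderate"  # mailbox server snapshotted while shut down
    HIGH = "high"          # mailbox server snapshotted while running


@dataclass(frozen=True)
class ExchangeServer:
    """Hypothetical description of a virtualized Exchange 2010 server."""
    name: str
    roles: frozenset          # e.g. frozenset({"CAS", "HubTransport"})
    dag_member: bool = False  # informational; DAG members are the worst case
    running: bool = True      # was the VM running when the snapshot was taken?


def snapshot_revert_risk(server: ExchangeServer) -> Risk:
    """Rough risk of reverting this server to a snapshot (unsupported in any case)."""
    if "Mailbox" not in server.roles:
        # CAS and Hub Transport keep little state; the transport dumpster
        # may still lose messages after a revert.
        return Risk.LOW
    if server.running:
        # Running mailbox servers hold live database and replication state.
        return Risk.HIGH
    # Shut down before the snapshot: better, but expect some glitches.
    return Risk.MODERATE


servers = [
    ExchangeServer("cas01", frozenset({"CAS", "HubTransport"})),
    ExchangeServer("mbx01", frozenset({"Mailbox"}), dag_member=True),
    ExchangeServer("mbx02", frozenset({"Mailbox"}), running=False),
]
for s in servers:
    print(s.name, snapshot_revert_risk(s).value)
```

A check like this is only useful as a guard in front of whatever tooling triggers snapshots, so that nobody reverts a loaded DAG member by accident.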
Problems after reverting to a snapshot are not the only thing to be aware of with Exchange 2010 mailbox servers. You shouldn’t use features like vMotion to move DAG members to other hosts as this can also cause the DAG to have a severe headache. Microsoft’s perspective appears to be that customers should use the high availability features built into Exchange 2010 and not attempt to change the underlying platform when the DAG will not be aware of the change. This post provides a good overview of the issues involved with vMotion.
My preference is to use physical computers for mailbox servers. I’ll cheerfully virtualize the rest, including such esoteric components as load balancers, but given the choice, I’ll always go with the comfort factor that a well-specified mailbox server delivers. This is largely a matter of personal choice allied to a suspicion that problems are easier to sort out when things go wrong on a physical box.
Everyone is rightly interested in virtualization because of its potential to increase the utilization of hardware. But the fine print has a nasty habit of catching people who let their enthusiasm run ahead of the capabilities of technology. All the more reason to conduct realistic operational tests of any new server product before bringing it into production so that you know how to deal with different kinds of server outages on both physical and virtual platforms.