Scaling connections with Exchange 2010

It’s an undeniable fact of being an author of a book on a technical topic that you cannot cover everything in the number of pages that a publisher allocates. Sometimes this causes you to cut material that you think is perfectly good but is less interesting in the grand scheme of things. On other occasions the page limit provides a useful excuse for not covering something that should really have been in the book. Such is the case for the addition of Kerberos authentication for MAPI clients in Exchange 2010 SP1, which I missed out when I wrote Microsoft Exchange Server 2010 Inside Out.

The gap in our collective knowledge was plugged in TechNet. However, that nugget of information was probably overlooked in the mass of new data released around Exchange 2010 SP1 so it’s a good thing that the redoubtable Ross Smith IV has now blogged on the topic.

Enabling Kerberos authentication for MAPI clients is not for everyone. For one thing, it only works for domain-joined Outlook clients that connect inside the corporate firewall and does nothing to help with scaling for Outlook Anywhere connections, if that’s your interest. The need for an alternative to NTLM authentication arises from a fundamental change in the Exchange 2010 architecture: the MAPI endpoint moved from the mailbox server to the Client Access Server (CAS).

Lots of goodness results from this change, not least the huge transformation of a mailbox database into something that is truly portable between Exchange mailbox servers to provide the foundation for the Database Availability Group (DAG) and the whole high availability story that Microsoft now proudly proclaims. However, as Ross points out, the relocation of the MAPI endpoint has an effect on Outlook clients that previously connected to Exchange 2007 with Kerberos (a client-side setting), because the Exchange 2007 mailbox server that serviced the connection no longer acts as the MAPI endpoint. Instead, the connection is handled by a CAS server, and the CAS server is highly unlikely to have the same name as the previous mailbox server. In any case, the database that holds the mailbox that Outlook is interested in might now be managed within a DAG and who knows what mailbox server it is currently running on! Everything continues to work because Outlook reverts to NTLM authentication when its attempt to use Kerberos fails, but then you can run into some scalability issues. You are unlikely to see these issues in test environments and, indeed, may not encounter them in production unless the connectivity load overtaxes the infrastructure.

An example is in order. It comes from an Exchange 2007 deployment but serves the purpose of illustration. Our project was somewhat ground-breaking as it involved using Outlook Anywhere to connect some 90,000 clients across the Internet to servers in a hosted datacenter – this approach eliminated the traditional cost in outsourcing deals of running large network pipes between customer and hosting company. It’s very similar to what happens when you connect to something like BPOS or Office 365, but we were a little ahead of the game in that many of the scalability limits that have subsequently been discovered and worked around were still unknown.

In any case, we had deployed an array of CAS servers behind another array of ISA servers to handle the incoming connections. All worked well until we approached a load broadly equivalent to 30,000 clients. At this point we went into meltdown, servers failed, and clients failed to connect. A war room involving the customer, Microsoft, and the hosting company swung into action and many attempts were made to resolve the issue. Additional ISA and CAS servers were installed, different protocols were isolated and routed to specific CAS servers, all manner of debugging techniques were used and crash dumps examined – all to no avail.

The problem persisted for nearly ten days until someone noticed that incoming authentication requests were not being handled smoothly by the domain controllers. The default number of secure channels assigned to handle NTLM authentication (as used by Outlook Anywhere connections) is 2 and this proved totally inadequate for our purposes. The number is controlled by the MaxConcurrentAPI value (this blog provides a good insight, but there are many other war stories that can be found using your favourite search engine).
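To see why two channels can be swamped, a rough back-of-the-envelope estimate helps. The sketch below applies Little's law (average requests in flight = arrival rate × time each request holds a channel). The function name and the sample figures are purely illustrative assumptions, not measurements from the deployment described here.

```python
import math


def required_channels(auths_per_second: int, hold_time_ms: int) -> int:
    """Estimate concurrent NTLM pass-through channels needed (Little's law).

    auths_per_second: hypothetical rate of NTLM authentications reaching a server.
    hold_time_ms: hypothetical average time a domain controller takes to answer,
                  during which the channel stays occupied.
    """
    in_flight = auths_per_second * hold_time_ms / 1000
    # Always allow at least one channel, and round up partial channels.
    return max(1, math.ceil(in_flight))


if __name__ == "__main__":
    # Illustrative only: 40 authentications a second, each held for 100 ms,
    # keeps about four channels busy on average.
    print(required_channels(40, 100))
```

With only two channels available, any sustained load above that average has to queue, and queued requests eventually time out. That would be consistent with the behaviour we saw, although the figures above are invented for the example.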

Increasing the value of MaxConcurrentAPI (I could never understand why this setting bears such a name) in the system registry on the CAS servers to 4 cured our problem and allowed the war room to disband. The lesson that I took away from the event was that it is really hard to predict how high volumes of client connections will be handled by an infrastructure and that there are precious few tools to help test connection load. Changes subsequently made by Microsoft have helped. For example, the original Outlook 2007 client generated multiple unused and unwanted connections that stressed the infrastructure so Microsoft removed these connections in Outlook 2007 SP2. And now we have Exchange 2010 SP1 providing another way to avoid secure channel overload by using Kerberos authentication instead.
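For anyone who wants to script the change, here is a minimal sketch that generates a .reg file fragment for the setting. It assumes the value lives under the Netlogon service's Parameters key as a DWORD, which is where Microsoft documents it. The helper name is hypothetical, the permitted maximum depends on the Windows version, and the Netlogon service has to be restarted before a new value takes effect, so test before importing anything like this on a production server.

```python
# Registry key where Netlogon reads its tuning parameters.
NETLOGON_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\Netlogon\Parameters"
)


def max_concurrent_api_reg(value: int) -> str:
    """Return .reg file text that sets MaxConcurrentApi to `value`.

    Hypothetical helper for illustration. Only rejects non-positive values;
    the upper limit that Windows accepts varies by version and is not checked.
    """
    if value < 1:
        raise ValueError("MaxConcurrentApi must be a positive integer")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{NETLOGON_KEY}]\n"
        f'"MaxConcurrentApi"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # Prints a fragment that sets the value to 4, as we did on the CAS servers.
    print(max_concurrent_api_reg(4))
```

Generating the text rather than writing to the registry directly keeps the change reviewable before it goes near a server.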

We live and learn through experience. I think most of the major connectivity challenges for large-scale Exchange deployments are now well understood and documented. Now that I’ve said that, I’m sure there will be some who inform me that I know nothing as there are some brand-new sparkling connectivity sink-holes for us to fall into…

– Tony


About Tony Redmond

Lead author for the Office 365 for IT Pros eBook and writer about all aspects of the Office 365 ecosystem.