What caused the crippling of Exchange 2013 modern public folders?


Now that the initial fuss about the limitations that recently emerged for Exchange 2013’s modern public folders has subsided (but just a little), cooler minds turn to thinking about why these limitations exist. After all, there doesn’t seem to be any rhyme or reason why the limitations should kick in at 100 public folder mailboxes and 10,000 public folders in the hierarchy. These are, after all, small numbers in the overall scheme of Exchange and especially so when compared to the massive public folder deployments with previous versions of Exchange.

According to some sources, the largest known public folder deployment spans some 36TB of data in 1.2 million folders with a 6GB hierarchy. Obviously it took some time for the company that owns this data to accumulate so many public folders, but that’s life. They had good business reasons for using public folders in this way. What’s important now is that the current restrictions placed by Microsoft on deployments of modern public folders pale when compared to that kind of volume.

Given what we know about Exchange 2013, where could the problems lie? Well, apart from belatedly publishing the limitations, Microsoft has not been too forthcoming with information on this point, perhaps because they are pulling together development plans to address the issue and provide solid guidance to customers. In the interim, no one loves a vacuum so let’s fill it with some speculation.

Modern public folders are stored in mailbox databases and not the traditional public folder databases. There is a heap of goodness in this transition, not least the fact that public folders can now be protected like any other mailbox by being in a replicated database within a Database Availability Group (DAG). In fact, an EHLO blog post about modern public folders in May 2013 explains “how public folders were given a lot of attention to bring their architecture up-to-date, and as a result of this work they would take advantage of the other excellent engineering work put into Exchange mailbox databases over the years.” Indeed! Despite the excellence of the engineering, those pesky limitations exist.

Joking apart, the Exchange development team has greatly enhanced the ability of the Information Store to deal with massive amounts of data over the last decade. We know that Exchange 2013 can handle user mailboxes that are larger than 100GB comfortably and that mailbox databases can exceed 2TB without breaking sweat. In light of this, I think it unlikely that the problem is with the Information Store. After all, the 36TB monster deployment could fit in 800 public folder mailboxes, each storing 45GB of data plus a copy of the 6GB hierarchy. Spread across 100 mailbox databases, that works out at 8 public folder mailboxes per database, which keeps client connections per database to a reasonable number.

Each mailbox would have to store 1,500 public folders but that doesn’t seem like a huge problem. The bigger issue would be figuring out how to distribute the 1.2 million public folders across the 800 mailboxes to balance the load. As I have described before, this aspect of public folder migration is poorly served by automation and is an intensely manual, boring, and error-prone process.

I realize that 800 mailboxes is much more than the 100 public folder mailbox limitation that now exists and that a 51GB public folder mailbox is larger than the size recommended by Microsoft. However, 800 large public folder mailboxes distributed across 100 databases is not beyond the power of Exchange 2013 to handle on suitably-configured hardware, especially if the databases are spread across multiple DAG members.
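For anyone who wants to check my arithmetic, here is a minimal Python sketch of the same back-of-the-envelope sizing. It assumes decimal units (1TB = 1,000GB) and a perfectly even split of data and folders across mailboxes, which no real deployment would achieve. The function and variable names are mine and purely illustrative.

```python
# Back-of-the-envelope sizing for the hypothetical 800-mailbox layout.
# Figures come from the text; decimal units (1 TB = 1,000 GB) and an even
# distribution of data and folders are simplifying assumptions.

TOTAL_DATA_GB = 36 * 1000      # 36 TB of public folder content
TOTAL_FOLDERS = 1_200_000      # folders in the hierarchy
HIERARCHY_GB = 6               # every public folder mailbox holds a hierarchy copy
MAILBOXES = 800                # proposed public folder mailboxes
DATABASES = 100                # proposed mailbox databases


def layout(total_data_gb, total_folders, hierarchy_gb, mailboxes, databases):
    """Return per-mailbox and per-database figures for an even split."""
    content_gb = total_data_gb / mailboxes
    mailbox_gb = content_gb + hierarchy_gb
    return {
        "content_gb_per_mailbox": content_gb,
        "total_gb_per_mailbox": mailbox_gb,
        "folders_per_mailbox": total_folders / mailboxes,
        "mailboxes_per_database": mailboxes / databases,
        "gb_per_database": mailbox_gb * mailboxes / databases,
    }


if __name__ == "__main__":
    figures = layout(TOTAL_DATA_GB, TOTAL_FOLDERS, HIERARCHY_GB, MAILBOXES, DATABASES)
    for name, value in figures.items():
        print(f"{name}: {value:,.0f}")
```

Under those assumptions each mailbox holds 45GB of content (51GB with the hierarchy) and 1,500 folders, and each database holds 8 public folder mailboxes totalling roughly 408GB, which is well inside the 2TB that databases are known to handle.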

Of course, Microsoft recommends that each public folder mailbox serves no more than 2,000 concurrent users, so my back-of-the-envelope calculation above might run into difficulty if the 800 public folder mailboxes had to handle more than 1.6 million concurrently connected users (800 × 2,000). Even so, I cannot see why the number of mailboxes matters at all, unless it’s to do with the overhead involved in keeping more than a certain number of public folder mailboxes synchronized.

Suspicion then moves to the public folder hierarchy and the way that the new architecture allows for just one writable copy of the hierarchy per organization. This copy is stored in the first public folder mailbox that is created. Every other public folder mailbox has its own read-only copy of the hierarchy, and updates to folders (such as changing permissions) are referred back to the primary writable hierarchy, which then updates the secondary hierarchies.
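To make the single-writer arrangement concrete, here is a toy Python model of it. This is only a sketch of the behavior described above, not how Exchange implements hierarchy synchronization. The class, its methods, and the mailbox names are all hypothetical, and the real system synchronizes asynchronously rather than in a simple loop.

```python
# Toy model of the single-writer hierarchy: one primary holds the writable
# copy, secondaries hold read-only copies and refer writes to the primary.
# Names are illustrative only; none of this is an Exchange API.


class PublicFolderMailbox:
    def __init__(self, name, primary=None):
        self.name = name
        # The first mailbox created acts as its own primary.
        self.primary = self if primary is None else primary
        self.secondaries = []            # only populated on the primary
        self.hierarchy = set()           # local copy of folder paths
        if primary is not None:
            primary.secondaries.append(self)
            # A new secondary starts with a copy of the primary hierarchy.
            self.hierarchy = set(primary.hierarchy)

    @property
    def is_primary(self):
        return self.primary is self

    def create_folder(self, path):
        """Create a folder; a secondary refers the write to the primary."""
        if not self.is_primary:
            return self.primary.create_folder(path)
        self.hierarchy.add(path)
        # The primary then updates every secondary copy of the hierarchy.
        for mailbox in self.secondaries:
            mailbox.hierarchy.add(path)
        return len(self.secondaries)     # number of copies that were updated


if __name__ == "__main__":
    primary = PublicFolderMailbox("PFMbx01")
    others = [PublicFolderMailbox(f"PFMbx{i:02d}", primary) for i in range(2, 6)]
    print(others[0].create_folder("Sales/2014"), "secondary copies updated")
```

The point of the model is that every change, no matter which mailbox a client is connected to, lands on one mailbox and then fans out to all of the others.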

It’s possible that Microsoft has discovered that the current implementation works brilliantly for small deployments (say, 5 public folder mailboxes, each of which holds 200 folders) but that the overhead of keeping the hierarchy synchronized across the organization becomes unmanageable at a certain point. Because the number of updates increases as the number of public folder mailboxes grows, it’s probable that the point where the manageability of updates becomes an issue occurs much sooner than would happen in the 1.2 million folder deployment contemplated above. I am not saying that this is the case or that the architecture is flawed, but it is easy to imagine how problems might arise inside a very busy hierarchy where many folders are updated during business hours.
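To put rough numbers on that speculation, the sketch below counts how many secondary-hierarchy updates a day of folder changes might generate. The model is my own assumption, not anything Microsoft has published: every change made on the primary is applied once to each secondary, and a purely hypothetical 1% of folders change per day.

```python
# Rough fan-out model: each hierarchy change on the primary is applied once
# to every secondary mailbox. The 1% daily change rate is a made-up figure
# used only to compare deployments of different sizes.


def daily_sync_updates(mailboxes, folders_per_mailbox, changed_percent=1):
    """Estimate secondary-hierarchy updates per day for an even layout."""
    folders = mailboxes * folders_per_mailbox
    changes = folders * changed_percent // 100   # folders changed per day
    secondaries = mailboxes - 1                  # all mailboxes except the primary
    return changes * secondaries


if __name__ == "__main__":
    scenarios = [
        ("small (5 x 200)", 5, 200),
        ("at the limits (100 x 100)", 100, 100),
        ("monster (800 x 1,500)", 800, 1500),
    ]
    for label, mailboxes, folders in scenarios:
        print(f"{label}: {daily_sync_updates(mailboxes, folders):,} updates/day")
```

With those assumptions the small deployment generates 40 updates a day, a deployment sitting exactly at the published limits generates 9,900, and the 1.2 million folder monster generates 9,588,000. The absolute numbers mean nothing, but the way the work multiplies as both folders and mailboxes grow is the pattern I have in mind.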

I guess that it’s also possible that very large public folder deployments might see a clash between the need to serve interactive clients and the behind-the-scenes maintenance required to keep the hierarchy updated.

Outside the Information Store and the modern public folder architecture, it’s possible that the Mailbox Replication Service (MRS) might run into difficulties when it attempts to move large amounts of public folder data during migrations.

When you migrate data from old to new public folders, MRS processes the move in much the same way as it moves mailboxes. MRS connects to a public folder database on a suitable server, enumerates the hierarchy and the data that it finds in the folders, and then copies that information to a set of modern public folders that have been set up to receive the inbound data. Depending on how much data has to be copied, server load, and other conditions, the copy operation might take several days. When the initial copy is done, MRS auto-suspends the migration to allow administrators to validate that folders and data have been moved to the right place. Once everyone’s happy, MRS is allowed to resume the migration to completion, which it does by performing an incremental synchronization to ensure that any changes made since the migration started are picked up. As shown below, the processing done by MRS to move public folders can be compared to the way that it moves mailboxes.

| MRS processing phase | Mailbox move | Public folders |
|---|---|---|
| Initialization | Connect to source and target mailboxes | Connect to legacy public folder database and new public folder mailboxes |
| Enumeration | Mailbox folder structure and number of items in each folder | Public folder hierarchy and number of items in each folder |
| Move | Transfer items in enumerated folders from source to target mailbox | Transfer items in enumerated folders from source public folder database to target public folder mailboxes |
| Auto-suspend | If required after all enumerated data is transferred | Always after all enumerated data is transferred |
| Completion | Immediate if not auto-suspended; otherwise incremental synchronization to copy delta changes from source to target mailbox | After flag is set to allow completion, an incremental synchronization copies delta changes from source public folder database to target public folder mailboxes |
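The phase comparison above can also be expressed as a small Python sketch. It only encodes the ordering and the auto-suspend difference between the two kinds of move. The enum and function names are hypothetical and have nothing to do with how MRS is actually written.

```python
# Sketch of the MRS phases compared above. Names are illustrative only.
from enum import Enum


class Phase(Enum):
    INITIALIZATION = "Initialization"
    ENUMERATION = "Enumeration"
    MOVE = "Move"
    AUTO_SUSPEND = "Auto-suspend"
    COMPLETION = "Completion"


def migration_phases(public_folders, suspend_requested=False):
    """Return the ordered phases for a move.

    Public folder migrations always auto-suspend after the initial copy;
    mailbox moves do so only if the administrator asks for it.
    """
    phases = [Phase.INITIALIZATION, Phase.ENUMERATION, Phase.MOVE]
    if public_folders or suspend_requested:
        phases.append(Phase.AUTO_SUSPEND)
    phases.append(Phase.COMPLETION)
    return phases


def needs_incremental_sync(phases):
    """A suspended move completes with an incremental sync of delta changes."""
    return Phase.AUTO_SUSPEND in phases


if __name__ == "__main__":
    for label, is_pf in (("Mailbox move", False), ("Public folder migration", True)):
        phases = migration_phases(is_pf)
        names = " -> ".join(p.value for p in phases)
        print(f"{label}: {names} (incremental sync: {needs_incremental_sync(phases)})")
```

The only structural difference is the mandatory suspension, so any scaling problem specific to public folders would more likely come from the volume of folders and items that the enumeration and move phases have to chew through.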

Generally MRS handles public folder migrations very well. But it’s easy to imagine that MRS might not relish the thought of handling one of the old-style mega public folder deployments, if only because of the sheer number of items to process. So perhaps the problem is due to MRS running out of steam when faced by a very large migration (one that involves more than 100 target public folder mailboxes and 10,000 source folders). MRS was originally built to move mailboxes rather than public folders and few mailboxes will be as large as a public folder deployment can be, so it’s possible that this is where the root cause lies.

No doubt we shall know more when Microsoft is ready to share their words of wisdom. Hopefully they’ll be able to explain why the issue occurred, why they were not able to test at scale to detect the problem, and why the solution being put in place really will work. It is unfair for customers who have migrated to modern public folders to be told that their infrastructure is now unsupported, as reported in a comment to my earlier post relating news of a hierarchy of 17,000 public folders. Something has to be done and fast.

Perhaps Microsoft will share more information about potential resolutions at the “Modern Public Folders and Migration to Office 365” session at the Microsoft Exchange Conference next week. I’m sure that MVP Sigi Jagott and public folders Program Manager Kanika Ramji will keep everything cool, calm, and collected and offer some useful guidance to those who are planning a migration. Unfortunately I won’t be able to attend that session as the 4:45pm timeslot on Monday finds me chairing the splendidly named “Experts Unplugged: Exchange Top Issues – what are they and does anyone care or listen” panel session. I guess that some questions about the limitations of modern public folders might surface there too!

Follow Tony @12Knocksinna

About Tony Redmond ("Thoughts of an Idle Mind")

Exchange MVP, author, and rugby referee

9 Responses to What caused the crippling of Exchange 2013 modern public folders?

  1. NeillT says:

    Very interesting article Tony, especially with regards to MRS. I have found with mailbox moves that the more folders a user has, even with a very low item count and overall mailbox size, the longer the move takes. I had two very similar users, one of whom had >10k folders with very few items in each; her move took about 5 hours, the other only 2. Going by the logs, MRS seemed to spend most of its time enumerating the folders to check for changes.

    • It’s pretty natural that MRS should be so careful in enumerating the folders as they provide the basic structure for any move operation (mailbox or public folders). It’s ironic though that newer users (who tend to pay less attention to folder structures and depend on search to find items) are harder to move than older users (whose long-standing practice has been to set up folder structures and use that for filing).

  2. Scott Schering says:

    As it was explained to me today the issue is how long it takes the Primary mailbox to expand and verify the hierarchy when making changes.

    Say a user connected to a secondary hierarchy mailbox tried to create a new folder.
    The secondary mailbox sends that request to the primary mailbox where the change will be written.

    With a large public folder hierarchy the primary mailbox spends quite a bit of time pondering its navel and expanding the hierarchy. The secondary mailbox is the impatient sort and stomps off back to the user screaming “this can’t be done!” after only waiting a few seconds.

    Funny thing is it does a great job of the basics like messages.. It’s just folders and calendars it gets all tied up in knots about.

  3. Casey says:

    I have spoken with Microsoft on this deal and they recommended that I deploy Exchange 2010 Public Folder servers to house my PF data alongside my Exchange 2013 DAGs…
