Musing on searching


The Exchange team's post revealing the secret registry setting that allows multi-mailbox searches to interrogate more than 25,000 mailboxes got me thinking. My first reaction was that I thought the era of registry hacks was over for Exchange. On reflection, though, I don’t think we are on our way back to the bad old days of Exchange 2000 and Exchange 2003. Back then, Microsoft published copious registry hacks to influence the way that the software operated, and figuring out just what had been changed on a server became a real problem for support professionals.

Of course, these weren’t the first versions of the product to use secret registry settings. The standard was set by the famous “Squeaky Lobster” hack that you had to enter to reveal advanced performance counters on an Exchange 5.5 server. Exchange 2000 introduced a huge variety of new features. The administration interface lagged somewhat behind and, unlike today, the developers were not allowed to introduce new UI in a service pack, so they enabled features and tweaks through registry hacks. The disease rapidly spread throughout Exchange to the point where I doubt that even the most devoted Exchange nerd could keep up.

The arrival of a new Vice President in the Exchange group changed things, and we don’t have so many registry settings to tweak today. Of course, you can argue that registry settings have been replaced by obtuse XML-formatted configuration files such as those used by the Mailbox Replication Service (MRS) or the transport service. This is true, and XML configuration files suffer from the same fatal flaw as registry settings: they are server-specific and not friendly to the needs of a distributed environment. They are also hard to get right and to debug, because it is all too easy to make a mistake when you edit one of Exchange’s configuration files. The product doesn’t include an intelligent editor for these files, possibly because it’s the developers’ way of saying “hands off – don’t edit this”, so most administrators resort to Notepad and make changes on the “suck it and see” principle. Sounds very much like editing the registry…

Returning to multi-mailbox discovery searches, it’s nice to know that administrators in large organizations can bring servers to their knees by launching searches that span 100,000 mailboxes and gather tens of gigabytes of data. Those searches might even drag all that data across the network to the default discovery mailbox, which is still located on the first Exchange 2010 mailbox server installed into the organization. This is clearly not a good thing to do, and it shows the need for planning before you deploy and use multi-mailbox searches.

What other issues might affect these searches? Here are a number of tips that you might like to bear in mind.

  • The UI for discovery searches is not revealed by the Exchange Control Panel (ECP) unless your account is a member of the Discovery Management RBAC role group. Obvious, but often overlooked… There’s no way to execute searches from the Exchange Management Console (EMC), so this is one of the items of functionality that is unique to ECP. If you don’t like using ECP, you can manage searches from the Exchange Management Shell. New-MailboxSearch creates a search, Get-MailboxSearch returns details of searches, and Set-MailboxSearch updates search criteria. Start-MailboxSearch starts a search, and Remove-MailboxSearch removes the search criteria from the arbitration mailbox (see below).
  • Mailbox searches depend on the content indexes that Exchange populates as items arrive into mailbox databases. Even though Exchange 2007 supports content indexes, you can only search data hosted on Exchange 2010 mailbox servers. This means that you have to complete your migration from Exchange 2003 or Exchange 2007 before discovery searches are really feasible. Of course, you can short-circuit the process by moving the mailboxes that are involved in a discovery action to Exchange 2010 servers.
  • Discovery searches can find items in the Recoverable Items folder (aka the “dumpster”) or those on retention or legal hold because these items are held in folders that are invisible to users but are indexed.
  • Exchange can search message properties (for example, subject and addressee list) very effectively because these data are available in the mailbox databases. Attachments have to be made discoverable to Exchange before their content can be incorporated into the indexes. Microsoft makes the Office 2010 Filter Pack available so that you can install the IFilters necessary to index Word, Excel, PowerPoint, Visio, and so on. The pack must be installed on all mailbox servers (for content indexing) and transport servers (to allow transport rules to examine content in en-route messages). These filters cover the vast bulk of documents circulating in corporate environments, with the glaring exception of PDF. Adobe has an IFilter available for PDF, but some have reported better results with the version available from Foxit Software. You know you have problems with IFilters when searches report a high number of unsearchable items. Exchange still searches the properties of these items; an item is unsearchable if its content is inaccessible. Of course, in this context, a high number is relative to the total number of items searched. If you search 10,000 mailboxes, it’s probably acceptable to have 250 unsearchable items (though it is still a good idea to understand what these items are), while 2,500 unsearchable items might be problematic.
  • Determining the effectiveness of your search parameters is not easy. Exchange reports the mailboxes that it scanned and the number of hits that it generated, but it’s hard to know whether you have found the desired information until you look through the captured items. Clearly, you need to experiment with search criteria to home in on the right material, and it may take several attempts before you know you have the right search. Exchange uses AQS syntax for searches, so you can construct very complex and precise searches. Exchange also allows you to test search criteria without capturing any data, and that’s absolutely the way to proceed until you know you’re looking in the right place. After that, you can decide to capture either deduplicated or all data. A deduplicated search captures the first instance of an item no matter how many mailboxes it is found in. An “all-in” search captures each and every instance of an item. Because it’s the nature of email that many items occur in multiple mailboxes, a deduplicated search (introduced in Exchange 2010 SP1) captures far less data.
  • As mentioned above, the first Exchange 2010 mailbox server installed into the organization hosts the default discovery mailbox. The mailbox is disabled but visible through the admin tools with the name “Discovery Search Mailbox”. This mailbox is used to store the copies of items recovered by searches so it has a large 50GB quota. It can be moved to another server if appropriate or you can create additional search mailboxes for use with specific investigations. To create a new discovery mailbox, use a command like this:

New-Mailbox -Name 'Discovery Mailbox for ABC Investigation' -Discovery -UserPrincipalName 'ABCDiscoveryMailbox@contoso.com' -Database 'MB2'

Note that I’m careful to place the new discovery mailbox in a specific mailbox database. Ideally, this database should be close (in network terms) to the databases that contain the mailboxes that will be searched. This minimizes the amount of network traffic generated when discovered items are captured and stored in the discovery mailbox. Remember that if the discovery mailbox is in a database that has copies, Exchange will need to replicate the search results to all servers that host database copies, so a big search can have a very real impact on many aspects of system performance.
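The default discovery mailbox can be relocated in the same spirit using an ordinary move request. The sketch below is an assumption-laden illustration rather than a prescribed procedure. It assumes you run it in the Exchange Management Shell with the rights to move mailboxes, and that the target database is called MB2, as in the New-Mailbox example. It also assumes the default mailbox still carries its “Discovery Search Mailbox” display name. Check the move’s status before pointing searches at the mailbox again.

```powershell
# Sketch only: run in the Exchange Management Shell (Exchange 2010 assumed).
# List every discovery mailbox in the organization and the database that hosts it.
Get-Mailbox -RecipientTypeDetails DiscoveryMailbox |
    Format-Table Name, DisplayName, Database -AutoSize

# Move the default discovery mailbox to a database that is close (in network terms)
# to the mailboxes you expect to search. 'MB2' is an assumed database name.
Get-Mailbox -RecipientTypeDetails DiscoveryMailbox |
    Where-Object { $_.DisplayName -eq 'Discovery Search Mailbox' } |
    New-MoveRequest -TargetDatabase 'MB2'

# Follow the progress of move requests (this lists all current move requests).
Get-MoveRequest | Get-MoveRequestStatistics
```

Moving the mailbox before a large search runs is cheaper than dragging its results across the network afterwards. The same database-copy caveat applies to the target database: if MB2 has copies, the moved mailbox’s contents are replicated to them.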

New discovery mailboxes are immediately available as a target for search results, but they are not automatically accessible to the members of the Discovery Management role group. This is by design. The intention is to separate the work done by the people who create and execute searches from the work of those who review the gathered results. You have to explicitly change the permissions on a newly added discovery mailbox to make it available to those who have the authority to review the material captured there. Discovery searches can turn up huge masses of confidential business and personal data, so it’s critical to keep close control over the users who can access discovery mailboxes. It’s also a good idea to agree guidelines with your legal advisors on how long the results of discovery searches should be kept, because you don’t want confidential material retained for longer than necessary.
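Granting reviewers access is a standard mailbox permission change. The sketch below assumes the mailbox created with the New-Mailbox command earlier in this post. It also assumes a hypothetical security group called ‘ABC Case Reviewers’ that contains only the people authorized to review the captured material. Treat it as an illustration to adapt to your own approval process.

```powershell
# Sketch only: run in the Exchange Management Shell.
# 'ABC Case Reviewers' is a hypothetical security group of authorized reviewers.
Add-MailboxPermission -Identity 'Discovery Mailbox for ABC Investigation' `
    -User 'ABC Case Reviewers' -AccessRights FullAccess -InheritanceType All

# Audit who can open the mailbox; explicit (non-inherited) entries are the ones you granted.
Get-MailboxPermission -Identity 'Discovery Mailbox for ABC Investigation' |
    Where-Object { -not $_.IsInherited } |
    Format-Table User, AccessRights -AutoSize
```

Granting access to a group rather than to individuals keeps the audit trail simple. When the retention period agreed with your legal advisors expires, Remove-MailboxPermission with the same -Identity, -User, and -AccessRights values withdraws the access before the mailbox contents are cleared.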

Exchange 2010 stores the metadata for searches (the parameters used to describe each search) in a hidden system mailbox called “SystemMailbox{e0dc1c29-89c3-4034-b678-e6c29d823ed9}”. Thankfully, you won’t have to type that name too often. You can see this mailbox listed with this command:

Get-Mailbox -Arbitration
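The search definitions held in that arbitration mailbox are the ones managed by the *-MailboxSearch cmdlets listed earlier. The sketch below follows the cautious approach described above: run an estimate first, then capture deduplicated results only once the criteria look right. The search name, source mailboxes, and query are hypothetical. The sketch also assumes Exchange 2010 SP1, because estimate-only and deduplicated searches depend on it, so check parameter behavior against the documentation for your build. A search generally has to be stopped or completed before you modify it.

```powershell
# Sketch only: Exchange Management Shell, Exchange 2010 SP1 assumed.
# Search name, mailboxes, and query below are hypothetical.
New-MailboxSearch -Name 'ABC Investigation' `
    -SourceMailboxes 'kim.akers', 'david.pelton' `
    -SearchQuery 'subject:"Project ABC"' `
    -EstimateOnly

# Run the estimate; nothing is copied to a discovery mailbox yet.
Start-MailboxSearch -Identity 'ABC Investigation'

# Review status, estimated hits, and unsearchable items before capturing anything.
Get-MailboxSearch -Identity 'ABC Investigation' | Format-List

# When the criteria look right, name a target discovery mailbox, turn off
# estimate-only mode, request deduplicated results, and run the search again.
Set-MailboxSearch -Identity 'ABC Investigation' `
    -TargetMailbox 'Discovery Mailbox for ABC Investigation' `
    -EstimateOnly:$false -ExcludeDuplicateMessages $true
Start-MailboxSearch -Identity 'ABC Investigation'

# When the case is closed, remove the search definition from the arbitration mailbox:
# Remove-MailboxSearch -Identity 'ABC Investigation'
```

The colon syntax (-EstimateOnly:$false) is used because it binds correctly whether the parameter is a switch or a Boolean. Setting -ExcludeDuplicateMessages to $false instead gives you the “all-in” capture.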

Overall, I like the structure that Microsoft has established in Exchange 2010 for multi-mailbox searches. I don’t like the tools available to analyze the effectiveness of searches or to review the results that are captured in the discovery mailboxes. Hopefully Microsoft will improve matters in future releases.

– Tony

For more information about multi-mailbox discovery searches, read chapter 15 of Microsoft Exchange Server 2010 Inside Out (pages 1033-1049), also available at Amazon.co.uk. The book is also available in a Kindle edition.


About Tony Redmond ("Thoughts of an Idle Mind")

Exchange MVP, author, and rugby referee
This entry was posted in Exchange 2010.

5 Responses to Musing on searching

  1. M Koehn says:

    I am running discovery searches across all mailboxes in my domain. When results are returned, the search summary lists the total (and names) of all mailboxes in “Mailboxes to Search”. However, “Mailboxes Searched Successfully” shows a lower number (about 1650 of 1750 mailboxes), and “Mailboxes not Searched Successfully” shows (0) None.

    I’ve checked all databases to make sure their indexes were updated and current. Any thoughts on this discrepancy? I may need to explain this to our legal department.

    We are running Exchange 2010 SP1 (no rollups).

    • Is this an “estimate” search or a “results” search? If it’s a results search, you may be able to check the discovery search mailbox to determine whether every mailbox was in fact searched. You might also be able to identify any mailboxes that have been missed. It will be tedious to check one list against the other, but it’s really the only way to determine whether any mailboxes are causing a problem for search.

      TR

  2. Tony:

    This is slightly off-topic, but I didn’t know where else on your blog to post this question. We have moved to Exchange 2010 and we are still in the early phases of moving to online archive mailboxes. We are stuck with Outlook 2007 indefinitely, so OWA is our friend when it comes to certain things such as applying personal retention/archive tags. Anyway, a couple of us have noticed that searches of the online archive initiated from Outlook turn up items, but when we try to open them, the content isn’t found. The same query succeeds in OWA, however. When I checked the indexing options in my computer’s Control Panel, my mailbox and all legacy PSTs were checked, but the Online Archive was grayed out and there was no check box to enable it. Hovering the mouse over the Online Archive yielded the following tool tip: “Indexing of this Email Store has been disabled by the Administrator.” It sounds like I have to adjust a Group Policy setting somewhere, but I’m uncertain where that would be or whether it is a good idea to enable it.

    • I bet you are searching with Outlook configured in cached Exchange mode. When this happens, you’re actually using Windows Desktop Search to index and search items in your OST. This explains the different results that you see when you search using OWA, as that client depends on the content indexes that Exchange builds as it adds items to a database. Content indexing depends on the Office Filter Pack and other IFilters installed on a server (such as one for PDF) to be able to index attachments. You might be seeing a result from Windows Desktop Search for an item whose metadata it has been able to index but not its content, whereas Exchange has been able to index both metadata and content.

      As to indexing on the PC, remember that the contents of archive mailboxes are not kept in either a PST or OST so they are not available to Windows Desktop Search. You have to access archive items online, in which case they are indexed by Exchange.

      TR

  3. Tony, thanks for the quick response. You are correct that we are in cached mode. I realize the online archive is not locally cached (that is a very good thing). Since Outlook 2007 depends upon Desktop Search, it sounds like I must use OWA to reliably search the online archive.
