A collective sigh of relief settled upon the worldwide Exchange community today when Microsoft finally released its much-anticipated, free-to-use PST Capture tool. The tool was originally announced in July 2011 for delivery by the end of the year. Essential testing, especially through a restricted external test by some members of Microsoft’s Technology Adoption Program (TAP), slowed the release. However, the tool is now available for download from Microsoft’s web site (updated for PST Capture 2.0 Released 20 February 2013) and you can make up your own mind whether the software fits the bill in your circumstances. Some documentation is available. However, it’s pretty sparse at present and needs to be filled out with experience, tips, and techniques from actual deployments. Prepare for a flood of blog posts! In the meantime, I offer some thoughts on the challenges that exist in any PST ingestion project – including the indigestion that might result.
The EHLO posting provides some background on the tool, including an interview with Microsoft product manager Ankur Kothari, who also spoke on the topic at Fall Exchange Connections in Las Vegas last November. The most interesting thing I learned was that the PST Capture tool is based on software that Microsoft bought in from Red Gate Software. I’m not against the intelligent acquisition of technology as no one has sole ownership of innovation and it makes absolute sense to buy in software to address a problem if this accelerates the solution. Of course, Microsoft has done a lot of engineering to make the acquired software meet their own requirements and to work well with on-premises Exchange 2010 as well as with Office 365.
The tool doesn’t work with earlier versions of Exchange and you’ll need to deploy the Exchange 2010 mailbox and Client Access Server (CAS) roles to be able to use this solution. To be precise, Microsoft says that they haven’t tested the tool against Exchange 2007 and it might or might not work with that version. I doubt that anything is possible with Exchange 2003 and believe that the PST Capture tool will have the good taste to ignore this now-antiquated version. Of course, once PST data is captured in an Exchange 2010 mailbox, you can move that mailbox back to an Exchange 2003 or Exchange 2007 server, but that seems like a pretty silly approach to take.
There are three major components of the PST Capture tool:
- The Central Service manages the set of PSTs discovered within an Exchange organization as well as the import of their contents into an Exchange mailbox (on-premises or cloud).
- The Capture Console provides the management interface to set up import operations and associate discovered PSTs with user mailboxes so that Exchange can direct data to the right location. The console allows administrators to schedule and track the progress of import operations and retry operations if any fail, perhaps due to transient network conditions.
- A set of Capture Agents deployed to user PCs are used to find PSTs, including those stored on removable devices such as USB drives.
In overview, you create a PST search from the Capture Console that causes PST Capture Agents to scan for PSTs on the PCs where they are installed. Details of the discovered PSTs are returned and registered with the Central Service. After PSTs are discovered, an administrator can then decide what should be imported to Exchange and link the PSTs with target mailboxes. They then create an import job to do the work. When PSTs are imported, they are first copied from the client computer to a staging area on the computer where the Capture Console runs. The staging area is sized at 20GB by default. This figure can be increased to accommodate larger import operations. Once PSTs are copied to the staging area, the data is imported to Exchange under the control of the Central Service.
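To make the staging constraint concrete, here is a minimal planning sketch in Python. It is not part of PST Capture: the function name `plan_batches` and the idea of grouping PSTs into batches are my own assumptions, and it presumes that every PST in a batch has to sit in the staging area at the same time. The only figure taken from the tool is the 20GB default.

```python
GB = 1024 ** 3

def plan_batches(pst_sizes, staging_bytes=20 * GB):
    """Group PSTs into batches that each fit in the staging area.

    pst_sizes: dict mapping a PST name to its size in bytes.
    staging_bytes: staging capacity (20GB default, per the tool's default).
    This is a hypothetical helper, not a PST Capture API.
    """
    batches, current, used = [], [], 0
    # Walk the PSTs from largest to smallest and start a new batch
    # whenever the next file would overflow the staging area.
    for name, size in sorted(pst_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > staging_bytes:
            raise ValueError(f"{name} is larger than the staging area")
        if used + size > staging_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

If a single PST is bigger than the staging area, the sketch raises an error rather than guessing, which is the cue to increase the staging size before you schedule the import.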
There’s no magic here and PSTs will not disappear overnight. In fact, Microsoft’s tool won’t work unless you do a fair amount of up-front preparation to deploy agents to client PCs. Not all companies exert the necessary control over client PCs so this is an obstacle that must be overcome before any PST can be captured. You are able to add the file names of PSTs directly to an import list. This is a workaround that avoids the need to install Capture Agents on client PCs. However, the workaround is only viable if you can map the drive where the PSTs are located. For example, you could import a set of PSTs stored on a network drive. Obviously, this approach won’t work for PSTs stored on laptop PCs that are invisible to the network.
The next issue is bandwidth. To be captured, PSTs have to be copied from client PCs to the central staging area. This is easy enough for client PCs that run on corporate networks. It might not be so easy for PCs that enjoy intermittent access to the corporate network, such as those used by road warriors. Indeed, over the years, Microsoft has removed much of the need for roaming PCs to connect back to the mother ship via VPNs to access email, but in this case you’ll need to be able to access the PCs that hold the target PSTs on the network. A reasonable connection is also necessary to allow the capture agents to transfer the PSTs back to the computer where the Central Service runs. In this context, “reasonable” really means close to LAN-quality.
PSTs can range in size up to 32GB. In reality, the vast bulk of PSTs will be in the 200MB to 2GB range, but that’s still a fair amount of data to transfer, especially if multiple PSTs are being imported. If possible, PST capture operations should be scheduled outside normal working hours to take advantage of lower network demand. Of course, this implies that the client PCs are connected at these times, so it follows that a reasonable amount of planning and coordination is required.
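A back-of-the-envelope calculation helps when deciding whether an overnight window is long enough. The Python sketch below is illustrative only: `transfer_hours` is a hypothetical helper, and the 70% efficiency factor is an assumption standing in for protocol overhead and competing traffic, not a measured figure.

```python
def transfer_hours(total_bytes, link_mbps, efficiency=0.7):
    """Rough hours needed to copy total_bytes over a link.

    link_mbps: nominal link speed in megabits per second.
    efficiency: assumed fraction of the link that is really usable.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    # Convert megabits/sec to usable bytes/sec.
    usable_bytes_per_sec = link_mbps * 1_000_000 / 8 * efficiency
    return total_bytes / usable_bytes_per_sec / 3600

# Example: fifty 2GB PSTs over a nominal 100 Mbps link.
hours = transfer_hours(50 * 2 * 10**9, 100)
print(f"Estimated copy time: {hours:.1f} hours")
```

Treat the result as a starting point for planning rather than a prediction. Real throughput depends on what else is using the network at the time.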
Moving PST data to Office 365 mailboxes introduces another network consideration and it’s hard to give a realistic guideline as to how much PST data you’ll be able to transfer to Office 365 per hour. The best idea is to run some tests in your own environment as this will give you some hard data that is pertinent to your circumstances rather than a finger-in-the-air guess based on theoretical conditions.
The reason why people use PSTs is to liberate themselves from the tyranny of harsh mailbox quotas. Of course, we live in an era when mailbox quotas are counted in gigabytes rather than megabytes but even so, you need to prepare for imports by adjusting mailbox quotas to allow Exchange to import the PST data. Note that part of the preparation process is to associate discovered PSTs with target mailboxes. A single import job can take data from multiple PSTs and move the data into a single destination mailbox. If you’ve deployed archive mailboxes, you can opt to import PST data into these rather than use the primary mailbox.
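As a rough illustration of the quota arithmetic, the following Python sketch adds the sizes of the PSTs linked to a mailbox to its current size and pads the result. Both the function name `required_quota_bytes` and the 20% headroom are assumptions of mine. The space that imported items occupy in a mailbox will not match the PST file size exactly, so use the number as a guide only.

```python
def required_quota_bytes(current_mailbox_bytes, pst_sizes, headroom=0.2):
    """Estimate the quota a target mailbox needs before an import.

    pst_sizes: sizes in bytes of every PST linked to this mailbox.
    headroom: assumed safety margin on top of the combined size.
    """
    incoming = sum(pst_sizes)
    return int((current_mailbox_bytes + incoming) * (1 + headroom))
```

Because one import job can feed several PSTs into the same mailbox, the estimate sums all of them before you compare the result with the quota currently assigned.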
More esoterically, but potentially important, if you use extended Database Availability Groups (DAGs), you need to keep an eye on replication activity during PST import operations. Remember that importing data into a mailbox is as if users suddenly started creating and sending email like Duracell bunnies on steroids. Exchange will dutifully create transaction logs to capture details of all the new transactions and replicate the logs to other servers within the DAG. Depending on the network connectivity between servers and the other workload that’s going on during import operations, the extra transaction logs can build up in copy or replay queues and mean that database copies aren’t quite as up-to-date as you’d like them to be. The queues will clear eventually after the imports finish and server load levels reduce to normal.
The final challenge is to gain user buy-in. Beautiful as they are, users can break an administrator’s heart with their lack of cooperation with carefully-hatched plans. PSTs are “personal storage files” and the “personal” is the important word here. Users store all manner of information in these files, some of which is probably interesting corporate information that absolutely should be in a mailbox and subject to compliance regulations. But other information is going to be personal and users might like to keep it under their control and invisible from the prying eyes of the corporation.
PST capture is an all-or-nothing operation and everything ends up in the target mailbox. Education and awareness will be important to inform users why PST capture is a good idea, how it will happen, what it means to the user, and when capture will occur. I think it’s reasonable to advise users to create a special PST with an appropriate name (maybe “Personal – Not for Import”) where they can move any items that they don’t want to be imported into a mailbox. You can then exclude these PSTs from capture operations and proceed on the basis that any other PST that is discovered is fair game.
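If you have a list of discovered PST paths to hand, separating out the opted-out files is easy to script. The Python sketch below assumes the naming convention suggested above and matches on the phrase "not for import" anywhere in the file name, ignoring case. The helper `split_by_exclusion` is hypothetical and has nothing to do with how the Capture Console itself handles exclusions.

```python
from pathlib import PureWindowsPath

# Assumed naming convention agreed with users; adjust to suit.
EXCLUDE_MARKER = "not for import"

def split_by_exclusion(paths, marker=EXCLUDE_MARKER):
    """Split PST paths into (to_import, excluded) based on the file name."""
    to_import, excluded = [], []
    for p in paths:
        # Only the file name is checked, not the folders above it.
        name = PureWindowsPath(p).name.lower()
        if marker in name:
            excluded.append(p)
        else:
            to_import.append(p)
    return to_import, excluded
```

Checking only the file name avoids excluding every PST that happens to live under a folder with a similar name.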
Technology is wonderful but it can only solve problems when the necessary groundwork is done to prepare for its deployment. It’s good that Microsoft has provided a free PST capture tool and I sincerely hope that it will provide the impetus to convince organizations that it’s now possible to move away from PST storage to more robust solutions. Other vendors such as Sherpa Software and Transvault will be happy to provide alternatives if you don’t like the Microsoft solution. Each product has its own set of strengths and weaknesses and you should consider how each fits into your operational environment before making a choice. Take into account considerations such as the scanning mechanism used by each tool, how well the tools deal with large numbers of client computers and PSTs, how import operations are recorded and logged, and the support on offer.
No tool will gain you user buy-in. That’s going to be your personal challenge. Good luck making the case to your ever-receptive users… you’ll need it!
Follow Tony @12Knocksinna
Update: EHLO post on PST Capture 2.0 released on February 22, 2013