Do PSTs contain anything of value?

In musing about the news of the PST Capture tool that Microsoft plans to release soon, I started to consider the question of whether tools like this can actually find and recover any useful information. The worry of executives and lawyers is that PSTs contain all manner of corporate secrets and other information that they should control. They fondly imagine that if they could only recover this information from users, they would have much better oversight of corporate assets such as intellectual property, contracts, product information, and so on. The reality is sadly different.

Traditionally, PSTs have acted as pressure valves for restricted mailbox quotas. As users approach or exceed their mailbox quotas, they move items from their online mailbox into a PST. Life can continue, mail can flow, and the user doesn’t have to care too much about their PST until the next time that a quota emergency approaches or they need to find something that they moved. Most users are “pilers” and aren’t very good at filing. They might make an effort early on, but soon the additional steps necessary to file messages into appropriate folders become too much trouble. The upshot is that their mailbox tends to be organized into the default folders (Inbox, Sent Items, etc.) and a small number of other favorites such as “Personal”. PSTs are probably not going to be much better organized as they often mimic the filing structure used for the primary mailbox. Fortunately, the advent of good search mechanisms and improvements in both Exchange and Outlook have meant that the need for folders has largely disappeared and it’s OK to have mega-folders containing 50,000 items or more, assuming of course that you’re running Exchange 2010 and have been assigned a large quota.

Given that PSTs have acted as a pressure valve for mailboxes, is it likely that they really hold much valuable corporate data that can be harvested and secured with a tool such as the PST Capture proposed by Microsoft or similar software sold by Transvault, Symantec, and Sherpa Software? The answer is “it depends”.

The PSTs used by executives might well be interesting – but in most companies executives don’t have to worry about mailbox quotas and therefore don’t need to resort to PSTs. All of their valuable information is likely to be online in a well-organized mailbox run by their administrative assistant.

PSTs used by “knowledge workers”, people who generate intellectual property such as engineers, developers, and other professionals, might be a happier hunting ground in terms of a search for valuable corporate information. There’s a fair chance that some items of interest will be lurking in these PSTs. Drafts of patent applications or invention disclosures, descriptions of deals being developed with customers, proposals for corporate alliances, budgets and financial reports, and marketing plans are examples of the kind of data that a PST capture tool might usefully recover and import into an online mailbox, archive mailbox, or other repository where the information can be indexed and exposed to corporate control.

Alas, the vast bulk of the PSTs in use probably don’t contain very much of interest and the danger therefore exists that any effort to discover and recover information from PSTs held on user PCs will end up in a massive import of absolute rubbish into Exchange or another repository. And once that rubbish gets online, it will act like cholesterol clogging up the arteries of Exchange.

If you doubt my premise you might consider examining the contents of some user PSTs, assuming that users will allow you to go through their folders. I’ll bet that you’ll find items such as:

  • Old calendar appointments and responses lovingly preserved to record the fact that the user was actually invited to attend such a gathering.
  • Delivery receipts and non-delivery notifications – no, I don’t know why people keep these things, but they do.
  • Examples of junk mail kept just in case the user feels the urge to purchase drugs, sell gold, or send money via Western Union to a correspondent in Nigeria.
  • Banal and inane interactions between users of the type that should be immediately deleted upon receipt. Chain letters are in this category as is anything to do with fluffy cats or lost dogs.
  • MP3 and MPEG files downloaded from doubtful web sites.
  • Some useful information that needs to be retained (small percentage).
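
Some of the categories above can be recognized mechanically before anything is imported. What follows is only a minimal sketch in Python of that kind of triage, not how PST Capture or any vendor's product works. It assumes some other tool has already extracted items from a PST into simple records; the `Item` fields, the function names, and the cutoff date are all hypothetical. The message class prefixes (`REPORT.` for delivery and non-delivery reports, `IPM.Appointment` and `IPM.Schedule.Meeting` for calendar items and meeting traffic) are the standard MAPI ones.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    """Hypothetical record for one item already extracted from a PST."""
    message_class: str
    received: date
    attachments: list = field(default_factory=list)


REPORT_PREFIX = "REPORT."  # delivery receipts and non-delivery reports
CALENDAR_PREFIXES = ("IPM.Appointment", "IPM.Schedule.Meeting")
MEDIA_EXTENSIONS = (".mp3", ".mpg", ".mpeg")


def skip_reason(item, cutoff):
    """Return why an item looks like junk, or None if it should be kept."""
    if item.message_class.startswith(REPORT_PREFIX):
        return "delivery report"
    # Calendar items are only skipped when they predate the cutoff.
    if item.message_class.startswith(CALENDAR_PREFIXES) and item.received < cutoff:
        return "old calendar item"
    if any(name.lower().endswith(MEDIA_EXTENSIONS) for name in item.attachments):
        return "media attachment"
    return None


def triage(items, cutoff):
    """Split items into (keep, skip) lists based on skip_reason."""
    keep, skip = [], []
    for item in items:
        (skip if skip_reason(item, cutoff) else keep).append(item)
    return keep, skip
```

Junk mail, chain letters, and fluffy cats cannot be spotted by message class alone, so a filter like this only removes the easy cases. Deciding what counts as the useful small percentage still needs content inspection or a human.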

Don’t get me wrong. I am sure that there is much useful data stored in PSTs. For example, Stephen Griffin’s blog explains the problem posed by one law firm that used a separate PST for every case. Apparently PSTs became the method used to transfer the information relating to cases between lawyers, and users commonly opened several hundred PSTs at one time. Stephen, who’s probably Microsoft’s primary MAPI developer, goes on to explain how they overcame the challenges and that Outlook can open up to 300 PSTs but really runs out of steam after 100. The term he used was “noticeable performance issues”! I’m not actually sure how this company could move away from PSTs, because they have clearly built a form of workflow around transferring files between individuals, nor am I clear about how they’d use online mailboxes for the same purpose. However, getting back to the point in hand, this company is the exception that proves the rule and in general the problem with the data held in PSTs is how to separate the wheat from the chaff.

I see two dangers ahead. The first is that some administrators will be so excited by the availability of the PST Capture tool that they’ll go ahead and import every PST they can find and thus cause Exchange to collapse under the strain of the CPU and I/O processing required to introduce a vast amount of junk into the Store, not to mention the increased disk and backup requirements. The second is that administrators will believe the PR that large mailbox quotas are good because Exchange 2010 does such a fantastic job of supporting mega-mailboxes so they’ll go and enable large quotas for all, leading eventually to the storage of even more of the crap described above – this time online.

The net learning from all of this is that PST find and import tools are good, but only when used properly to find and recover information that is valuable to the company. I hope that people remember this when Microsoft eventually releases their new tool to the community.

  1. Rich says:

    On the money again! Probably in a couple of years there will be some cleverly devised software to interrogate and cleanse internal mailboxes or PSTs and remove useless junk. Arguably this is simply an extension of anti-spam for internal use. Or enterprise companies will crack down on what content users are permitted to store (fat chance of this happening).

    I’m glad you highlighted what I see pretty much every day. Users hoarding and piling useless emails forever and ever!

    Perhaps the move to Lync and other IM products (non-recorded conversations) is the way forward for all everyday conversations. Email can then be used for more formal communications that need a paper trail.

  2. Kristen says:

    “Probably in a couple of years there will be some cleverly devised software to interrogate and cleanse internal mailboxes or PSTs and remove useless junk.”

    -In a couple of years? I think there’s software out that does that now…

  3. Neill T says:

    Our main driver for PST ingestion was compliance. We brought a sister company into our Exchange environment that had been using PSTs quite extensively and had no journaling.
    We decided that, to pre-empt any legal problems down the road, we wanted to capture all the PST information into our archive. It was not so much that we thought there was actually anything valuable in them, but any lawsuit would require us to affirm that we had done due diligence in searching.

