Filers versus Pilers


Many moons ago, good filing habits were deemed to be an important part of office life. Documents had to be correctly placed in the right folder and carefully deposited in a file cabinet to allow the office to operate properly. Everything was neatly arranged, easily found, and everyone was happy.

Email became more common in offices from the mid-1980s onwards. System designers anticipated that users would apply the tried and trusted methods for filing paper documents to messages, so they created email servers and clients that allowed users to set up hierarchies of folders. Various extensions and additions to email servers, such as ALL-IN-1 shared drawers (1992-1993), allowed documents and messages to be held in shared repositories. Much the same approach to folders and filing was taken in PC LAN-based email systems and subsequently appeared in Exchange 4.0 (1996). Filing continued unabated.

And then Google introduced Gmail. All of a sudden, users had massive quotas to store messages and there was no reason to delete messages any longer. But the real impulse to transform users from filers to “pilers” was provided by Gmail’s user interface. Folders were largely eliminated in favour of views; messages could exist in many different views which could be used to identify particular sets of messages extracted from massive piles held in the mailbox. Basic views such as Inbox, Sent Items, and Trash made Gmail look and behave somewhat like a traditional email client but it never delivered the same experience as the more structured and traditional approach embodied in clients such as Outlook.

Piling isn’t all bad. It’s certainly a less disciplined approach to mailbox organization than carefully moving messages into folders or deleting them as they are read, but that doesn’t matter too much if the client provides intelligent access to mailbox contents, including great search capabilities. And, as its proponents point out, piling email instead of filing it saves a great deal of time.

Unsurprisingly given Google’s heritage, Gmail certainly succeeds with search, but only its inventors could have loved its original user interface. Gmail’s basic window on information organized messages into conversations, grouping messages on the same topic together and presenting them as a unified whole. New users often liked conversations, but they were a bit of a culture shock for hard-core filers. Google eventually acknowledged that some people don’t like viewing email through conversations when it introduced the ability to turn conversations off and use views that order messages according to the time that they arrived in the mailbox. Google has tweaked Gmail in many other ways but its user interface is still not as elegant as other web-based email clients. I sometimes wonder whether the impending arrival of Office 365 and its much more elegant Outlook Web App interface will put any commercial pressure on Google to improve either the web interface for Gmail or its somewhat haphazard support for Outlook.

Although one can quibble about its appearance, there’s no doubt that Gmail has exerted a huge influence over the development of email. Consumers enjoyed massive mailboxes for years before most corporate users could even think about moving away from restrictive 500MB limits, and conversation views are now supported by many other email clients and servers, including Exchange 2010. Most of all, Gmail made it acceptable to discard the filing habit and become a piler: someone who simply leaves messages in the folder where they arrived in or left the mailbox and never bothers to move or copy them anywhere else. The three-folder mailbox (Inbox, Sent Items, Trash) became the de facto method of operation for new users as they took habits learned from consumer email systems into the workplace.

In terms of Exchange, a number of recent technical advances have contributed to easier piling. First, email servers support higher item counts in folders before performance degrades. All email products have a number of very important folders that are the focus for the majority of user operations. Exchange refers to them as “critical folders”; they include the Inbox, Sent Items, and Deleted Items. The guidance offered by Microsoft for the maximum number of items in a critical folder has changed dramatically over the past three versions (see this EHLO post for a PowerShell script to locate folders with high item counts):

  • Exchange 2003: 5,000 items
  • Exchange 2007: 20,000 items
  • Exchange 2010: 100,000 items

The thought of one hundred thousand messages in the Inbox is a horrific prospect for a filer and a triumph for a piler!
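
To make the guidance concrete, here is a minimal Python sketch that checks a set of folder item counts against the per-version figures listed above. It is not the PowerShell script from the EHLO post; the function name, the dictionary layout, and the sample mailbox are all invented for illustration, and real item counts would have to come from your own reporting.

```python
# Recommended maximum item counts per critical folder, as quoted in this post.
GUIDANCE = {
    "Exchange 2003": 5_000,
    "Exchange 2007": 20_000,
    "Exchange 2010": 100_000,
}


def folders_over_guidance(folder_counts, version):
    """Return (folder, count) pairs that exceed the guidance, largest first.

    folder_counts is a hypothetical mapping of folder name to item count.
    """
    limit = GUIDANCE[version]
    over = [(name, count) for name, count in folder_counts.items() if count > limit]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Made-up mailbox used only to illustrate the check.
sample = {"Inbox": 42_000, "Sent Items": 18_500, "Deleted Items": 3_200}

for version in GUIDANCE:
    print(version, folders_over_guidance(sample, version))
```

With these sample numbers, the same mailbox that breaks the Exchange 2003 guidance in two folders sits comfortably inside the Exchange 2010 figure, which is the piler's triumph in a nutshell.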

The main driver behind the large increase in the number of items that Exchange supports in a critical folder is a series of changes to the Exchange database schema. Microsoft tweaked the schema in Exchange 2007 and carried out the first major overhaul since 1996 in Exchange 2010. It’s actually surprising that it took Microsoft so long to improve matters, as the way people work with email has changed so dramatically over the 15 years that Exchange has been available. When we deployed Exchange 4.0 in 1996, servers typically supported fewer than 200 users, mailboxes had quotas of between 50MB and 100MB, and a heavy day saw the arrival of 15-20 messages. And of course, Microsoft hadn’t yet “got” the Internet, so Exchange operated in a world where SMTP was just one of the protocols to which Exchange could connect. X.400 was far more important, if only because the Message Transfer Agent (MTA) was built on X.400.

The second major factor is better search facilities. Outlook clients operating in cached Exchange mode can use Windows Desktop Search (WDS) to search items held in the offline store (OST) and of course, Exchange has its own content indexing facility to make sure that clients that don’t perform local searches (such as Outlook Web App) can find information easily. Google always emphasized easy searching as a big feature of Gmail and it’s fair to say that all major email servers available today see search as a fundamental part of their feature set. Indeed, Exchange 2010 indexes content as mailboxes are moved from database to database to ensure that searches are possible as soon as the attributes are flipped in Active Directory to point the client to the new mailbox location.

The third influence is a combination of faster laptop disks and more intelligent clients. For years, laptop disks became larger but never really improved in speed. This wasn’t a huge issue when mailboxes were small as the resulting OSTs were also reasonably sized. The OST file structure isn’t particularly efficient and performance traditionally suffered as soon as file sizes went over 1.5GB, which is roughly the size of the OST for a 1GB mailbox (the overhead of the OST format can be up to 50% of the online mailbox). Indeed, the maximum file size of an older ANSI-format OST file is 2GB whereas newer Unicode-format OST files can be a maximum of 50GB (see TechNet for more recommendations about deploying OST files with Exchange). If you feel that you need to configure larger PST or OST files for use with Outlook 2010, you can follow the instructions contained in this KB article.
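
The arithmetic behind those figures is easy to sketch. The Python below takes the worst-case 50% overhead and the 2GB and 50GB format limits quoted above and estimates whether a mailbox of a given size would fit in each OST format. The function names are invented for illustration, the overhead is an upper bound rather than a measurement, and the 1.5GB performance threshold is left out because it depends on the disk and client in use.

```python
GB = 1024 ** 3

# Maximum OST file sizes as quoted in this post.
MAX_OST_BYTES = {"ansi": 2 * GB, "unicode": 50 * GB}

# Worst-case OST overhead relative to the online mailbox ("up to 50%").
WORST_CASE_OVERHEAD = 0.5


def estimate_ost_bytes(mailbox_bytes, overhead=WORST_CASE_OVERHEAD):
    """Estimate the OST size for a mailbox, assuming a fixed overhead ratio."""
    return int(mailbox_bytes * (1 + overhead))


def fits_in_ost(mailbox_bytes, fmt, overhead=WORST_CASE_OVERHEAD):
    """Return True if the estimated OST stays within the format's size limit."""
    return estimate_ost_bytes(mailbox_bytes, overhead) <= MAX_OST_BYTES[fmt]


for size_gb in (1, 2, 10, 40):
    estimate = estimate_ost_bytes(size_gb * GB) / GB
    print(f"{size_gb}GB mailbox -> about {estimate:.1f}GB OST, "
          f"ANSI ok: {fits_in_ost(size_gb * GB, 'ansi')}, "
          f"Unicode ok: {fits_in_ost(size_gb * GB, 'unicode')}")
```

Under these assumptions a 1GB mailbox produces an OST of about 1.5GB, a 2GB mailbox already overflows an ANSI-format file, and a 10GB mailbox needs roughly 15GB of Unicode-format OST.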

Given the tendency towards large mailboxes, it’s obvious that some improvements needed to be made; otherwise we would all soon have been struggling with performance after the initial delight of receiving a 5GB or 10GB mailbox. With OSTs of that size even a 7200 rpm laptop disk will experience some delays, so there’s no substitute for a solid state disk (SSD) if you’re interested in big OST files. The good news is that the price of SSDs has been coming down so they are now reasonably cost-efficient. In fact, given that the real price of laptops has decreased dramatically while performance has increased over the last decade, it makes sense to spend a little extra money on a fast drive to be able to exploit the true potential of the computer.

Outlook 2010 is better able to deal with large OSTs too. I don’t have any evidence that would stand up in court to prove this point, but it is my experience that the 64-bit version of Outlook 2010 running on Windows 7 provides much smoother performance with large OSTs than its Outlook 2007 or Outlook 2003 (both 32-bit) predecessors.

In summary, it doesn’t really matter now whether you are a piler or a filer as today’s technology will hide the flaws of either approach. Instead of expecting users to behave in a certain manner, attention is now focused on how to manage the information that users accumulate in an intelligent manner. That’s why the ability for administrators to set policies that control how content is managed automatically by servers is important; the really interesting thing for me is to see how more automation can be incorporated into software over the next few years to help people organize, locate, and utilize their information better. Auto-tagging seemed like a great example of what I’m looking forward to but unfortunately its implementation was flawed in Exchange 2010 RTM and Microsoft withdrew the code in Exchange 2010 SP1. Hopefully, auto-tagging and other software will appear in future versions to help us all keep our ever-swelling mailboxes under some sort of control!

– Tony

For more information about the changes made to the database schema in Exchange 2010, see chapter 7 of Microsoft Exchange Server 2010 Inside Out, also available at Amazon.co.uk. The book is also available in a Kindle edition.

About Tony Redmond ("Thoughts of an Idle Mind")

Exchange MVP, author, and rugby referee