New technology and different business requirements are seminal events that cause technologists to ponder on well-held tenets of their trade. The need for Exchange backups is one such instance. The technical developments are the elimination of streamed backups in Exchange 2010 (only VSS-based backups are now supported) allied to the availability of features such as archive mailboxes, the march towards 50GB primary mailboxes, and a range of new compliance and discovery features that are built into the product. These new developments create a terrain sufficiently different to what has gone before to question whether traditional backup strategies are now appropriate for Exchange.
Backups serve many purposes. The first and most obvious is to provide a warm blanket feeling for an administrator who knows that their data is safe because it’s been copied to some media that can be taken off-site and is available to be restored if a hardware problem occurs. The second is as a source for point-in-time (PIT) recovery should the need arise. The classic example here is to recover some information that a user has subsequently deleted, perhaps in an effort to cover their tracks. The requests to recover data from backups usually come from legal sources (internal or external) in the form of discovery actions. Other benefits exist such as the ability to satisfy audit requirements by removing backup media to a remote location, but the two purposes outlined above are the most common.
The need for a company to respond to an email discovery action is far more common today but these requests are not new; their popularity simply reflects the growth of email as a method for business communication that has supplanted the letter, telex, and fax. The first criminal investigation that I was aware of where backups were required was a U.K. Serious Fraud Squad inquiry into some financial offences in 1989. In this case, the investigators required backup tapes to be restored on a VAX/VMS system so that the ALL-IN-1 accounts of the people that they were interested in could be reviewed. There was no notion of dumpsters or single item recovery. Items of interest were printed off and provided to lawyers, who eventually took the decision whether something was important.
I was also briefly involved in an inquiry in Australia around 1996 that looked at the circumstances surrounding the crash of a small commuter airplane some years before. In this case, the QC (Queen’s Counsel – lawyer) who led the inquiry wanted to know whether the operating airline had applied the necessary maintenance procedures to an airplane that had crashed. The request from the investigators was to review email sent by 30 users over a period of 3 months, later reduced by the QC to a list of ten users for four weeks. Once again, daily backups had to be restored from tape to a VAX/VMS system to search for interesting ALL-IN-1 messages in the target accounts and a vast amount of 132-column wide line printer paper was consumed to capture information for legal review.
While backups serve to make data available on a PIT basis, it’s an indisputable fact that taking and storing backups is an expensive business in terms of people cost, media, time, and storage. The sad fact is that most of the data held in user mailboxes is simply useless in terms of business value and richly deserves to be consigned to the byte wastebasket as soon as it’s sent. Think of the interminable to-and-fro interchanges between users discussing the issues of the day (often bad, never insightful, usually dreadful) coupled with the great dross of read receipts, NDRs, calendar acceptances, and all the other rubbish that clogs up mailboxes. None of this needs to be retained but all of it is lovingly preserved on backup tapes.
Some companies attempt to address the problem of data preservation by decreeing that users should delete all email after a month. The efforts of these companies are invariably undermined by the single salient fact that users can create and populate PSTs to their heart’s content unless administrators exert control over PSTs through GPOs. Even then, users are very good at circumventing the best attempts of administrators to force them to do anything that they don’t want to do, and few companies have ever succeeded in implementing a watertight retention policy that users comply with all of the time.
In fact, if a user is intent on committing some sort of offence, they will take steps to remove all traces of their actions from their online mailbox and will store anything they want to keep in a PST, safely tucked away from the view of the administrator and immune from discovery. Or, if they need to keep a copy of email “just in case”, a user might print it off and keep it safely hidden in a file cabinet. The advent of litigation hold in Exchange 2010 helps, but only after the point where an administrator runs Set-Mailbox -LitigationHoldEnabled $True (on an Exchange 2010 server). Everything else before the hold is established is forgotten, if not forgiven.
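As a minimal sketch of what placing a mailbox on litigation hold looks like in the Exchange Management Shell, assuming a placeholder mailbox identity of "jsmith" (not a name from this article):

```powershell
# Run in the Exchange Management Shell on Exchange 2010.
# "jsmith" is a hypothetical mailbox identity used only for illustration.
Set-Mailbox -Identity "jsmith" -LitigationHoldEnabled $true

# Confirm that the hold flag is now set on the mailbox.
Get-Mailbox -Identity "jsmith" | Format-List Name, LitigationHoldEnabled
```

The hold only protects what is in the mailbox from that point on; anything the user removed beforehand is outside its reach.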
Other companies cope by saying that all mail should be retained and deploy sophisticated archiving systems that “scrape and stub” to remove messages into the archive and leave stubs pointing to the archive behind in the mailbox. Effective, but costly and prone to the same effects of user-driven movement of sometimes important information into PSTs. And of course, keeping data is a double-edged sword because if the data is available it can be discovered and is therefore both a potential defense and problem for a company. In fact, you can argue that having an online archive is far worse for a company than having the data available on backups because backup tapes tend to be recycled after a set period (30, 60, or whatever number of days seems appropriate) after which the data is inaccessible. On the other hand, an online archive remains fresh and discoverable by lawyers who care to look unless you deploy retention policies to remove items automatically after they reach a certain age.
To a certain degree, things are much easier now, especially with the new litigation hold and discovery search features introduced in Exchange 2010. PSTs remain the great unwashed (or rather unwanted) of the discovery fraternity as their existence (and the data that they hold) is usually outside the immediate control of central administrators. The good news is that I hear of some third-party software vendors who are busy figuring out solutions to the discovery and control of PST content through intelligent agents that can seek PSTs – even those located on PC local drives – and apply policies to the items held in the PSTs, including the ability to delete items over a certain age or move them into archive mailboxes on a server. We’ll see how these solutions work when they are released and are used in the harsh light of production!
Given all of this, it’s interesting to hear that Microsoft IT has eliminated backups in favour of running four database copies within a Database Availability Group configured with a 30-day single item recovery (SIR) period. In other words, Microsoft has deployed sufficient database copies within its DAGs to eliminate the need for backups that would traditionally protect against hardware and software failure. Features such as single page patching take care of page-level corruption and block-mode replication means that database copies are as up-to-date as possible. The 30-day SIR period allows users to recover items that they have deleted in error without resorting to frantic appeals to administrators to restore backup tapes. After 30 days users are plumb out of luck, but let’s face it, if you “remember” that you deleted something important after 31 days, that item might just not be as important as you think. Interestingly, Microsoft IT has eliminated the use of lagged database copies because they don’t see the value of these copies.
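For anyone who wants to experiment with a similar setup, a 30-day single item recovery window might be configured along these lines. This is a rough sketch, not Microsoft IT's actual configuration; the database name "DB01" and mailbox "jsmith" are placeholders.

```powershell
# Run in the Exchange Management Shell on Exchange 2010.
# "DB01" and "jsmith" are hypothetical names used only for illustration.

# Keep deleted items in the database for 30 days (dd.hh:mm:ss).
Set-MailboxDatabase -Identity "DB01" -DeletedItemRetention 30.00:00:00

# Enable single item recovery for a mailbox in that database.
Set-Mailbox -Identity "jsmith" -SingleItemRecoveryEnabled $true

# Check the health of each copy of the database in the DAG.
Get-MailboxDatabaseCopyStatus -Identity "DB01" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength
```

The copy and replay queue lengths give a quick view of how far each database copy lags behind the active copy, which matters a great deal when those copies are the only protection you have.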
The approach taken by Microsoft IT won’t meet everyone’s needs. It’s too stark for some, too cutting-edge for others. It’s also true that it is tremendously easy to embark on such a radical approach when you have the collective wisdom of both the Windows and Exchange development groups close at hand should anything go wrong. However, the sheer fact that Microsoft IT uses this mechanism is sufficient to provoke the question whether others should do the same – or use a modified version of the approach. After all, Microsoft IT doesn’t always get it right and sometimes their deployment techniques are artistic rather than practical (for others). Anyone remember the seven-node WolfPack cluster that Microsoft deployed with Exchange 2003 where four nodes were active, two used for administrative activities, and one for backup? How many other similar deployments occurred: zero. How many people wanted to do the same: many… Makes you think!
The bottom line is that backup strategy deserves to be reviewed as technology evolves. It needs to be efficient, effective, and meet the company’s business needs. Simply keeping to a tried and trusted approach may give administrators a warm feeling that their data is being protected, but it’s possibly not the best way to proceed in a world where Exchange 2010 SP1 is available.
– Tony
Want to read more compelling and insightful information about Exchange 2010 SP1? Well, get yourself a copy of Microsoft Exchange Server 2010 Inside Out, also available at Amazon.co.uk. The book is also available in a Kindle edition. Other e-book formats for the book are available from the O’Reilly web site. And if you want to argue the case in person, come along to one of the Exchange 2010 Maestro Seminars that we’re running in 2011. Your brain may be fried, but you’ll have fun.