On July 13, Microsoft took the decision to withdraw Exchange 2010 SP1 RU4 (the fourth roll-up update for SP1). As you're aware, a roll-up update (RU) is a periodic release of the patches and other fixes that Microsoft has accumulated for a version of Exchange. RU4 was released on June 22, 2011, so it hasn't been available long. Microsoft released the prior update (RU3) on March 8, 2011, but customers soon encountered problems with BlackBerry devices sending duplicate messages. Microsoft then re-released RU3 on April 6, 2011.
In the case of RU4, the problem is a tad surprising because it occurs during a fundamental operation – moving or copying folders. It's the kind of thing that you'd really expect a QA group to pick up:
“A small number of customers have reported when the Outlook client is used to move or copy a folder that subfolders and content for the moved folder are deleted. After investigation we have determined that the folder and item contents do not appear in the destination folder as expected but may be recovered from the Recoverable Items folder (what was previously known as Dumpster in older versions of Exchange) from the original folder. This behavior occurs due to a customer requested change in SP1 RU4 which allowed deleted Public Folders to be recovered. Outlook and Exchange are not correctly processing the folder move and copy operations causing the folder contents to appear to be deleted.”
It takes a pretty serious event for a development group to publicly withdraw software. Having to withdraw roll-up updates twice in quick succession seems to indicate that Microsoft has a real problem in its quality control or release process, and raises the question of whether customers should have confidence in future patches or other software released for Exchange.
I think that this is a somewhat simplistic view of the situation. Here's why. First, Exchange is a very complex product that comprises more than 21 million lines of code. Although I am sure that the development process is well honed after some sixteen or seventeen years of building Exchange, things are becoming more complex all the time, as the development group now has to create code to serve the twin platforms of on-premises and cloud (Exchange Online).
You can gain some insight into the complexity that Microsoft development groups deal with from the excellent books written by Steve McConnell about Microsoft development practices, based on his experience of shipping several products. These include Rapid Development: Taming Wild Software Schedules, Code Complete: A Practical Handbook of Software Construction, and Software Estimation: Demystifying the Black Art (Best Practices). In "Code Complete", McConnell mentions that there might be 10-20 code defects per 1,000 lines. I believe that this is an old number, based on early releases of products such as Excel, which has probably decreased with the introduction of automated code checking tools and better software development frameworks, but it's still likely that every 1,000 lines of code has one or two defects lurking. I can't believe that the Exchange code base includes over 21,000 bugs, but I bet that Microsoft has a substantial database of known bugs, potential problems, customer requests for enhancements, and other reasons why code might need to be changed in the future. It's just the nature of complex software.
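To put those densities in perspective, here is a small back-of-the-envelope calculation. It simply multiplies the roughly 21 million lines cited above by each defect density per 1,000 lines (KLOC). The 10-20 figure is McConnell's as quoted above, the 1-2 figure is my own guess, and the function name is hypothetical. None of these numbers are measurements of the actual Exchange code base.

```python
# Back-of-the-envelope defect estimates for a code base of a given size.
# Illustrative only: the densities are the rough figures discussed in the text,
# not measured values for Exchange.
EXCHANGE_LOC = 21_000_000  # "more than 21 million lines of code"


def estimated_defects(lines_of_code: int, defects_per_kloc: float) -> int:
    """Return the expected defect count for a density given per 1,000 lines."""
    return round(lines_of_code / 1_000 * defects_per_kloc)


if __name__ == "__main__":
    scenarios = [
        ("McConnell, low end", 10),
        ("McConnell, high end", 20),
        ("1 defect per KLOC", 1),
        ("2 defects per KLOC", 2),
    ]
    for label, density in scenarios:
        print(f"{label:>20}: {estimated_defects(EXCHANGE_LOC, density):,} defects")
```

Even at the optimistic rate of one defect per 1,000 lines, the arithmetic lands at 21,000 potential defects, which is why the existence of a large internal bug database should surprise nobody.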
It's also important to realize that a specific defect might never be exposed in the normal course of events, might only appear in very specific circumstances, or might surface as a knock-on effect of code changed elsewhere, including in a Microsoft or non-Microsoft client. I doubt that we will ever get to zero code defects in commercial software, so we're always going to have to cope with patches and service packs for Exchange, Windows, SharePoint et al.
Second, given that we deal with a complex software environment, it makes sense to protect production systems by never deploying roll-up updates, service packs or indeed new versions without testing in a realistic environment that adequately mimics the production workload. In this context, testing doesn't mean just checking that the software will install. It means testing Exchange on the Windows build used in production, accessed by all the clients (and client versions) that you use, alongside all the third-party software products that interact with Exchange. In short, it's not a quick and simple process.
If you rush to deploy software as soon as Microsoft releases it, you run the risk of encountering a problem that impacts users. For example, if you had deployed RU3 without testing, you'd have had to explain to BlackBerry users why they were seeing duplicate messages. In the case of RU4, you might have run into the situation where Outlook users report that they had "lost" data when they moved or copied folders. Both situations underline the importance of testing before deployment.
The third factor to consider is the maturity demonstrated by the Exchange development group in quickly acknowledging the problem and taking the necessary action to withdraw the software, even though doing so exposed Exchange to the ridicule of some commentators. I think this behavior shows a certain dedication to the installed base. So even if I am not utterly impressed that Microsoft has had to withdraw two roll-up updates in quick succession, my disappointment is somewhat mitigated by their fast action and open communication, allied to an expectation that this situation has served as a wake-up call to the QA and support folks, who hopefully will do better with future releases.
And for the rest of us, it's a great reminder that software like Exchange is general-purpose: it's created by engineers who have no visibility of the many varied ways in which Exchange is deployed in the field. For that reason alone, you should protect yourself against software bugs by testing, testing, and more testing before anything is deployed.
Microsoft plans to fix the problem in Exchange 2010 SP1 RU5, which is expected to be available sometime in August. Microsoft has an interim update (KB2581545) that can be applied if you have already deployed RU4 (but remember the requirement for testing). You can contact Microsoft support to get the interim update.
Update July 28: Microsoft has re-released RU4. See my commentary on WindowsITPro.com.