On April 29, I reported the poor support experience I had received as a result of the upgrade of my Office 365 tenant domain from the Wave 14 release to Wave 15. Essentially, a support call reported on April 8 had produced zero progress, despite many messages to and fro between me and Microsoft’s Office 365 support team. All in all, it was a tiring and frustrating time.
Four hours after posting the report, I was contacted by a UK-based Microsoft escalation engineer. Coincidences do happen, but in this case I think that the public protest had the desired effect on Microsoft’s bunged-up support processes. In fact, it’s depressing that posting a blog post produced an escalation because it points to a problem in the support process. Normal customers who don’t blog won’t get the same response. It is probable that my visibility within the Exchange community as someone who writes extensively on the topic also assisted in the escalation process.
The good news is that at 22:30 on April 30 Outlook informed me that it had to restart because of a change made by the administrator (in fact, Outlook forced me to restart three times, for a reason that I haven’t quite figured out). I logged into my tenant and discovered that OWA used the Wave 15 interface and that all the administrative functions worked as expected. ActiveSync and EWS clients connected flawlessly to the upgraded service. The problem was solved 22 days after it was first reported.
What have I learned from the experience? Here are some thoughts:
Microsoft front-line staff are just a filter. No surprise here because all major support organizations use front-line staff to filter incoming calls, solve the most obvious problems (and some that are not so obvious), and pass a certain percentage to second-level support via an escalation process. What surprised me about this case was how long Microsoft allowed the call to remain at the first level despite frequent communication back and forth with me. I asked repeatedly for updates but nothing happened. Clearly the internal escalation process did not function properly.
Microsoft escalation engineers know their stuff (at least, the person I dealt with did). Once the case was escalated, things happened more quickly (as you’d expect). The focus was sharper, the questions more pertinent, and action occurred. Tools such as those described in KB2598970 collected information from my workstation to help detect the source of the problem. Communications were restrained and content-rich. All in all, a much better experience.
Expect a delay if something has to change in the datacenter. Second-level support can only go so far with massive cloud systems. Their role seems to be to investigate problems, collect information, and then figure out what needs to be done. In this case a change needed to be made to my tenant domain. Unlike what might happen in an on-premises situation, senior support staff cannot take action on user accounts (or their equivalents) because Office 365 is, by necessity, an extremely locked-down environment where only specific people can interact with user data under controlled conditions. The upshot is that some delay is built into the system while information is fed back to the datacenter team and they respond. I like this because it shows that Microsoft is serious about protecting customer data: no shortcuts that might compromise data are taken to solve problems.
The service keeps on running even when back-end migration problems happen. I reported the problem on April 8 and it was resolved on April 30. Sounds bad. But all clients continued to function properly and access Exchange, Lync, and SharePoint during this period. An end user would not have known that anything was wrong. I think that this must be the situation with many Office 365 issues because if something really does go wrong then huge numbers of people are affected. In this case, a partial migration had resulted in a Wave 15 administration front-end attempting to talk to Wave 14 servers at the back-end. The different protocols involved caused the error. As it turns out, I’m told that the problem originated when my tenant subscription was changed last year and that this has uncovered an issue that Microsoft will now fix.
Document everything. This advice is often given to people who experience the joys of reporting a problem to support. You have to know and record your facts because you will be asked about them. Facts help identify where the problem might lie and how it might be solved. Write everything down, including the details of the interactions with the support team (time, date, and duration) as you might need to use this data to force an escalation.
The bottom line is that my Office 365 tenant domain is now back to full health. I am genuinely surprised that it took so long for Microsoft to solve the problem but am glad that things eventually worked out. It’s just a pity that escalation only happened after the incident was exposed to the full glare of publicity.
I doubt that many other tenant domains will be in the same situation. Office 365 has not really been around long enough for many companies to switch subscription types and Microsoft is now aware of the issue and will fix it. But I sure hope that the folks who run Office 365 support take action to improve their escalation processes so that other customers do not experience the same kind of extended case resolution as occurred here.
Follow Tony @12Knocksinna
Update 2 May: I was called this morning by a Microsoft customer support manager to discuss the problem and how Microsoft worked as the issue unfolded. I thought that the discussion was very open and helpful, which is always a good thing.