Varying replication mode within a DAG

Published by Tony Redmond on March 16, 2011

Log shipping is a well-known method for replicating data between Exchange servers (and other computers). It first appeared in Exchange in the Local Continuous Replication (LCR) and Cluster Continuous Replication (CCR) features of Exchange 2007, was joined by Standby Continuous Replication (SCR) in Exchange 2007 SP1, and is used to distribute replicated data within a Database Availability Group (DAG) in Exchange 2010 and Exchange 2010 SP1.

The basic arrangement for log shipping is simple: a transaction log file is generated on a source server and is either pulled by the target server (Exchange 2007) or pushed to the servers that contain database copies (Exchange 2010). In either case, it is the Microsoft Exchange Replication Service that is responsible for transferring data. The difference between the two methods is accounted for by the fact that Exchange 2007 only supports a single database copy for LCR, CCR, or SCR while Exchange 2010 supports up to sixteen database copies within a DAG.

The problem with depending on files is that losing one can lead to data loss. An Exchange transaction log holds 1MB of data composed of interleaved transactional steps generated by client activity. A single complete transaction, for example the creation of a new mail message, is composed of several steps as the new item is initiated, populated with data, and finally committed. If you lose a transaction log, all of the transactions in that log are unavailable. The affected items might be relatively unimportant (who will miss yet another auto-reply?), but they might also be ultra-critical, such as a message from the CEO about an important acquisition. It therefore makes sense to minimize the risk of losing any transactional data in whatever way is possible, which is the logic behind the introduction of block-mode replication in Exchange 2010 SP1.

Shipping complete transaction logs around is referred to as file-mode replication. Exchange 2010 servers always commence replication in this mode. However, from SP1 onwards, DAG member servers are able to switch into block-mode replication if replication is proceeding smoothly within the DAG and no copy or replay queues are accumulating on the DAG members.

Block-mode replication means that the server that holds the active copy of a database will push data to the servers that hold the passive copies of the database as soon as data for a new transaction is written into the log buffer. The log buffer is an in-memory cache that holds current transaction data. After 1MB of data is accumulated, the log buffer is flushed to create a transaction log. Obviously, this process still continues as it’s critical to continue to capture transactions in a way that they can be replayed should servers crash and memory be erased.
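The difference between the two modes can be illustrated with a toy model. The Python sketch below is not Exchange code: the class and its members (`LogBuffer`, `write`, `unshipped_bytes` and so on) are hypothetical names, and the only figure taken from the text is the 1MB log size. It assumes that in block mode every write is pushed to the passive copies at once, while in both modes the buffer is still flushed to a log file each time 1MB accumulates.

```python
LOG_SIZE = 1024 * 1024  # 1MB per transaction log, as described in the text


class LogBuffer:
    """Toy in-memory log buffer (hypothetical model, not Exchange code)."""

    def __init__(self, block_mode):
        self.block_mode = block_mode
        self.pending = 0          # bytes in the buffer not yet flushed to a log file
        self.logs_created = 0     # complete transaction logs written to disk
        self.shipped_blocks = []  # sizes of blocks pushed to passive copies
        self.shipped_files = 0    # complete logs shipped in file mode

    def write(self, nbytes):
        """Record nbytes of transaction data generated by client activity."""
        if self.block_mode:
            # Block mode: push the data as soon as it reaches the buffer.
            self.shipped_blocks.append(nbytes)
        self.pending += nbytes
        # The buffer is still flushed to a log file in either mode.
        while self.pending >= LOG_SIZE:
            self.pending -= LOG_SIZE
            self.logs_created += 1
            if not self.block_mode:
                # File mode: only a complete log is shipped.
                self.shipped_files += 1

    def unshipped_bytes(self):
        """Bytes of transaction data that passive copies have not seen yet."""
        return 0 if self.block_mode else self.pending
```

Feeding the same stream of small writes into a file-mode buffer and a block-mode buffer produces the same number of log files in both, but only the file-mode buffer is left holding data that the passive copies have not yet received.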

Switching between modes is automatic and is managed by a component called the log copier that monitors the copy and replay queue lengths as transaction logs are generated. If queues start to build, the log copier will switch back into file-mode replication and remain in that state until conditions ease and the queues clear.
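The switching logic can be pictured as a very small state machine. The sketch below is a guess at the shape of that decision, not the log copier's actual algorithm: the function names are hypothetical, and the `threshold` of zero is an assumption because the queue length at which Exchange really switches is not stated here.

```python
FILE_MODE = "file"
BLOCK_MODE = "block"


def next_mode(copy_queue, replay_queue, threshold=0):
    """Pick the replication mode for one database copy.

    threshold is an assumed value; the real switching point used by the
    log copier is not documented in this article.
    """
    if copy_queue > threshold or replay_queue > threshold:
        # Queues are building, so fall back to shipping complete logs.
        return FILE_MODE
    # Queues are clear, so data can be pushed block by block.
    return BLOCK_MODE


def trace_modes(samples):
    """Return the mode history for a series of (copy_queue, replay_queue) samples.

    Replication always starts in file mode, so that is the first entry.
    """
    history = [FILE_MODE]
    for copy_queue, replay_queue in samples:
        history.append(next_mode(copy_queue, replay_queue))
    return history
```

For example, `trace_modes([(0, 0), (3, 0), (0, 2), (0, 0)])` starts in file mode, moves to block mode while both queues are empty, drops back to file mode while either queue is non-zero, and returns to block mode once the queues clear.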

How do you know what’s happening on a server? The following PowerShell command interrogates the Windows Performance Monitor counters that are maintained by the MsExchangeRepl process.

Get-Counter -ComputerName ExServer1 -Counter "\MSExchange Replication(*)\Continuous replication - block mode Active"

Timestamp                 CounterSamples
---------                 --------------
3/16/2011 10:18:11 AM     \\exserver1\msexchange replication(db2)\continuous replication - block mode active : 0
                          \\exserver1\msexchange replication(db1)\continuous replication - block mode active : 0
                          \\exserver1\msexchange replication(db4)\continuous replication - block mode active : 0
                          \\exserver1\msexchange replication(db3)\continuous replication - block mode active : 0
                          \\exserver1\msexchange replication(_total)\continuous replication - block mode active : 0

We can see that a separate counter is maintained for each database on the server plus an overall counter. In this case, we can see that there are four databases (DB1, DB2, DB3, and DB4). The value of each counter is 0 (zero), so we know that this server is currently operating in file-mode replication for each of these databases. A value of 1 (one) indicates block-mode replication is active. Of course, you can also look at these counters through Performance Monitor, but that’s pretty boring as the values don’t change that often.

Another method is to use the Get-WMIObject cmdlet, which interrogates the same data. In this example (a modified version of code taken from MSDN), we want to report any instance of a database on a specified server (ExServer1) where block-mode replication is currently active.

Get-WMIObject -ComputerName ExServer1 Win32_PerfRawData_MSExchangeReplication_MSExchangeReplication | Where-Object {$_.ContinuousReplicationBlockModeActive -eq "1" -and $_.Name -ne "_total"} | Format-Table Name, ContinuousReplicationBlockModeActive -AutoSize

Name ContinuousReplicationBlockModeActive
---- ------------------------------------
db2  1
db1  1
db4  1
db3  1

It is possible that you’d never see block-mode replication in action. Running two virtualized Exchange servers on a laptop is an exercise in slow disk I/O, and queues form rapidly during heavy activity such as mailbox moves or mailbox imports. The same might be true for stressed mailbox servers. In these circumstances Exchange will play safe and remain in file-mode replication. It’s also possible that block-mode replication will be used for one server and not another, again because one of the servers is stressed and copy or replay queues have accumulated there.

The point about block-mode replication is that it enables data to be transferred from the active database to its passive copies much faster than if the DAG has to wait for complete transaction logs. In the case of heavily loaded servers that are generating multiple transaction logs every second, the difference might be relatively small in time as measured by humans, but every millisecond counts in a crash.

When data is transferred from the active server to a server holding a passive copy, it is stored in the log buffer on the receiving server and becomes part of the transaction stream that will be processed by that server. Another improvement in SP1 is that if a crash occurs during block-mode replication that prevents the contents of a complete transaction log being received, the receiving server is able to close off the incomplete log and use its contents during the activation process to bring the selected database copy to a point that is as close to up-to-date as possible.
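To show why closing off an incomplete log matters, here is a minimal Python sketch of a passive copy that receives blocks and is then activated after the active server crashes. It is a model under stated assumptions rather than a description of Exchange internals: `PassiveCopy`, `receive_block`, `activate_after_crash` and the `use_partial_log` flag are hypothetical names, and the only value taken from the text is the 1MB log size.

```python
LOG_SIZE = 1024 * 1024  # 1MB per transaction log, as described in the text


class PassiveCopy:
    """Toy model of a server holding a passive database copy (hypothetical)."""

    def __init__(self):
        self.complete_logs = 0  # full 1MB logs assembled from received blocks
        self.partial = 0        # bytes received for the log still being built

    def receive_block(self, nbytes):
        """Store a block pushed by the active copy in the local log buffer."""
        self.partial += nbytes
        while self.partial >= LOG_SIZE:
            self.partial -= LOG_SIZE
            self.complete_logs += 1

    def activate_after_crash(self, use_partial_log=True):
        """Return the bytes of log data available for replay at activation.

        With use_partial_log=True the incomplete log is closed off and its
        contents are used; with False it is discarded, which models having
        only complete logs to work from.
        """
        usable = self.complete_logs * LOG_SIZE
        if use_partial_log:
            usable += self.partial
        return usable
```

In this model, the gap between the two return values of `activate_after_crash` is the transaction data that would be lost if only complete logs could be used during activation.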

All in all, this is very nice work and evidence of growing maturity in Exchange high availability technology.

– Tony

For more information about how things work within a DAG, see chapter 8 of Microsoft Exchange Server 2010 Inside Out, also available at Amazon.co.uk and in a Kindle edition. Other e-book formats for the book are available from the O’Reilly web site.

8 responses to “Varying replication mode within a DAG”

Paul Bendall

March 16, 2011

Tony,

Great write-up of this new technology. I’ve shared it with all my colleagues to aid understanding. Just to confirm in my own mind:

“Block-mode replication means that the server that holds the active copy of a database will push data to the servers that hold the passive copies of the database as soon as data for a new transaction is written into the log buffer”

So does this mean that if a transaction is small, i.e., a few KB, it will immediately be sent to the passive copies even though it is much smaller than the buffer or a transaction log?

1. Tony Redmond (“Thoughts of an Idle Mind”)
  
  March 16, 2011
  
  Yes, as soon as a transaction is committed into the log buffer, it is copied to the other servers.
  
  1. Paul Bendall
    
    March 16, 2011
    
    Thanks Tony
Justin

March 17, 2011

Tony – thanks for this. One thing that I have been trying to understand while reading your book is the behavior of Outlook Anywhere and Autodiscover. Internal clients in the same site as the Exchange box will pull down the profile settings, which are configured to use the external name of the Exchange server, and be set up as you would set up an RPC/HTTP client. My question is: doesn’t it seem redundant to have Outlook go outside only to come back in to get to the Exchange server? I suppose you could use split DNS to remedy this, but is there something that Outlook does to automatically detect that it is in the same location as the Exchange server and connect directly to it? Thanks

1. Tony Redmond (“Thoughts of an Idle Mind”)
  
  March 17, 2011
  
  Does http://technet.microsoft.com/en-us/library/bb124251.aspx help?
  
  Basically inside the firewall, the Outlook client queries Active Directory for the SCP objects associated with AutoDiscover and creates a list – and then picks a local SCP (same AD site) from the list and uses it. Outlook actually refreshes its set of URLs using the SCPs every 60 minutes so this is a query that keeps on going…
  
  1. Justin
    
    March 21, 2011
    
    Ah… Ok, I see it. Starts @ Page 663. You da man as always!
Anand Kumar Deva

March 28, 2011

Thanks a ton, Tony. Great write-up. High availability is really maturing. I believe there will be a time when we don’t think about using ESEUTIL!

Tony Redmond (“Thoughts of an Idle Mind”)

March 28, 2011

I haven’t thought about using ESEUTIL for a very long time. I certainly don’t see the sense of using ESEUTIL to rebuild a database unless I was backed into a corner and had no other way of rescuing a database and restoring it to good health…
