Exchange 2013: Using crimson events to track transaction log truncation within a DAG


Prior to Exchange 2007, transaction log truncation (removal) used to be a pretty simple affair. A successful good backup would truncate the log set or, if you used circular logging, Exchange used a small set of logs to capture transactions and would reuse the files once the transactions in the logs were committed into the database.

The advent of Cluster Continuous Replication (CCR) in Exchange 2007 (plus its variants, LCR and SCR) and more importantly, the Database Availability Group (DAG) in Exchange 2010 required the way that transaction logs are truncated to change. Exchange 2010 breaks the connection between databases and servers and multiple copies can exist within a DAG. Replication of transaction data between servers complicates matters a tad because of a rule that no data should ever be removed if it might be needed. Transaction logs exist to update databases, so it follows that Exchange never truncates logs if the chance exists that the logs might be required to update a database copy.

Some like to operate databases without circular logging. This is fine, as long as you have sufficient disk space to hold the logs. The old best practice of not using circular logging with production databases has evolved as It is quite safe to use circular logging within a DAG, providing that sufficient copies exist to provide a reasonable guarantee of redundancy. Three copies is good; four is even better. And indeed, once database copies are in use, a different form of circular logging called CRCL is used.

TechNet explains the conditions that must exist before transaction logs are truncated by the Replication service:

  • The log file must have been successfully backed up, or circular logging must be enabled.
  • The log file must be below the checkpoint (the minimum log file required for recovery) for the database.
  • All other lagged copies must have inspected the log file.
  • All other copies (not lagged copies) must have replayed the log file.

In addition, the article also explains that the transaction logs for the active database copy are never truncated when one or more copies are suspended.  Even more care is taken when a lagged database copy is maintained because the transaction logs are retained for the lagged period and cannot be removed until that period expires and the Replication service replays the logs to update the lagged database copy.

All of this is quite clear, but when you look at the accumulation of transaction logs for a database, you might wonder on what basis the Replication service decides to truncate the logs. Because Exchange 2013 uses quite a big checkpoint depth (100MB), it’s usual to find a hundred or more transaction logs even when circular logging is enabled and the database is essentially quiescent. It’s far removed from the five or six transaction logs that a standalone database enabled for circular logging might use.

Truncation occurs when the “LogTruncator”, a component running inside the Replication service, examines the log set to assess what logs must still be retained. This happens on an ongoing basis and when a database is mounted. Some insight into the decision process can be gained by examining the “TruncationDebug” crimson channel in the Event Viewer. Exchange 2010 began to use crimson events to capture important information about high availability processing; Exchange 2013 captures a lot more information. In the screen shot, you can see that TruncationDebug is under the HighAvailability section. Three interesting events provide some insight into what the Replication service does when it examines logs to decide whether they can be truncated. In sequence, the events are:

  • 223: The Replication service gets information from servers that host database copies about what log files they have processed. In the screen shot, the information coming back from the EXSERVER2 server indicates that it wants generation 3557 to be preserved for its database copy. This is the minimum log file required for recovery.
  • 224: The Replication service decides what logs can be truncated.
  • 299: The Replication service truncates the log stream and removes whatever transaction logs are no longer necessary.
Crimson channel event reporting log truncation

Crimson channel event reporting log truncation

In this instance, the database copy existed on only two servers (definitely not too redundant) and so the Replication service only had to take into account input from the two servers. One (as we’ve seen) advised that it needed generation 3557, so to be safe the Replication Service truncated to generation 3555.

Dumping a transaction log header to check its generation

Dumping a transaction log header to check its generation

On a practical level, any transaction logs belonging to generation 3554 or below are removed from the server, so when we look at the transaction log directory, we should see logs for generations 3555 and higher. Of course, Exchange uses a hex numbering scheme for transaction logs, so the log number is DE3, with a full file name of E000000DE3.log. You can validate the generation number for a transaction log by running ESEUTIL with the /ML switch as shown above.

Exchange takes enormous care to make sure that transaction logs are retained until they are no longer required. Given that DAGs can vary so much in construction from simple 2-member implementations right up to the sixteen-member mega-DAGs, it’s obvious that log truncation can be a tricky business. Fortunately the technology works very well in the background. I wish that the same statement could always be made about technology…

Follow Tony @12Knocksinna

About Tony Redmond

Lead author for the Office 365 for IT Pros eBook and writer about all aspects of the Office 365 ecosystem.
This entry was posted in Exchange 2013 and tagged , , , , . Bookmark the permalink.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.