Channel: Practical 365

Exchange Server 2013 Autoreseed in Action

At my IT/Dev Connections session in September I advocated for simple designs for database availability groups. This included some points about Exchange Server 2013 storage design and layout, such as:

  • JBOD vs RAID
  • Multiple databases per volume
  • Volumes mounted in folders not drive letters
  • Co-locate the database and transaction log files on the same volume

Those recommendations came with caveats of course, depending on various factors. Besides being easier to manage, simple designs also let you take advantage of a terrific new feature in Exchange Server 2013 called Autoreseed.

Autoreseed in Exchange Server 2013

With Autoreseed the members of an Exchange 2013 DAG are pre-configured with one or more spare volumes. When a disk fails the Exchange server is able to automatically replace the failed disk with a spare, and then reseed the lost database copies to the new volume.

This means the recovery workflow in Exchange 2013 goes like this:

  1. Disk fails (resiliency of your DAG is impacted)
  2. Spare disk automatically mounted
  3. Database copies reseeded (resiliency is restored automatically)
  4. Manual intervention to replace the failed disk with a new spare

In Exchange Server 2010, which also supported JBOD storage, the recovery workflow goes like this:

  1. Disk fails (resiliency of your DAG is impacted)
  2. Manual intervention to replace disk
  3. Manual intervention to reseed database copies (resiliency is restored)

The Exchange 2010 recovery workflow involves too many manual steps to restore the resiliency of the DAG, requires response by admins at any hour of the day, and is simply not efficient at scale.

The Exchange 2013 recovery workflow can automatically restore the resiliency of the DAG without manual intervention, requires response by admins at a lower urgency, and is far more efficient at scale.

Laying the foundation for Autoreseed involves implementing those recommendations I mentioned earlier. Let’s take a look at them in a little more detail.

RAID vs JBOD

For single datacenter deployments:

  • Always use RAID for the system/OS volume
  • Always use RAID when there are fewer than 3 database copies
  • Use JBOD when there are 3 or more database copies
  • Use JBOD for lagged copies only when 2 or more lagged database copies exist

For multiple datacenter deployments:

  • Always use RAID for the system/OS volume
  • Always use RAID when there are fewer than 2 database copies in a datacenter
  • Use JBOD when 2 or more database copies exist in a datacenter
  • Use JBOD for lagged copies as long as 2 or more lagged copies exist, or log play down is enabled
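
As a rough illustration, the rules above can be written as a small PowerShell function. The function name and its parameters are hypothetical (this is not an Exchange cmdlet), and the sketch deliberately ignores the separate rules for lagged copies and for the system/OS volume, which should always be RAID.

```powershell
# Hypothetical helper: encodes the RAID vs JBOD rules of thumb for database volumes.
function Get-StorageRecommendation {
    param(
        # Number of copies of the database in this datacenter
        [Parameter(Mandatory)] [int]  $CopiesPerDatacenter,
        # $true if the DAG spans more than one datacenter
        [Parameter(Mandatory)] [bool] $MultipleDatacenters
    )
    # Single datacenter: JBOD needs 3 or more copies.
    # Multiple datacenters: JBOD needs 2 or more copies in each datacenter.
    $threshold = if ($MultipleDatacenters) { 2 } else { 3 }
    if ($CopiesPerDatacenter -ge $threshold) { 'JBOD' } else { 'RAID' }
}

# Example: the four node, two datacenter DAG used later in this article
Get-StorageRecommendation -CopiesPerDatacenter 2 -MultipleDatacenters $true
```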

Multiple Databases Per Volume

Use multiple databases per volume when 3 or more database copies exist. The volumes can be on RAID or JBOD (with a preference for JBOD, as I’ll explain shortly).

The number of databases per volume should equal the number of copies of the databases.
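
The DAG itself also needs to know how many database copies to expect on each volume. A minimal sketch, assuming a DAG named DAG1 (a hypothetical name) in which each database has 4 copies, uses the AutoDagDatabaseCopiesPerVolume parameter of Set-DatabaseAvailabilityGroup:

```powershell
# Assumption: the DAG is named DAG1 and every database has 4 copies,
# so each volume hosts 4 database copies.
Set-DatabaseAvailabilityGroup -Identity DAG1 -AutoDagDatabaseCopiesPerVolume 4

# Confirm the value now stored on the DAG object.
Get-DatabaseAvailabilityGroup -Identity DAG1 |
    Format-List Name, AutoDagDatabaseCopiesPerVolume
```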

Volumes Mounted in Folders not Drive Letters

Mounting your volumes as drive letters is fine for non-DAG deployments, and works for DAG deployments as well, but is not recommended.

There is the obvious limitation of the size of the alphabet. With only 23 usable letters after A:, B:, and C: are consumed, and Exchange 2013 Enterprise capable of hosting 100 databases, you can easily run into problems or at the very least find yourself juggling a complex configuration to work around it.

Instead mount your volumes as folders, using a RAID-protected host volume (the C:\ volume for system/OS is fine for this).

Co-Locate Database and Transaction Log Files

Exchange admins are used to placing the database and transaction log files on separate volumes for recoverability from disk failures. This is still recommended for non-DAG scenarios.

For DAG scenarios the fact that you have multiple copies of each database mitigates the risk of a single disk failure taking out an entire database. So co-locating the database and transaction log files is recommended for DAG scenarios, especially when using multiple databases per volume, and also when using JBOD.

Combining the above, along with evenly distributed active, passive and lagged database copies, gives you an Exchange 2013 DAG that looks similar to this example.

Example Exchange 2013 DAG

This example obviously assumes that a four node DAG in two datacenters is the right solution for the environment. Your own requirements will vary of course, but this example is being used mainly to demonstrate Autoreseed.

Example Storage Layout for DAG Members

With all of the above in mind here is an example of how the storage layout would be configured for an Exchange 2013 DAG member.

We start with a RAID protected system/OS volume, and create two folders in the root of C:\.

  • ExchangeDatabases
  • ExchangeVolumes

These match up with the default settings of an Exchange 2013 DAG for root folder paths.

[PS] C:\>Get-DatabaseAvailabilityGroup | fl *autodag*path
AutoDagDatabasesRootFolderPath : C:\ExchangeDatabases
AutoDagVolumesRootFolderPath   : C:\ExchangeVolumes

Next, the volumes that will be hosting the databases and log files are configured. For this simple example a single volume is being configured to host active data and a single volume is being configured as a spare. These are mounted into sub-folders of C:\ExchangeVolumes named Volume1 and Volume2.

Volume1 is then mounted into additional folders for hosting the database and log files. These folder names match the names of the databases in the DAG, for example DB01, DB02, DB03 and DB04. These are created as sub-folders of the C:\ExchangeDatabases folder.

If you’re wondering what I mean by this, all I am referring to is mounting the volume into multiple paths instead of as a drive letter, just as you would normally see when first creating the volume.
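
If you would rather script the mount points than click through Disk Management, something like the following should work. This is only a sketch: it assumes the data volume has already been formatted and given the label ExVol1 (a label I have made up for the example), and it uses the built-in mountvol command together with the Win32_Volume WMI class to look up the volume name.

```powershell
# Assumption: the data volume is already formatted and labelled 'ExVol1'
# (a hypothetical label chosen for this sketch).
$volume = Get-WmiObject -Class Win32_Volume -Filter "Label = 'ExVol1'"

# One mount point under ExchangeVolumes, plus one per database under ExchangeDatabases.
$mountPoints = @('C:\ExchangeVolumes\Volume1') +
    ('DB01', 'DB02', 'DB03', 'DB04' | ForEach-Object { "C:\ExchangeDatabases\$_" })

foreach ($path in $mountPoints) {
    # mountvol needs an existing, empty folder as the target.
    New-Item -ItemType Directory -Path $path -Force | Out-Null
    # DeviceID is the volume name in \\?\Volume{GUID}\ form.
    mountvol $path $volume.DeviceID
}
```

Running mountvol with no arguments afterwards lists each volume name along with every path it is mounted at, which is a quick way to check the result.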

Finally, create sub-folders of each database folder to host the DB and log files. These are named according to the database names again, so DB01 needs sub-folders named DB01.db and DB01.log.
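
Creating these by hand gets tedious with more than a few databases, so here is a small sketch that builds them in a loop. The function name is hypothetical, and it assumes the data volume is already mounted into each database folder so that the sub-folders end up on the data volume rather than on C:\.

```powershell
# Hypothetical helper: creates <Root>\<DB>\<DB>.db and <Root>\<DB>\<DB>.log
# for each database name, and returns the paths it created.
function New-AutoReseedFolders {
    param(
        [string]   $Root      = 'C:\ExchangeDatabases',
        [string[]] $Databases = @('DB01', 'DB02', 'DB03', 'DB04')
    )
    foreach ($db in $Databases) {
        foreach ($suffix in 'db', 'log') {
            $path = Join-Path (Join-Path $Root $db) "$db.$suffix"
            New-Item -ItemType Directory -Path $path -Force | Out-Null
            $path
        }
    }
}

# Example: create the folders for the four databases in this article
New-AutoReseedFolders
```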

These folders are then used as the paths when creating the mailbox databases themselves. For example, here are the paths for DB01 in this environment.

[PS] C:\>Get-MailboxDatabase DB01 | fl *path*
EdbFilePath             : C:\ExchangeDatabases\DB01\db01.db\DB01.edb
LogFolderPath           : C:\ExchangeDatabases\DB01\db01.log
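
For reference, a database with those paths could be created along these lines. Treat it as a sketch: MELEX1 is the server name that appears in the event logs later in this article, while MELEX2 is a hypothetical second DAG member used only to show adding a copy.

```powershell
# Create DB01 with the database and log files co-located on the same volume.
New-MailboxDatabase -Name DB01 -Server MELEX1 `
    -EdbFilePath   'C:\ExchangeDatabases\DB01\db01.db\DB01.edb' `
    -LogFolderPath 'C:\ExchangeDatabases\DB01\db01.log'

# Assumption: MELEX2 is another DAG member with the same folder layout in place.
Add-MailboxDatabaseCopy -Identity DB01 -MailboxServer MELEX2 -ActivationPreference 2
```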

Autoreseed in Action

When a disk fails in an Exchange Server 2013 DAG member the Autoreseed workflow begins. However, the following conditions must be met for Autoreseed to take place:

  1. The database copies are not blocked from resuming replication or reseeding.
  2. The log and database files for the database are co-located on the same volume.
  3. The logs and database folder structure matches the naming convention required for Autoreseed.
  4. There are no other database copies on the volume that are in an “Active” state.
  5. All database copies on the volume are in a “FailedAndSuspended” state.
  6. The server has no more than 8 “FailedAndSuspended” database copies.

If those conditions are met then Autoreseed can attempt to resolve the issue.
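
Some of these conditions can be checked from the output of Get-MailboxDatabaseCopyStatus. The sketch below covers only conditions 5 and 6 (condition 4 follows from 5, since a copy that is FailedAndSuspended is not active). The function name is hypothetical, and because the copy status output does not tell you which volume a copy lives on, the caller has to supply that grouping.

```powershell
# Hypothetical helper: checks two of the Autoreseed preconditions against
# objects that expose a Status property, such as copy status output.
function Test-AutoReseedCandidate {
    param(
        # Copies hosted on the failed volume
        [Parameter(Mandatory)] [object[]] $VolumeCopies,
        # All copies hosted on the server
        [Parameter(Mandatory)] [object[]] $ServerCopies
    )
    # Condition 5: every copy on the volume is FailedAndSuspended.
    $notFailed = @($VolumeCopies | Where-Object { "$($_.Status)" -ne 'FailedAndSuspended' })
    # Condition 6: no more than 8 FailedAndSuspended copies on the server.
    $serverFailed = @($ServerCopies | Where-Object { "$($_.Status)" -eq 'FailedAndSuspended' })
    ($notFailed.Count -eq 0) -and ($serverFailed.Count -le 8)
}

# Usage in the Exchange Management Shell, assuming DB01 to DB04 share the volume:
#   $all      = Get-MailboxDatabaseCopyStatus -Server MELEX1
#   $onVolume = $all | Where-Object { 'DB01','DB02','DB03','DB04' -contains $_.DatabaseName }
#   Test-AutoReseedCandidate -VolumeCopies $onVolume -ServerCopies $all
```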

The workflow begins with detection of the failed volume. Database copies are regularly checked to see whether any of them have been at a status of “FailedAndSuspended” for 15 minutes or longer. This is the state that a database copy will be in when there is an underlying storage issue. The 15 minute threshold exists to ensure that remedial action is not taken too quickly.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 10:19:46 PM
Event ID:      1109
Task Category: Auto Reseed Manager
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is starting repair workflow 'FailedSuspendedCopyAutoReseed' for database 'DB01'.
WorkflowLaunchReason: Database copy 'DB01\MELEX1' encountered an error during log replay. Error: The system cannot find the path specified

The server attempts to resume the FailedAndSuspended database copy 3 times.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 10:19:46 PM
Event ID:      1124
Task Category: Auto Reseed Manager
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is beginning attempt number 1 of execution stage 'Resume' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'.
WorkflowLaunchReason: Database copy 'DB01\MELEX1' encountered an error during log replay. Error: The system cannot find the path specified

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 11:04:46 PM
Event ID:      1119
Task Category: Auto Reseed Manager
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager failed to resume database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed' after a maximum of 3 attempts. The workflow will next attempt to assign a spare volume and reseed the database copy.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

The server attempts to assign a spare volume once per hour for up to 5 attempts.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 11:04:46 PM
Event ID:      1124
Task Category: Auto Reseed Manager
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is beginning attempt number 1 of execution stage 'AssignSpare' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 11:04:46 PM
Event ID:      1125
Task Category: Auto Reseed Manager
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager has successfully assigned spare volume '\\?\Volume{6e77b6f8-6f83-49f1-ae48-60aa9419cd19}\' mounted at 'C:\ExchangeVolumes\Volume3\' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'. The workflow will next attempt to reseed the database copy.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

The server attempts to reseed the database copies to the new volume, with up to 5 attempts at 1 hour intervals.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          2/09/2014 11:19:46 PM
Event ID:      1117
Task Category: Auto Reseed Manager
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager throttled repair workflow 'FailedSuspendedCopyAutoReseed' for database 'DB01'. Details: The Automatic Reseed Manager encountered an error: The automatic repair operation for database copy 'DB01\melex1' will not be run because it has been throttled by the throttling interval of '01:00:00'.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

Log Name:      Microsoft-Exchange-HighAvailability/Seeding
Source:        Microsoft-Exchange-HighAvailability
Date:          3/09/2014 12:10:17 AM
Event ID:      826
Task Category: Seeding Target
Level:         Information
Keywords:      
User:          SYSTEM
Computer:      MELEX1.exchange2013demo.com
Description:
DB Seeding has completed for the local copy of database 'DB01' (1b3363f6-7f82-41ca-953b-2c295c1896a9).

If the process was not successful after 5 attempts, it stops.

After 3 days, if the database copies are still “FailedAndSuspended”, the workflow begins again.
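
To follow the workflow without clicking through Event Viewer, the same events can be pulled with Get-WinEvent. This is a sketch that filters on the log name and event IDs shown in the excerpts above; the server name is the one from my lab.

```powershell
# Pull the Autoreseed-related events from the Seeding crimson channel.
Get-WinEvent -ComputerName MELEX1 -FilterHashtable @{
    LogName = 'Microsoft-Exchange-HighAvailability/Seeding'
    Id      = 1109, 1117, 1119, 1124, 1125, 826
} |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap
```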

Summary

As you can see, Autoreseed is quite intelligent and effective, resolving a straightforward issue like a storage failure with no manual intervention by the administrator except for replacing the failed disk with a new spare.

Just how good is Autoreseed?

In my test lab I tend to treat my servers pretty rough. To test Autoreseed I would regularly open up Server Manager on a DAG member and offline one of the volumes hosting database copies. Then I would go away and do something else for an hour or two.

Every single time Autoreseed successfully restored the resiliency of my DAG. Looking at the event logs it typically achieves this in a little over an hour. In the real world, if there are delays or retries on some of the Autoreseed workflow steps, or the databases are larger and take longer to reseed, then it may take longer to recover, but I would have full confidence that it would work.
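
I did this through Server Manager, but a roughly equivalent test can be scripted. The sketch below is for lab use only and makes two assumptions: that disk number 1 is the disk hosting the database copies, and that it is run on the DAG member itself in the Exchange Management Shell.

```powershell
# LAB ONLY. Assumption: disk number 1 hosts the database copies on MELEX1.
Set-Disk -Number 1 -IsOffline $true

# Check every 5 minutes until all copies on the server are Healthy or Mounted again.
do {
    Start-Sleep -Seconds 300
    $copies = Get-MailboxDatabaseCopyStatus -Server MELEX1
    $copies | Format-Table Name, Status, CopyQueueLength, ContentIndexState -AutoSize
} while ($copies | Where-Object { @('Healthy', 'Mounted') -notcontains "$($_.Status)" })
```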

Autoreseed is a feature of a highly intelligent server application that is designed to run efficiently at scale. As with many features in Exchange Server 2013, to take full advantage of Autoreseed you design for *simpler* DAGs. This is counter-intuitive for some people who are used to adding complexity to their designs to make them more resilient.

But as you can see, by getting the right foundations in place you can easily take advantage of the benefits of Autoreseed in your deployment.


This article Exchange Server 2013 Autoreseed in Action is © 2014 ExchangeServerPro.com

Get more Exchange Server tips at ExchangeServerPro.com

     
