At my IT/Dev Connections session in September I advocated for simple designs for database availability groups. This included some points about Exchange Server 2013 storage design and layout, such as:
- JBOD vs RAID
- Multiple databases per volume
- Volumes mounted in folders not drive letters
- Co-locate the database and transaction log files on the same volume
Those recommendations came with caveats of course, depending on various factors. Aside from being easier to manage, simple designs also let you take advantage of the terrific new feature in Exchange Server 2013 called Autoreseed.
Autoreseed in Exchange Server 2013
With Autoreseed the members of an Exchange 2013 DAG are pre-configured with one or more spare volumes. When a disk fails the Exchange server is able to automatically replace the failed disk with a spare, and then reseed the lost database copies to the new volume.
This means the recovery workflow in Exchange 2013 goes like this:
- Disk fails (resiliency of your DAG is impacted)
- Spare disk automatically mounted
- Database copies reseeded (resiliency is restored automatically)
- Manual intervention to replace the failed disk with a new spare
In Exchange Server 2010, which also supported JBOD storage, the recovery workflow goes like this:
- Disk fails (resiliency of your DAG is impacted)
- Manual intervention to replace disk
- Manual intervention to reseed database copies (resiliency is restored)
The Exchange 2010 recovery workflow involves too many manual steps to restore the resiliency of the DAG, requires response by admins at any hour of the day, and is simply not efficient at scale.
The Exchange 2013 recovery workflow can automatically restore the resiliency of the DAG without manual intervention, requires response by admins at a lower urgency, and is far more efficient at scale.
Laying the foundation for Autoreseed involves implementing those recommendations I mentioned earlier. Let’s take a look at them in a little more detail.
RAID vs JBOD
For single datacenter deployments:
- Always use RAID for the system/OS volume
- Always use RAID when there are fewer than 3 database copies
- Use JBOD when there are 3 or more database copies
- Use JBOD for lagged copies only when 2 or more lagged database copies exist
For multiple datacenter deployments:
- Always use RAID for the system/OS volume
- Always use RAID when there are fewer than 2 database copies in a datacenter
- Use JBOD when 2 or more database copies exist in a datacenter
- Use JBOD for lagged copies as long as 2 or more lagged copies exist, or log play down is enabled
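To make the rules above easier to reason about, here is a minimal Python sketch of them as two decision functions. The function and parameter names are my own, invented for illustration, and are not part of Exchange. The sketch covers database volumes and lagged copies only, since the system/OS volume is always RAID. Falling back to RAID for lagged copies that do not qualify is my reading of "only when" in the list, so treat it as an assumption.

```python
def database_volume_storage(copies: int, multiple_datacenters: bool) -> str:
    """Return "JBOD" or "RAID" for volumes hosting database copies.

    copies: total database copies for a single-datacenter deployment,
            or copies within one datacenter for a multi-datacenter one.
    """
    # JBOD needs 3+ copies in one datacenter, or 2+ copies per datacenter
    # when the DAG spans multiple datacenters.
    threshold = 2 if multiple_datacenters else 3
    return "JBOD" if copies >= threshold else "RAID"


def lagged_copy_storage(lagged_copies: int, multiple_datacenters: bool,
                        log_play_down_enabled: bool = False) -> str:
    """Return "JBOD" or "RAID" for volumes hosting lagged copies."""
    if lagged_copies >= 2:
        return "JBOD"
    # The log play down exception appears only in the multi-datacenter list.
    if multiple_datacenters and log_play_down_enabled:
        return "JBOD"
    return "RAID"  # assumption: anything that does not qualify stays on RAID


if __name__ == "__main__":
    print(database_volume_storage(copies=3, multiple_datacenters=False))
    print(lagged_copy_storage(1, multiple_datacenters=True,
                              log_play_down_enabled=True))
```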
Multiple Databases Per Volume
Use multiple databases per volume when 3 or more database copies exist. The volumes can be on RAID or JBOD (with a preference for JBOD, as I’ll explain shortly).
The number of databases per volume should equal the number of copies of the databases.
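As a rough worked example of that rule, the Python sketch below estimates how many data volumes (not counting spares) each DAG member would need. It assumes the copies are spread evenly across the servers, which is my simplification, and the function name is hypothetical.

```python
def data_volumes_per_server(databases: int, copies_per_database: int,
                            servers: int) -> int:
    """Estimate data volumes per DAG member, excluding spare volumes.

    Assumes copies are distributed evenly and that each volume holds as many
    database copies as there are copies of each database.
    """
    total_copies = databases * copies_per_database
    if total_copies % servers != 0:
        raise ValueError("copies do not divide evenly across servers")
    copies_per_server = total_copies // servers
    databases_per_volume = copies_per_database  # the rule of thumb above
    # Ceiling division, so a partly filled volume still counts as a volume.
    return -(-copies_per_server // databases_per_volume)


if __name__ == "__main__":
    # 4 databases with 4 copies each on 4 servers: 16 copies, 4 per server,
    # 4 per volume, so a single data volume on each server.
    print(data_volumes_per_server(databases=4, copies_per_database=4, servers=4))
```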
Volumes Mounted in Folders not Drive Letters
Mounting your volumes as drive letters is fine for non-DAG deployments, and works for DAG deployments as well, but is not recommended.
There is the obvious limitation of the size of the alphabet. With only 23 usable letters after A:, B:, and C: are consumed, and Exchange 2013 Enterprise capable of hosting 100 databases, you can easily run into problems or at the very least find yourself juggling a complex configuration to work around it.
Instead mount your volumes as folders, using a RAID-protected host volume (the C:\ volume for system/OS is fine for this).
Co-Locate Database and Transaction Log Files
Exchange admins are used to placing the database and transaction log files on separate volumes for recoverability from disk failures. This is still recommended for non-DAG scenarios.
For DAG scenarios the fact that you have multiple copies of each database mitigates the risk of a single disk failure taking out an entire database. So co-locating the database and transaction log files is recommended for DAG scenarios, especially when using multiple databases per volume, and also when using JBOD.
Combining the above, along with evenly distributed active, passive and lagged database copies, gives you an Exchange 2013 DAG that looks similar to this example.
This example obviously assumes that a four node DAG in two datacenters is the right solution for the environment. Your own requirements will vary of course, but this example is being used mainly to demonstrate Autoreseed.
Example Storage Layout for DAG Members
With all of the above in mind here is an example of how the storage layout would be configured for an Exchange 2013 DAG member.
We start with a RAID protected system/OS volume, and create two folders in the root of C:\.
- ExchangeDatabases
- ExchangeVolumes
These match up with the default settings of an Exchange 2013 DAG for root folder paths.
[PS] C:\>Get-DatabaseAvailabilityGroup | fl *autodag*path

AutoDagDatabasesRootFolderPath : C:\ExchangeDatabases
AutoDagVolumesRootFolderPath : C:\ExchangeVolumes
Next, the volumes that will be hosting the databases and log files are configured. For this simple example a single volume is being configured to host active data and a single volume is being configured as a spare. These are mounted into sub-folders of C:\ExchangeVolumes named Volume1 and Volume2.
Volume1 is then mounted into additional folders for hosting the database and log files. These folder names match the names of the databases in the DAG, for example DB01, DB02, DB03 and DB04. These are created as sub-folders of the C:\ExchangeDatabases folder.
If you’re wondering what I mean by this, I am simply referring to mounting the volume into multiple folder paths instead of assigning it a drive letter, using the same option you would normally see when first creating the volume.
Finally, create sub-folders of each database folder to host the DB and log files. These are named according to the database names again, so DB01 needs sub-folders named DB01.db and DB01.log.
These folders are then used as the paths when creating the mailbox databases themselves. For example, here are the paths for DB01 in this environment.
[PS] C:\>Get-MailboxDatabase DB01 | fl *path*

EdbFilePath : C:\ExchangeDatabases\DB01\db01.db\DB01.edb
LogFolderPath : C:\ExchangeDatabases\DB01\db01.log
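The folder convention described above is mechanical enough to generate from the database name. Here is a small Python sketch that builds the expected paths and checks a database's configured paths against them. The root folder is the DAG default shown earlier; the helper names are hypothetical and nothing here talks to Exchange.

```python
from pathlib import PureWindowsPath

# Default DAG root folder for databases, as reported earlier by
# Get-DatabaseAvailabilityGroup.
DATABASES_ROOT = PureWindowsPath(r"C:\ExchangeDatabases")


def expected_paths(db_name: str) -> dict:
    """Build the mount point, EDB file path and log folder for a database."""
    mount_point = DATABASES_ROOT / db_name
    return {
        "mount_point": mount_point,
        "edb_file": mount_point / f"{db_name}.db" / f"{db_name}.edb",
        "log_folder": mount_point / f"{db_name}.log",
    }


def matches_convention(db_name: str, edb_file_path: str,
                       log_folder_path: str) -> bool:
    """Compare configured paths with the convention.

    PureWindowsPath comparisons ignore case, as Windows paths do.
    """
    expected = expected_paths(db_name)
    return (PureWindowsPath(edb_file_path) == expected["edb_file"]
            and PureWindowsPath(log_folder_path) == expected["log_folder"])


if __name__ == "__main__":
    for key, path in expected_paths("DB01").items():
        print(f"{key}: {path}")
```

Because the comparison is case-insensitive, the `db01.db` and `db01.log` folder names in the output above still match the `DB01.db` and `DB01.log` names from the convention.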
Autoreseed in Action
When a disk fails in an Exchange Server 2013 DAG member the Autoreseed workflow begins. However, the following conditions must be met for Autoreseed to take place:
- The database copies are not blocked from resuming replication or reseeding.
- The log and database files for the database are co-located on the same volume.
- The logs and database folder structure matches the naming convention required for Autoreseed.
- There are no other database copies on the volume that are in an “Active” state.
- All database copies on the volume are in a “FailedAndSuspended” state.
- The server has no more than 8 “FailedAndSuspended” database copies.
If those conditions are met then Autoreseed can attempt to resolve the issue.
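The conditions above can be read as a checklist. The Python sketch below models them against a simple in-memory description of the copies on a failed volume. The class, its fields and the function name are all hypothetical, and this is only my model of the list, not how Exchange evaluates the conditions internally.

```python
from dataclasses import dataclass

MAX_FAILED_SUSPENDED_PER_SERVER = 8  # limit quoted in the list above


@dataclass
class DatabaseCopy:
    name: str
    status: str  # e.g. "FailedAndSuspended", "Healthy", "Mounted"
    is_active: bool = False
    blocked: bool = False  # blocked from resuming replication or reseeding
    files_colocated: bool = True
    naming_convention_ok: bool = True


def autoreseed_blockers(copies_on_volume, failed_suspended_on_server):
    """Return the unmet conditions; an empty list means Autoreseed can try."""
    blockers = []
    if any(c.blocked for c in copies_on_volume):
        blockers.append("a copy is blocked from resuming or reseeding")
    if not all(c.files_colocated for c in copies_on_volume):
        blockers.append("log and database files are not on the same volume")
    if not all(c.naming_convention_ok for c in copies_on_volume):
        blockers.append("folder structure does not match the convention")
    if any(c.is_active for c in copies_on_volume):
        blockers.append("an active database copy is on the volume")
    if not all(c.status == "FailedAndSuspended" for c in copies_on_volume):
        blockers.append("not every copy on the volume is FailedAndSuspended")
    if failed_suspended_on_server > MAX_FAILED_SUSPENDED_PER_SERVER:
        blockers.append("more than 8 FailedAndSuspended copies on the server")
    return blockers


if __name__ == "__main__":
    names = ["DB01", "DB02", "DB03", "DB04"]
    volume = [DatabaseCopy(n, "FailedAndSuspended") for n in names]
    print(autoreseed_blockers(volume, failed_suspended_on_server=4))
```

Returning the list of unmet conditions, rather than a single true or false, makes it easier to see why a volume would be skipped.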
The workflow begins with detection of the failed volume. Database copies are regularly checked to see whether any of them have been at a status of “FailedAndSuspended” for 15 minutes or longer. This is the state that a database copy will be in when there is an underlying storage issue. The 15 minute threshold exists to ensure that remedial action is not taken too quickly.
Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 10:19:46 PM
Event ID: 1109
Task Category: Auto Reseed Manager
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is starting repair workflow 'FailedSuspendedCopyAutoReseed' for database 'DB01'.
WorkflowLaunchReason: Database copy 'DB01\MELEX1' encountered an error during log replay. Error: The system cannot find the path specified
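The detection step described above boils down to a time threshold. As a minimal sketch, assuming we already know when each copy entered the “FailedAndSuspended” state, the check could look like this in Python. The function name and the input dictionary are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold described above: act only on copies that have been
# FailedAndSuspended for at least 15 minutes.
FAILED_SUSPENDED_THRESHOLD = timedelta(minutes=15)


def copies_due_for_repair(failed_since, now):
    """Return the names of copies that have been failed long enough.

    failed_since maps a copy name to the time it became FailedAndSuspended.
    """
    return sorted(
        name for name, since in failed_since.items()
        if now - since >= FAILED_SUSPENDED_THRESHOLD
    )


if __name__ == "__main__":
    now = datetime(2014, 9, 2, 22, 19, 46)
    failed_since = {
        "DB01": now - timedelta(minutes=15),  # due for repair
        "DB02": now - timedelta(minutes=5),   # too recent, left alone for now
    }
    print(copies_due_for_repair(failed_since, now))
```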
The server attempts to resume the FailedAndSuspended database copy up to 3 times.
Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 10:19:46 PM
Event ID: 1124
Task Category: Auto Reseed Manager
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is beginning attempt number 1 of execution stage 'Resume' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'.
WorkflowLaunchReason: Database copy 'DB01\MELEX1' encountered an error during log replay. Error: The system cannot find the path specified

Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 11:04:46 PM
Event ID: 1119
Task Category: Auto Reseed Manager
Level: Error
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager failed to resume database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed' after a maximum of 3 attempts. The workflow will next attempt to assign a spare volume and reseed the database copy.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.
The server attempts to assign a spare volume once per hour for up to 5 attempts.
Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 11:04:46 PM
Event ID: 1124
Task Category: Auto Reseed Manager
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager is beginning attempt number 1 of execution stage 'AssignSpare' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 11:04:46 PM
Event ID: 1125
Task Category: Auto Reseed Manager
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager has successfully assigned spare volume '\\?\Volume{6e77b6f8-6f83-49f1-ae48-60aa9419cd19}\' mounted at 'C:\ExchangeVolumes\Volume3\' for database copy 'DB01' as part of repair workflow 'FailedSuspendedCopyAutoReseed'. The workflow will next attempt to reseed the database copy.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.
The server attempts to reseed the database copies to the new volume, with up to 5 attempts at 1 hour intervals.
Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 2/09/2014 11:19:46 PM
Event ID: 1117
Task Category: Auto Reseed Manager
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
Automatic Reseed Manager throttled repair workflow 'FailedSuspendedCopyAutoReseed' for database 'DB01'.
Details: The Automatic Reseed Manager encountered an error: The automatic repair operation for database copy 'DB01\melex1' will not be run because it has been throttled by the throttling interval of '01:00:00'.
WorkflowLaunchReason: The Microsoft Exchange Replication service is unable to create required directory C:\ExchangeDatabases\DB01\db01.log for DB01\MELEX1. The database copy status will be set to Failed. Please check the file system permissions. Error: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\ExchangeDatabases\DB01\db01.log'.

Log Name: Microsoft-Exchange-HighAvailability/Seeding
Source: Microsoft-Exchange-HighAvailability
Date: 3/09/2014 12:10:17 AM
Event ID: 826
Task Category: Seeding Target
Level: Information
Keywords:
User: SYSTEM
Computer: MELEX1.exchange2013demo.com
Description:
DB Seeding has completed for the local copy of database 'DB01' (1b3363f6-7f82-41ca-953b-2c295c1896a9).
If the process was not successful after 5 attempts, it stops.
After 3 days, if the database copies are still “FailedAndSuspended”, the workflow begins again.
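Putting the stages together, the workflow behaves like a short sequence of bounded retries. The Python sketch below models that sequence using the attempt limits quoted above. 'Resume' and 'AssignSpare' are the stage names that appear in the event log, while 'Reseed' is my own placeholder label for the final stage. The waits between attempts and the 3 day restart are not simulated, and the `try_stage` callback is a hypothetical stand-in for the real work.

```python
def run_repair_workflow(try_stage):
    """Model the Autoreseed stages as bounded retries.

    try_stage(stage, attempt) returns True if that attempt succeeded.
    Returns (outcome, history), where history lists every attempt made.
    """
    history = []

    def attempt_stage(stage, max_attempts):
        for attempt in range(1, max_attempts + 1):
            ok = try_stage(stage, attempt)
            history.append((stage, attempt, ok))
            if ok:
                return True
        return False

    # Up to 3 attempts to resume the FailedAndSuspended copy in place.
    if attempt_stage("Resume", 3):
        return "resumed", history
    # Up to 5 attempts to assign a spare volume.
    if not attempt_stage("AssignSpare", 5):
        return "stopped", history
    # Up to 5 attempts to reseed the copies onto the new volume.
    if attempt_stage("Reseed", 5):
        return "reseeded", history
    return "stopped", history


if __name__ == "__main__":
    # Simulate the failure in this article: the disk is gone, so resuming
    # never works, but a spare is available and the reseed succeeds.
    def failed_disk(stage, attempt):
        return stage != "Resume"

    outcome, history = run_repair_workflow(failed_disk)
    print(outcome)
    for entry in history:
        print(entry)
```

An outcome of "stopped" corresponds to the case where the workflow gives up and waits to start again later.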
Summary
As you can see, Autoreseed is quite intelligent and effective, resolving a straightforward issue like storage failure with no manual intervention by the administrator except for replacing the failed disk with a new spare.
Just how good is Autoreseed?
In my test lab I tend to treat my servers pretty rough. To test Autoreseed I would regularly open up Server Manager on a DAG member and offline one of the volumes hosting database copies. Then I would go away and do something else for an hour or two.
Every single time Autoreseed successfully restored the resiliency of my DAG. Looking at the event logs it typically achieves this in a little over an hour. In the real world, if there are delays or retries on some of the Autoreseed workflow steps, or the databases are larger and take longer to reseed, then it may take longer to recover, but I would have full confidence that it would work.
Autoreseed is a feature of a highly intelligent server application that is designed to run efficiently at scale. As with many features in Exchange Server 2013, to take full advantage of Autoreseed you design for *simpler* DAGs. This is counter-intuitive for some people who are used to adding complexity to their designs to make them more resilient.
But as you can see, by getting the right foundations in place you can easily take advantage of the benefits of Autoreseed in your deployment.
This article Exchange Server 2013 Autoreseed in Action is © 2014 ExchangeServerPro.com