SRM Tutorial Part 8: Configuration #2

At the end of the last part, the SRM lab environment was ready to use. This part of the tutorial covers the last steps of the SRM configuration that are required to run a failover between the two sites.


SRM Tutorial Part 1: Lab setup
SRM Tutorial Part 2: Components & design
SRM Tutorial Part 3: SRM installation
SRM Tutorial Part 4: NetApp Ontap Simulator – Setup & configuration
SRM Tutorial Part 5: Configure NetApp SnapMirror
SRM Tutorial Part 6: SRA installation
SRM Tutorial Part 7: Configuration #1
SRM Tutorial Part 8: Configuration #2
SRM Tutorial Part 9: Advanced configuration & troubleshooting

Protection Groups:

We start with the protection groups. A protection group consists of one or more datastores containing the VMs you want to protect. Depending on the environment, there are different approaches to designing these groups. Always keep in mind that a single datastore can be part of only one protection group. Let's assume you have 10 datastores and put them all into a single protection group. This configuration wouldn't give you any flexibility for the recovery plans we are going to create later in this post: you could only fail over all datastores together. The opposite approach is to create one protection group per datastore (a 1:1 relationship). This gives you the highest flexibility, but also the highest configuration effort, especially in large environments. For most environments, a balance between these two approaches fits best. This should become clearer as you follow the next steps.
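To make the trade-off a bit more tangible, here is a small Python sketch. It has nothing to do with the SRM API; the datastore and group names (`ds01`, `pg_all`, `pg_web` and so on) are made up for illustration. It only models the rule that a datastore may belong to exactly one protection group and counts how many independent failover units each layout gives you.

```python
def validate_groups(groups):
    """Map each datastore to its protection group.

    Raises ValueError if a datastore shows up in more than one group,
    which mirrors the "one datastore, one protection group" rule.
    """
    owner = {}
    for group, datastores in groups.items():
        for ds in datastores:
            if ds in owner:
                raise ValueError(f"{ds} is in both {owner[ds]} and {group}")
            owner[ds] = group
    return owner


# Ten hypothetical datastores: ds01 .. ds10
datastores = [f"ds{i:02d}" for i in range(1, 11)]

# Three possible layouts (all names are illustrative only)
one_big_group = {"pg_all": datastores}
one_per_datastore = {f"pg_{ds}": [ds] for ds in datastores}
balanced = {
    "pg_web": datastores[:3],
    "pg_db": datastores[3:6],
    "pg_misc": datastores[6:],
}

for name, layout in [
    ("one big group", one_big_group),
    ("one per datastore", one_per_datastore),
    ("balanced", balanced),
]:
    validate_groups(layout)
    print(f"{name}: {len(layout)} independent failover unit(s)")
```

The single big group gives you one unit to fail over, the 1:1 layout gives you ten, and the balanced layout sits in between.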

Select the category Protection Groups on the left. You see the default folder All Protection Groups. Right-click on it and select Create Protection Group.

01_srm_create_protection_group

First you need to select the site that holds the VMs you want to protect. In an environment with multiple storage array pairs, you also need to specify the pair that provides the relevant datastore.

02_srm_create_protection_group

Next you see all replicated datastores on the selected array. In my lab I create a separate protection group for each datastore, so I select only one here. At the bottom you see the VMs that reside on the selected datastore and that you are about to protect.

03_srm_create_protection_group

Finally you need to specify a name. I recommend choosing a name that reflects the datastores included in this protection group.

04_srm_create_protection_group

After you have created the protection group, you will see activity in the “Recent Tasks” bar. SRM now creates the placeholder files for the VMs you just protected.

06_srm_protect_vms

Now take a look at the inventory of the vCenter at the secondary site. You will notice that your protected VMs appear with a special bolt icon. Please don't delete or modify these objects at any time (exception: troubleshooting).

000_srm_bold_icon

As mentioned before, I created a separate protection group for each of my two datastores.

08_both_datastores

Recovery Plans:

The last part of the basic configuration is to build recovery plans, before we can start with some failover tests.
A recovery plan includes one or more protection groups, and a protection group can be part of multiple recovery plans. The plan also contains many settings for the recovery process as a whole and for each individual protected VM. You can think of it as a “run book” for your DR process.
I think nearly every environment should have one recovery plan that covers the whole environment, meaning a single plan that contains all the protection groups you created before. In case of a disaster (a complete outage of the primary site), this single plan recovers your infrastructure at the secondary site. Because there are many more use cases for SRM, it may also make sense to create several “smaller” plans. For example, you may have planned maintenance on part of your environment (say, a single storage array out of several) and want to do a planned failover of just the affected VMs rather than the whole environment.
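The relationship between plans and groups can be sketched in a few lines of Python. Again, this is only a model and not SRM code; the names `pg_ds1`, `rp_full_site` and so on are hypothetical. It shows one plan for the whole site plus two smaller plans, with each protection group used by more than one plan.

```python
# Hypothetical protection groups: group name -> datastores it contains
protection_groups = {
    "pg_ds1": ["ds1"],
    "pg_ds2": ["ds2"],
}

# Hypothetical recovery plans: plan name -> protection groups it includes
recovery_plans = {
    "rp_full_site": ["pg_ds1", "pg_ds2"],  # disaster: recover everything
    "rp_ds1_only": ["pg_ds1"],             # e.g. maintenance on one datastore
    "rp_ds2_only": ["pg_ds2"],
}


def datastores_in_plan(plan_name):
    """Return the datastores that fail over when the given plan runs."""
    datastores = []
    for group in recovery_plans[plan_name]:
        datastores.extend(protection_groups[group])
    return sorted(set(datastores))


def plans_using_group(group_name):
    """Return all plans that include the given protection group."""
    return sorted(
        plan for plan, groups in recovery_plans.items() if group_name in groups
    )


print(datastores_in_plan("rp_full_site"))
print(plans_using_group("pg_ds1"))
```

A group can appear in several plans, but as described above, a datastore still belongs to only one group.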

Select the category Recovery Plans on the left. You see the default folder All Recovery Plans. Right-click on it and select Create Recovery Plan.

09_srm_create_recovery_plan

Next you need to select the site where you want to recover your protected VMs.

10_srm_create_recovery_plan

Include the protection groups you created and click Next. Note that I'm creating a plan for the whole environment, so I select all my protection groups.

11_srm_create_recovery_plan

In the next window you are asked for a test network. SRM 5 offers the great option of testing a recovery plan. This is like a simulation of a failover, but of course without any effect on the running primary site. Because SRM really powers on the protected VMs at the secondary site during this simulation, it is important to isolate these VMs from a network connectivity perspective. If you leave these settings at their defaults, SRM creates a so-called bubble network for you: it connects the recovered VMs to a newly created vSwitch without any uplinks. If you have a use case where you want to put the VMs into a special VLAN or similar, you can configure that here.

12_srm_create_recovery_plan

Enter a name for your plan and, optionally, a description (which makes a lot of sense in a complex environment).

13_srm_create_recovery_plan

I also created two additional plans so that I can fail over just a single datastore of my lab.

14_srm_create_recovery_plan

If you now click on one of your recovery plans on the left, you will see a dashboard and a lot of new tabs. I won't go into detail about all the options here; I will show the most useful ones in the next and last part of this tutorial. For now the defaults are just fine.

15_srm_create_recovery_plan

Simulate a failover:

Now it's time to get to the really interesting part... let's test a recovery plan!
I suggest selecting the Recovery Steps tab before you right-click on your recovery plan and select Test. In this tab you can follow each step SRM performs in real time.

18_srm_test_failover_

Before the test starts, you get a warning to make sure you know what you are doing 🙂 You can also select a checkbox here to synchronize the datastores. If you followed the NetApp parts of this tutorial, you hopefully remember that we set the replication to manual / on demand. So it is a good idea to let SRM initiate a sync before testing the failover.

19_srm_test_failover_

The progress of every single step is shown in the Recovery Steps tab.

20_srm_test_failover_

Make sure you also have your vSphere client open on the secondary site. You will notice a lot of tasks there. In the scenario shown in the screenshot I'm only doing a test failover of 4 VMs. Imagine how many tasks you would see in a production environment with several hundred VMs 🙂

21_srm_test_failover_

You should also take a look at the network configuration of your ESXi host at the secondary site. There you will find the “bubble network” for the recovered VMs that I mentioned before.

22_srm_test_failover_

If all steps completed successfully, your protected VMs are now running at the secondary site while they are still running at the primary site. I think it is important to know how this works in the background. When a test failover is initiated, the storage system at the secondary site creates a snapshot of each datastore. This gives you write access to the datastores at this site as well. It also means that these snapshots grow with the amount of changes you make for as long as your infrastructure stays in this state. Always keep this in mind when performing a test failover.

To end the test, right-click on your recovery plan again and select Cleanup. This reverses all the steps that were executed before.

23_srm_cleanup_test_failover

Again you will see a warning before the cleanup process starts.

24_srm_cleanup_test_failover

Failover to recovery site:

The steps for a real failover to the recovery site are quite similar to the simulation we did before. Instead of Test, you select Recovery when you right-click on the recovery plan.

25_srm_recover

You get a warning window again. Here you must explicitly select a checkbox confirming that you know what you are doing before you can click the Next button. You also have to choose between a Planned Migration and a Disaster Recovery. While the steps SRM performs are largely the same, there is a major difference between the two. A typical use case for the Planned Migration option is using SRM for a planned move of your VMware infrastructure to a new data center on date XY. In this case SRM does a clean shutdown of your VMs at the primary site, replicates the datastores at the array level, and goes through all the steps at the secondary site to get everything up and running. If an error occurs, SRM immediately stops its operations to give you the opportunity to correct the error and get a smooth switchover to the secondary site.
Let's take a look at the Disaster Recovery option. The typical use case is a complete outage of your primary site. Let's assume all your ESXi hosts, the vCenter, and the storage array become unreachable from one second to the next. If you now select the Disaster Recovery option, SRM will try at any price to get your infrastructure up and running at the secondary site. If, for example, it is not possible to synchronize the datastores, SRM reports an error but still continues with the next step in the recovery plan.
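The difference in error handling between the two options can be illustrated with a tiny Python sketch. This is not how SRM is implemented and it does not use any SRM API; the step names and the `run_plan` function are my own invention. It only shows "stop at the first error" versus "report the error and carry on".

```python
def run_plan(steps, mode):
    """Run steps in order and return a list of (step name, result) tuples.

    mode="planned_migration":  stop at the first failing step.
    mode="disaster_recovery":  record the error and continue with the next step.
    """
    if mode not in ("planned_migration", "disaster_recovery"):
        raise ValueError(f"unknown mode: {mode}")
    results = []
    for name, action in steps:
        try:
            action()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, f"error: {exc}"))
            if mode == "planned_migration":
                break  # give the admin a chance to fix the problem first
    return results


def ok():
    """A step that succeeds."""


def sync_fails():
    """A step that fails, e.g. because the primary array is unreachable."""
    raise RuntimeError("array unreachable")


# Hypothetical, heavily simplified recovery steps
steps = [
    ("shut down VMs at primary site", ok),
    ("synchronize datastores", sync_fails),
    ("power on VMs at secondary site", ok),
]

for mode in ("planned_migration", "disaster_recovery"):
    print(mode)
    for name, result in run_plan(steps, mode):
        print(f"  {name}: {result}")
```

With the failing sync step, the planned migration stops after two steps, while the disaster recovery run still powers on the VMs at the secondary site.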

If you want to use the Planned Migration option, I strongly suggest always running a test beforehand to make sure you will not run into any errors, because time is limited in nearly every kind of infrastructure maintenance. In general I would run the SRM simulations on a regular basis.

26_srm_recover

Regardless of which recovery type you performed, you will want to protect your infrastructure again as soon as possible. Once your primary data center is up again, the first step is to use the so-called Reprotect function. You find it next to the Recovery and Test functions.

27_srm_reprotect

And again a warning 🙂

28_srm_reprotect

You will see that far fewer steps are performed than during the failover. Basically, SRM reverses the direction of the replication. Because you have been running your infrastructure at the secondary site (whether for a few hours or for weeks), you need to get the data on both arrays back in sync.

29_srm_reprotect

After the replication has finished, SRM is ready to do a failback to the primary site. Of course you should choose the Planned Migration option for this operation 🙂





