Zerto offers two types of failover: Test Failover and Live Failover. This article covers the Live Failover process, which is used both to recover from a Disaster Recovery (DR) scenario and to migrate workloads into the iland Secure Cloud. Review this article thoroughly, and only perform a Live Failover if you are comfortable with Zerto. Otherwise, a mistake can result in irreversible problems. For example, after the Live Failover is committed, Zerto no longer keeps a journal for the VM, so recovery to a different point in time is no longer possible.
Another good source to review before triggering a Live Failover is our Zerto pre-failover checklist article. It contains configuration and planning recommendations that can prevent unwanted surprises during a DR event.
In contrast, as the name suggests, the Test Failover process is used for testing and has no impact on your production environment. Please see our article Performing a Test Failover for more information.
Before you start a Live Failover, the recovery group must have completed its Initial Sync. You may also be unable to fail over a recovery group that is in the middle of a bitmap or delta sync.
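The two preconditions above can be expressed as a small readiness check. The Python sketch below is illustrative only. The `SyncState` values, the `RecoveryGroup` fields, the group names and the "blocked" / "at risk" / "ready" labels are all hypothetical and do not correspond to a Zerto or iland API. In practice you read the sync status from the iland Console or the ZVM.

```python
from dataclasses import dataclass
from enum import Enum


class SyncState(Enum):
    """Hypothetical sync states; the labels shown by Zerto or iland may differ."""
    INITIAL_SYNC = "initial sync"
    BITMAP_SYNC = "bitmap sync"
    DELTA_SYNC = "delta sync"
    IN_SYNC = "in sync"


@dataclass
class RecoveryGroup:
    name: str
    initial_sync_completed: bool
    sync_state: SyncState


def live_failover_readiness(group: RecoveryGroup) -> str:
    """Return 'blocked', 'at risk' or 'ready' for a Live Failover."""
    # Hard requirement: the Initial Sync must have completed.
    if not group.initial_sync_completed or group.sync_state is SyncState.INITIAL_SYNC:
        return "blocked"
    # A bitmap or delta sync in progress *may* prevent the failover.
    if group.sync_state in (SyncState.BITMAP_SYNC, SyncState.DELTA_SYNC):
        return "at risk"
    return "ready"


# Hypothetical recovery groups used only for illustration.
groups = [
    RecoveryGroup("web-tier", True, SyncState.IN_SYNC),
    RecoveryGroup("db-tier", True, SyncState.DELTA_SYNC),
    RecoveryGroup("new-app", False, SyncState.INITIAL_SYNC),
]
for g in groups:
    print(f"{g.name}: {live_failover_readiness(g)}")
```

"At risk" is deliberately softer than "blocked" because the article only says a bitmap or delta sync *may* prevent the failover.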
If the connection between your site and iland is lost, or if your site is down, a Live Failover can be performed through the iland Console or by submitting a DR invocation request to iland support. The steps below explain how to perform the Live Failover through the iland Console.
1. The Failover Wizard can be found in the top right corner of the Continuity tab.
2. Once you have opened the Failover Wizard, choose the failover type "Live" and Target Type "Recovery Group(s)". Then, click the "Next" button.
3. Select the Recovery Group(s) you want to fail over. By default, the latest checkpoint is selected. If you would like to choose an earlier point in time, hover over the recovery group name, click the three dots on the right and select "Change Checkpoint". By default, the failover process makes no attempt to shut down the production servers. You can change that behaviour by clicking "Change Shutdown Behaviour".
4. (Optional) If you have experienced an outage, your production servers are already down and no changes are required. If, however, you are performing a migration, you may select "Yes", in which case a graceful shutdown is performed, or "Force" to force the VMs to power off. You may need to use "Force" on a server that fails to shut down gracefully or does not have VMware Tools installed.
We recommend shutting down the VMs manually and avoiding the "Shutdown Behavior" options. This prevents a failover from failing because the operating system of a VM does not respond to the shutdown request.
5. Click Submit on the confirmation screen to begin the failover.
6. Once the failover has completed, you will see a task in the Task Center of the iland Console. Click it to Commit or Rollback the failover.
Attention! Committing the failover destroys the journal. This means that if you encounter a problem with the operating system, or with an application running on it, after the failover, you can no longer fail over to a different point in time. Rolling back lets you start the failover process again, this time reverting to a different checkpoint in the journal as needed.
7. Once the failover has been committed, the recovery group will report a state of "Needs Configuration". At this stage the recovery group serves as a configuration placeholder, i.e. it keeps all the information required to fail back to your production environment once you have recovered it.
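To make the Commit / Rollback distinction in steps 6 and 7 concrete, here is a small Python model of one recovery group's failover lifecycle. It is an illustrative sketch only. The `FailoverSession` class, its state names ("protected", "pending", "needs configuration") and the checkpoint labels are hypothetical, and nothing here talks to Zerto or the iland Console. It encodes the rules described above: a failover defaults to the latest checkpoint, Rollback keeps the journal so you can try another checkpoint, and Commit discards the journal.

```python
from __future__ import annotations


class FailoverSession:
    """Hypothetical model of a Live Failover lifecycle (no real API calls)."""

    def __init__(self, checkpoints: list[str]) -> None:
        if not checkpoints:
            raise ValueError("the journal must contain at least one checkpoint")
        self.journal: list[str] | None = list(checkpoints)  # oldest first; last = latest
        self.state = "protected"
        self.checkpoint: str | None = None

    def fail_over(self, checkpoint: str | None = None) -> None:
        """Start a Live Failover; defaults to the latest checkpoint."""
        if self.journal is None:
            raise RuntimeError("journal destroyed by Commit; no point-in-time recovery")
        if self.state != "protected":
            raise RuntimeError(f"cannot fail over from state {self.state!r}")
        chosen = self.journal[-1] if checkpoint is None else checkpoint
        if chosen not in self.journal:
            raise ValueError(f"unknown checkpoint {chosen!r}")
        self.checkpoint = chosen
        self.state = "pending"  # waiting for Commit or Rollback in the Task Center

    def rollback(self) -> None:
        """Undo the failover; the journal is kept, so another checkpoint can be tried."""
        if self.state != "pending":
            raise RuntimeError("nothing to roll back")
        self.state = "protected"
        self.checkpoint = None

    def commit(self) -> None:
        """Make the failover permanent; the journal is discarded."""
        if self.state != "pending":
            raise RuntimeError("nothing to commit")
        self.journal = None
        self.state = "needs configuration"


session = FailoverSession(["09:00", "09:05", "09:10"])
session.fail_over()         # latest checkpoint, "09:10"
session.rollback()          # e.g. the application misbehaved after the failover
session.fail_over("09:05")  # retry from an earlier point in time
session.commit()
print(session.state)        # needs configuration
```

After `commit()`, any further `fail_over()` call raises an error, which mirrors the irreversibility warning above.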
After a DR event, while your production workloads are running in the iland DRaaS environment, you would typically aim to get your production environment up and running as soon as possible. Once that is done, you can plan the failback process. This involves replicating your workloads back to your recovered production environment and ensuring that any changes made to your data whilst running in the DR environment are preserved.
To start that process, you will need to bring up some key components of your on-premises environment. Those would include:
Some of these tasks may require a DNS server (you can reach ESXi hosts by IP address, but vCenter Server expects DNS to work in your environment) and a domain controller (if you use Active Directory accounts for vCenter and ZVM logins).
If you need to power on one of the domain controllers in the inventory of your vCenter Server, you will need to consider the following:
Some key considerations regarding the failback process:
Once your production environment is recovered to the point where you can start reverse replicating your workloads, log in to your local ZVM and edit the recovery groups that are in the "Needs Configuration" state.
1. Select the VPG in question, click More and select Edit from the drop-down list.
2. When reviewing the VPG settings you will notice that the VPG is now set to protect the VMs running in the iland recovery virtual datacenter.
3. Since Zerto will use the original VMDK files as targets for replication, make sure that the VMs in your production environment (not those in the iland recovery virtual datacenter) are shut down. Zerto will remove them from the inventory and attach their VMDKs to the VRAs running in your vCenter.
4. You will see that the destination host and datastore are from your own environment and are the location of the original (protected) VMs. Be mindful of your available datastore capacity when selecting the journal history: a longer journal needs more free space in your environment for the journal disks. Since you are performing a failback and not using Zerto for DR at this stage, a 4-hour journal history should be sufficient for most users.
5. On the Storage tab, you will see that the destinations for reverse replication are the original VMDK files.
6. You can specify the port groups to which you would like your VMs to be connected, after they are failed back.
7. The settings in the VPG should reflect the port groups to which the VMs were previously connected. If any changes are required, select the VM in question, click Edit Selected and choose a different port group.
8. If everything on the Summary tab matches your desired settings, you can press Done and wait for Zerto to enable replication.
9. You will be able to monitor the progress of the configuration changes in the ZVM interface.
10. Once completed, you will see that the VPG is performing a delta sync, replicating all the changes that occurred on the VMs whilst they were running in the recovery virtual datacenter.
You will also notice that the arrow indicating the direction of replication now points away from the iland site. The screenshot below includes additional VPGs to demonstrate the difference.
11. Your vCenter client no longer shows the VMs included in the VPG in its inventory.
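Step 4 above recommends a short journal history during failback because the journal disks consume datastore space in your environment. As a rough planning aid, the Python sketch below estimates journal space as average change (write) rate × journal history, plus headroom. This is a simplifying assumption of ours, not Zerto's official sizing formula. The function names, the 20% default headroom, the 2 MB/s change rate and the 500 GB of free space are all hypothetical, and actual journal usage depends on your workload.

```python
def estimate_journal_gb(change_rate_mb_s: float, history_hours: float, headroom: float = 0.2) -> float:
    """Rough journal size in GB: change rate x history, plus headroom.

    Assumes every changed block is kept for the whole history window
    (a hypothetical worst case, not Zerto's sizing formula).
    """
    if change_rate_mb_s < 0 or history_hours <= 0 or headroom < 0:
        raise ValueError("invalid sizing input")
    written_mb = change_rate_mb_s * history_hours * 3600  # MB written during the history window
    return written_mb * (1 + headroom) / 1024             # MB -> GB, including headroom


def fits(free_gb: float, change_rate_mb_s: float, history_hours: float) -> bool:
    """True if the estimated journal fits in the datastore's free space."""
    return estimate_journal_gb(change_rate_mb_s, history_hours) <= free_gb


# Hypothetical VPG writing 2 MB/s on average, datastore with 500 GB free.
for hours in (4, 96):  # 4 hours vs. 4 days of journal history
    print(f"{hours:>3} h: {estimate_journal_gb(2, hours):7.2f} GB, fits: {fits(500, 2, hours)}")
```

With these assumed numbers, a 4-hour history needs roughly 34 GB while a 4-day history needs about 810 GB. This illustrates why a short journal is usually the safer choice during failback.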
Once the VPGs are in sync, you can trigger a Live Failover back to your production environment. You can use the iland Console to perform the failback.
1. First, take note of the actual RPO of the VPGs. Because you are failing back in a planned manner, there is no need to accept an unexpected shutdown or any data loss. Record the current RPO, shut down the servers running in the iland recovery virtual datacenter and then wait for at least the RPO interval, so that the clean shutdown is replicated.
2. Since you will need to reverse replicate back to the iland recovery virtual datacenter after the failback, make sure that your workloads in the virtual datacenter are properly shut down, i.e. not only are the VMs powered down but the vApp is also stopped.
Shutting down the operating system leaves the VM in a Partially Powered Off state. You can then use the Power Off option. Do not use the Power Off option before gracefully shutting down the operating system, as the resulting unexpected shutdown might lead to data loss.
Once the VMs are powered off, please remember to also power off the vApp.
3. Once the vApp is powered off and you have waited the time reported as RPO, you can trigger a Live Failover by clicking ‘Failover Wizard’ in the Continuity tab.
4. Select ‘Live’ as the failover type and ‘Recovery Group(s)’ as the target, then click Next.
5. Select the recovery group you would like to fail back. You can leave the checkpoint as Latest and not specify any shutdown option.
6. You can click Submit on the confirmation window.
7. In your vCenter Server client, you will see that the failed-back VMs are managed by the Zerto plugin. Until you commit the failover, avoid making any changes to them (powering off, suspending, changing the configuration, etc.).
8. Once you have successfully tested your failed-back VMs, you must commit the failover. You can do so by selecting the check mark icon next to Failover Ended in the ZVM interface.
9. Once you click commit you will be presented with an option to set up reverse replication. This will allow you to replicate your production VMs back to the iland recovery virtual datacenter.
Tick the box next to Reverse Replication and then click Set to review the settings of the VPG.
10. Whilst reviewing the VPG settings, you will see that replication is now targeted at iland, and you can specify the journal history you were using before. In our example, a 4-day journal was selected.
11. The VPG will be utilizing pre-seeded data located in iland.
12. You can adjust the network and IP settings as needed.
13. If you are happy with all the settings on the Summary screen, click Done.
14. Once you are back on the Commit screen, click Commit to complete the failback and allow Zerto to set up reverse replication back to iland. This automatically unregisters the VMs in the iland virtual datacenter and uses their disks as pre-seeded data.
You can monitor the progress of the task in the Zerto interface. Once completed, you will see a delta sync, and the direction of replication will point at the iland recovery site.
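The timing rule in steps 1 to 3 of the failback (shut down cleanly, then wait at least the recorded RPO before starting the Live Failover) is simple arithmetic. It is sketched below in Python under our own assumptions. The step labels are informal, not iland Console UI names, the RPO is the value you recorded from the VPG expressed in seconds, and `earliest_failback` / `ready_to_fail_back` are hypothetical helpers, not part of any Zerto or iland tool.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Informal labels for the planned-failback shutdown order (not UI names).
SHUTDOWN_STEPS = ["shut down guest OS", "power off VM", "power off vApp"]


def earliest_failback(os_shutdown_at: datetime, recorded_rpo_s: int) -> datetime:
    """Clean shutdown time plus the recorded RPO: the earliest moment to fail back."""
    if recorded_rpo_s < 0:
        raise ValueError("RPO cannot be negative")
    return os_shutdown_at + timedelta(seconds=recorded_rpo_s)


def ready_to_fail_back(done: list[str], os_shutdown_at: datetime,
                       recorded_rpo_s: int, now: datetime) -> bool:
    """True only if every shutdown step was done in order and the RPO has elapsed."""
    # Powering off the VM before the guest OS shutdown is rejected (risk of data loss).
    if done != SHUTDOWN_STEPS:
        return False
    return now >= earliest_failback(os_shutdown_at, recorded_rpo_s)


shutdown = datetime(2024, 5, 1, 10, 0, 0)  # hypothetical clean-shutdown time
print(earliest_failback(shutdown, recorded_rpo_s=12))  # 2024-05-01 10:00:12
print(ready_to_fail_back(SHUTDOWN_STEPS, shutdown, 12, now=shutdown + timedelta(seconds=5)))   # False
print(ready_to_fail_back(SHUTDOWN_STEPS, shutdown, 12, now=shutdown + timedelta(seconds=15)))  # True
```

The check is deliberately strict about order. It reflects the advice not to use Power Off before the operating system has shut down gracefully.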