Last updated March 30, 2020
Earlier this year, the Disaster Recovery Journal released its latest figures on disaster recovery preparedness. The last time we reported on this study was in 2017, when just over half of respondents (55%) felt they were either prepared or fully prepared. That percentage has since increased, with 80% now saying they are either prepared or very prepared. We’re cautiously optimistic about those numbers, but that still leaves 20% of organizations with gaps in their disaster preparedness strategy.
For those of you leveraging IBM Power Systems throughout your organization, and especially for your mission-critical applications, here’s a quick run-down on some of your disaster recovery options. We have a great team of IBM engineers on staff, so please don’t hesitate to reach out to us if you have questions.
Designing an IBM i disaster recovery plan to meet your recovery objectives
There is no one-size-fits-all disaster recovery plan. For all businesses, regardless of platforms used in their primary location(s), the first step is to set their recovery objectives.
- Recovery Time Objective (RTO) – The maximum acceptable amount of time, typically measured in hours or days, to recover data and get the application back up and running.
- Recovery Point Objective (RPO) – How much data loss a company is willing to accept in the event of a disaster, expressed as a window of time. In practice, it determines how often data must be backed up or replicated.
Not only do these need to be set based on your organizational needs, but they may also be different for different workloads. For example, a distributor may set a very low RTO and RPO for their order entry systems or their prospect-facing website, but decide that they can handle longer downtime and greater data loss in their marketing automation systems.
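The idea of per-workload objectives can be sketched in a few lines of Python. This is only an illustration: the `RecoveryObjective` class, the `meets_objective` helper, and every number below are hypothetical, chosen to mirror the distributor example rather than taken from any real plan.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    workload: str
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable data loss, as a window of time


def meets_objective(objective, expected_downtime, expected_data_loss):
    """Return True if a recovery option satisfies both the RTO and the RPO."""
    return (expected_downtime <= objective.rto
            and expected_data_loss <= objective.rpo)


# Hypothetical objectives for two workloads at the same company.
ORDER_ENTRY = RecoveryObjective(
    "order entry", rto=timedelta(hours=2), rpo=timedelta(hours=1))
MARKETING = RecoveryObjective(
    "marketing automation", rto=timedelta(days=2), rpo=timedelta(hours=24))

# Assumed characteristics of a nightly backup: up to a day of lost data
# and roughly half a day to restore.
nightly_backup = dict(expected_downtime=timedelta(hours=12),
                      expected_data_loss=timedelta(hours=24))

for objective in (ORDER_ENTRY, MARKETING):
    ok = meets_objective(objective, **nightly_backup)
    print(f"{objective.workload}: {'meets' if ok else 'misses'} its objectives")
```

Under these assumed figures, the same nightly backup is acceptable for the marketing system but falls well short for order entry, which is why objectives are worth setting workload by workload.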
Setting RTO and RPO appropriately is crucial as they are key factors when deciding which replication solutions will best suit the business. It’s also imperative to plan how much will need to be budgeted for disaster recovery. Generally, the lower the RTO and the RPO, the more expensive the solution.
With objectives set, you can now focus on choosing the best recovery options for you. Here are four options to consider with some guidance on how each of these options impacts RTO, RPO, and cost:
MIMIX™ Replication: Constant Replication (Quickest Recovery Time)
For mission-critical applications with an RTO and an RPO of only a couple of hours, we recommend using a highly available (HA) replication process. At Connectria, we primarily use MIMIX replication software, although there are alternatives, such as iTera™, that we use when a customer already owns the solution. Because MIMIX allows for constant replication to the failover site, it provides the quickest recovery times. It is also the costliest of the four options we’ll discuss here.
SAN-to-SAN Replication (Quick to Moderate Recovery Time)
SAN-to-SAN replication offers RTO and RPO similar to those of logical replication software such as MIMIX, without the additional cost of that software. In addition, the disaster recovery LPAR (logical partition) does not need to be “live” to receive the replicated data.
Note that this solution requires both the production and the disaster recovery IBM i servers to be operating on the same type of external storage array. In addition, the process used to recover dirty blocks is inherent in the OS/DB2 recovery routines that are part of the IBM i operating system. This means that any blocks that need to be recovered or backed out are cleaned up by the DB2 recovery routines. The result is an IPL (initial program load) that is slightly longer than a standard IPL but certainly within the scope of most disaster recovery plans.
Vault Archive: Periodic Daily Replication (Variable Recovery Time and Data Loss)
Another option is a vault archive. In this scenario, the production iSeries would be located in your data center with periodic daily changes sent to a vault in Connectria’s data center. If you were to experience a disaster, we would restore that vault to an iSeries in our data center.
While this is the least expensive option, you need to be aware that the RTO and RPO of this approach are not much different from those of a straight backup solution. You will only have data up to the point of your last replication to the vault, so you could lose as much as twenty-four hours’ worth. RTO depends heavily on what data is in the vault and on the lag between the time the data left the production site and the time it was received at our vault.
RTOs and RPOs for vault archives don’t fall within the range of what we would consider “disaster recovery,” so we do not recommend this approach for mission-critical workloads. However, if you have a workload that you simply need to archive on a regular basis, this solution works well. The other advantage over an in-house backup protocol is that we offer this as a service, so it’s one more task you can take off your plate.
Backup Tape: Local Replication (Most Cost Effective; Highest Recovery Times and Data Loss)
Local replication is a strategy that has been around for years, only the medium has changed – and sometimes, not even that. The major advantage of this approach is that it is cost-effective. You back up your important workloads to disk, tape, or some other physical medium, typically at the end of the workday.
However cost-effective this approach is, the disadvantages are numerous. The recovery time in this scenario typically depends on the amount of data you’re backing up and how long it would take to restore. If you’re storing your backup offsite (and you should), remember to factor in the time it will take to retrieve the backup from the storage facility as well.
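As a rough illustration of how those factors add up, the sketch below estimates recovery time as retrieval time plus restore time. The function name, the restore rate, and the data volumes are all hypothetical, and the estimate ignores steps such as hardware provisioning and application validation, which a real plan would need to include.

```python
from datetime import timedelta


def estimate_recovery_time(data_gb, restore_rate_gb_per_hour, retrieval_hours=0.0):
    """Estimate recovery time as offsite retrieval time plus restore time."""
    if restore_rate_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    restore_hours = data_gb / restore_rate_gb_per_hour
    return timedelta(hours=retrieval_hours + restore_hours)


# Hypothetical figures: 2,000 GB restored at 400 GB per hour,
# plus 4 hours to retrieve the tapes from an offsite facility.
estimate = estimate_recovery_time(2000, 400, retrieval_hours=4)
print(f"Estimated recovery time: {estimate}")
```

Comparing an estimate like this against the RTO for each workload is a quick way to see whether a tape-based approach is even in the right range.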
Like the vault archive approach, recovery point is defined by the time the last backup was made, and therein lies one of the greatest challenges with this solution. At best, you’re probably going to create a backup once every twenty-four hours, leading to as much as a day’s worth of data loss. Many organizations make backups less frequently, especially during their busy times.
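To make the exposure concrete, the sketch below takes a list of backup completion times and reports the longest gap between them, which is the most data you could have lost had a disaster struck just before the next backup. The function name and the timestamps are hypothetical; in practice the times would come from your backup logs.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times):
    """Return the longest gap between consecutive backups."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backup timestamps")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical schedule: nightly backups, then three days with none
# during a busy period.
backups = [
    datetime(2020, 3, 2, 22, 0),
    datetime(2020, 3, 3, 22, 0),
    datetime(2020, 3, 4, 22, 0),
    datetime(2020, 3, 7, 22, 0),
]

print(f"Worst-case data loss window: {worst_case_data_loss(backups)}")
```

With this assumed schedule, the nominal twenty-four-hour recovery point stretches to three days, because the actual recovery point is set by the longest gap, not by the intended schedule.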
Physical media like tapes degrade over time as well, so unless you test your backups for viability (and most organizations don’t), you can never be completely confident that your data will be recoverable.
Benefits of IBM i Disaster Recovery as a Service
Each of these options can be deployed using in-house resources, but the cost is often greater than using an outside resource. You’d have to either rent data center space somewhere or potentially even build a redundant disaster recovery site. Maintaining your own data center, even just a backup site, comes with the high overhead costs of equipment and facility maintenance. Plus, for many, the current shortage of skilled IT professionals presents a challenge when hiring and retaining necessary staff.
Furthermore, while disaster recovery is seen as a priority, IT resources are stretched thin in many organizations, and things don’t always get done according to plan. As we noted earlier, backups may not be made according to schedule, and a majority of organizations fail to test their failover processes and systems to ensure that they will function when needed.
Disaster Recovery as a Service (DRaaS)
The replication and hosting of physical or virtual servers in the cloud (including the data and applications running on them) by a commercial hosting provider to enable failover of your primary systems – within a defined time period – in the event of a man-made or natural disaster.
With DRaaS, organizations can avoid the high cost of maintaining and staffing a data center themselves and ensure that their carefully laid plans will work when they need them. 2020 data shows that 42% of large enterprises and 38% of mid-sized companies leverage DRaaS to help them meet their preparedness goals.
If your organization is thinking about different disaster recovery options, reach out to us. We’d be happy to discuss your options and how you can most cost-effectively meet your RTO and RPO objectives.