In 2017, Forrester Research partnered with the Disaster Recovery Journal to look at the state of disaster recovery preparedness in today’s companies. The results were startling. Just over half (55%) felt they were either prepared or fully prepared.
This puts 45% of today’s businesses in a precarious position. According to the most recent data available from the Ponemon Institute, the average cost of data center downtime rose from $505,502 in 2010 to $740,357 in 2016.
Furthermore, for most businesses, a data center outage is not a matter of “if” but of “when” and for “how long.” Almost a third of all businesses surveyed experienced downtime or a severe degradation in service over the past year.
Designing an IBM i disaster recovery plan to meet your recovery objectives
There is no one-size-fits-all disaster recovery plan. For all businesses, regardless of platforms used in their primary location(s), the first step is to set their recovery objectives.
Recovery Time Objective (RTO) – The maximum acceptable amount of time, measured in hours or days, to recover data and get the application back up and running.
Recovery Point Objective (RPO) – How much data loss a company is willing to accept. In practice, it is determined by how often data gets backed up or replicated, and therefore how much data could be lost in the event of a disaster.
Not only do these need to be set based on your organizational needs, but they may also be different for different workloads. For example, a distributor may set a very low RTO and RPO for their order entry systems or their prospect-facing website, but decide that they can handle longer downtime and greater data loss in their marketing automation systems.
Setting RTO and RPO appropriately is crucial as they are key factors when deciding which replication solutions will best suit the business, and in turn, how much will need to be budgeted for disaster recovery. Generally, the lower the RTO and the RPO, the more expensive the solution.
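To make per-workload objectives concrete, here is a minimal Python sketch. The workload names and hour values are hypothetical, chosen only to mirror the distributor example above; they are not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjective:
    """Recovery targets for one workload (all values are illustrative)."""
    workload: str
    rto_hours: float  # maximum acceptable time to restore service
    rpo_hours: float  # maximum acceptable window of lost data


def meets_objective(obj: RecoveryObjective,
                    downtime_hours: float,
                    data_loss_hours: float) -> bool:
    """Return True if an outage stayed within both the RTO and the RPO."""
    return (downtime_hours <= obj.rto_hours
            and data_loss_hours <= obj.rpo_hours)


# Hypothetical targets for the distributor example.
objectives = [
    RecoveryObjective("order_entry", rto_hours=2, rpo_hours=1),
    RecoveryObjective("public_website", rto_hours=2, rpo_hours=1),
    RecoveryObjective("marketing_automation", rto_hours=48, rpo_hours=24),
]

# The same outage (6 hours down, 4 hours of data lost) is acceptable
# for some workloads and not for others.
results = {
    o.workload: meets_objective(o, downtime_hours=6, data_loss_hours=4)
    for o in objectives
}
```

Writing the objectives down per workload in this way makes it easier to see which systems actually need the more expensive replication options.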
With objectives set, you can now focus on choosing the best recovery options for you. Here are four options to consider with some guidance on how each of these options impacts RTO, RPO, and cost:
MIMIX® Replication: Constant Replication (Quickest Recovery Time) – For mission-critical applications with an RTO and an RPO of only a couple of hours, we recommend using a highly available (HA) replication process. At Connectria we primarily use MIMIX replication software although there are other alternatives, such as iTera®, that we use when a customer already owns the solution. Because MIMIX allows for constant replication to the failover site, it provides the quickest recovery times. That being said, it is the costliest option of the four we’ll discuss here.
SAN-to-SAN Replication (Quick to Moderate Recovery Time) – SAN-to-SAN replication offers a solution with RTO and RPO similar to those of logical replication, without the additional cost of logical replication software. In addition, the disaster recovery LPAR (logical partition) does not need to be “live” to receive the replicated data.
It should be noted that this solution requires both the production and the disaster recovery IBM i servers to use the same type of external storage array. In addition, the recovery of dirty blocks is handled by the OS/DB2 recovery routines that are part of the IBM i operating system. This means that any blocks that need to be recovered or backed out are cleaned up via the DB2 recovery routines. This results in an IPL (initial program load) that is slightly longer than a standard IPL but certainly within the scope of most disaster recovery plans.
Vault Archive: Periodic Daily Replication (Variable Recovery Time and Data Loss) – Another option is a vault archive. In this scenario, the production iSeries would be located in your data center with periodic daily changes sent to a vault in Connectria’s data center. If you were to experience a disaster, we would restore that vault to an iSeries in our data center.
While this is the least expensive option, you need to be aware that the RTO and RPO of this approach are not much different from those of a straight backup solution. You will only have data from the point of your last replication to the vault, so you could lose as much as twenty-four hours’ worth. RTO is heavily dependent on what data is in the vault and on the gap between when it left the production site and when it was received at our vault.
RTOs and RPOs for vault archive don’t fall within the range of what we would consider “disaster recovery,” so we do not recommend this approach for mission-critical workloads. However, if you have a workload that you simply need to archive on a regular basis, this solution works well. The other advantage over an in-house backup protocol is that we offer this as a service, so it’s one more task you can take off your plate.
Backup Tape: Local Replication (Most Cost Effective; Highest Recovery Times and Data Loss) – Local replication is a strategy that has been around for years; only the medium has changed – and sometimes, not even that. The major advantage of this approach is that it is cost-effective. You back up your important workloads to disk, tape, or some other physical medium, typically at the end of the workday.
However cost-effective this approach is, the disadvantages are numerous. The recovery time in this scenario typically depends on the amount of data you’re backing up and how long it would take to restore. If you’re storing your backup offsite (and you should), remember to factor in the time it will take to retrieve the backup from the storage facility as well.
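As a rough illustration of how these factors add up, the following Python sketch estimates recovery time as retrieval time plus restore time. The function name, its parameters, and the sample figures are assumptions made for this example; real restore throughput varies widely with hardware and data layout.

```python
def estimate_recovery_hours(data_gb: float,
                            restore_gb_per_hour: float,
                            retrieval_hours: float = 0.0) -> float:
    """Rough recovery time: fetch the media, then restore the data."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    return retrieval_hours + data_gb / restore_gb_per_hour


# Hypothetical figures: 2,000 GB restored at 250 GB/hour, plus
# 4 hours to retrieve the tapes from an offsite storage facility.
hours = estimate_recovery_hours(2000, 250, retrieval_hours=4)
```

Comparing an estimate like this against the RTO for each workload is a quick way to see whether a tape-based approach is even a candidate.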
Like the vault archive approach, recovery point is defined by the time the last backup was made, and therein lies one of the greatest challenges with this solution. At best, you’re probably going to create a backup once every twenty-four hours, leading to as much as a day’s worth of data loss. Many organizations make backups less frequently, especially during their busy times.
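The relationship between backup schedule and data loss can be sketched in a few lines of Python. This is a simplified model with hypothetical dates: it assumes everything written after the last completed backup is lost, and it ignores how long the backup itself takes to run.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime],
                     disaster_time: datetime) -> timedelta:
    """Time between the last backup before the disaster and the disaster."""
    earlier = [t for t in backup_times if t <= disaster_time]
    if not earlier:
        raise ValueError("no backup exists before the disaster")
    return disaster_time - max(earlier)


# Hypothetical schedule: end-of-day backups on March 4 and March 5,
# the March 6 backup skipped during a busy period, and a disaster
# at 3 p.m. on March 7.
backups = [datetime(2024, 3, 4, 18), datetime(2024, 3, 5, 18)]
loss = data_loss_window(backups, datetime(2024, 3, 7, 15))
```

In this example, a single skipped backup stretches the loss window from under a day to 45 hours.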
Physical media like tapes degrade over time as well, so unless you test your backups for viability (and most organizations don’t) you can never be completely confident that your data will be recoverable.
Benefits of IBM i Disaster Recovery as a Service
Each of these options can be deployed using in-house resources, but the cost is often greater than using an outside resource. You’d have to either rent data center space somewhere or potentially even build a redundant disaster recovery site. Maintaining your own data center, even if it’s just a backup site, comes with the high overhead costs of equipment and facility maintenance. Plus, for many, the current shortage of skilled IT professionals presents a challenge when hiring and retaining the necessary staff.
Furthermore, while disaster recovery is seen as a priority, IT resources are stretched thin in many organizations, and things don’t always get done according to plan. As we noted earlier, backups may not be made according to schedule, and a majority of organizations fail to test their failover processes and systems to ensure that they will function when needed.
Disaster Recovery as a Service (DRaaS) – The replication and hosting of physical or virtual servers to the cloud (including the data and applications running on them) by a commercial hosting provider to enable failover of your primary systems – within a defined time period – in the event of a man-made or natural disaster.
In a recent Forrester study, 40% of enterprise respondents said they use DRaaS, and a further 24% indicated that they have plans to adopt it in the next 12 months. With DRaaS, these organizations can avoid the high cost of maintaining and staffing a data center themselves and ensure that their carefully laid plans will work when they need them.
If your organization is thinking about different disaster recovery options, reach out to us. We’d be happy to discuss your options and how you can most cost-effectively meet your RTO and RPO targets.
Ponemon Institute, Cost of Data Center Outages, January 2016.
Forrester Data Global Business Technographics Infrastructure Survey, 2016.
Forrester Data Global Business Technographics Infrastructure Survey, 2016.