Blog December 13, 2018

Disaster Recovery Options For The IBM i Series

In 2017, Forrester Research partnered with the Disaster Recovery Journal to look at the state of disaster recovery preparedness in today’s companies. The results were startling. Just over half (55%) felt they were either prepared or fully prepared.

This puts 45% of today’s businesses in a precarious position. According to the most recent data available from the Ponemon Institute, the average cost of data center downtime rose from $505,502 in 2010 to $740,357 in 2016.[1]

Furthermore, for most businesses, it’s not a matter of “if” they’ll experience a data center outage but of “when” and for “how long.” Almost a third of all businesses surveyed experienced downtime or a severe degradation in service over the past year.[2]

Designing an IBM i disaster recovery plan to meet your recovery objectives

There is no one-size-fits-all disaster recovery plan. For all businesses, regardless of platforms used in their primary location(s), the first step is to set their recovery objectives.

Recovery Time Objective (RTO) – The target amount of time for recovering data and getting the application back up and running, similar to a customer’s SLA. It is measured in hours or days.

Recovery Point Objective (RPO) – How much data loss a company is willing to accept. In practice, this determines how often data must be backed up or replicated, and therefore how much data could be lost in the event of a disaster.

Not only do these need to be set based on your organizational needs, but they may also be different for different workloads. For example, a distributor may set a very low RTO and RPO for their order entry systems or their prospect-facing website, but decide that they can handle longer downtime and greater data loss in their marketing automation systems.

Setting RTO and RPO appropriately is crucial as they are key factors when deciding which replication solutions will best suit the business, and in turn, how much will need to be budgeted for disaster recovery. Generally, the lower the RTO and the RPO, the more expensive the solution.
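To make per-workload objectives concrete, here is a minimal sketch in Python. The class name, the function, and all of the hour values are hypothetical and chosen only for illustration. They are not recommendations, and your own figures will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjective:
    workload: str
    rto_hours: float  # longest tolerable time to restore service
    rpo_hours: float  # longest tolerable window of lost data


def meets_objective(objective, expected_recovery_hours, hours_between_copies):
    """Return True when both the recovery time and the copy interval fit the objective."""
    return (
        expected_recovery_hours <= objective.rto_hours
        and hours_between_copies <= objective.rpo_hours
    )


# Hypothetical objectives for the distributor example above.
order_entry = RecoveryObjective("order entry", rto_hours=2, rpo_hours=1)
marketing = RecoveryObjective("marketing automation", rto_hours=48, rpo_hours=24)

# A nightly copy that takes 12 hours to restore fits marketing but not order entry.
print(meets_objective(order_entry, 12, 24))  # False
print(meets_objective(marketing, 12, 24))    # True
```

Writing the objectives down per workload like this makes it easier to see which systems need the more expensive options and which can tolerate a cheaper one.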

With objectives set, you can now focus on choosing the best recovery options for you. Here are four options to consider with some guidance on how each of these options impacts RTO, RPO, and cost:

MIMIX® Replication: Constant Replication (Quickest Recovery Time) – For mission-critical applications with an RTO and an RPO of only a couple of hours, we recommend using a highly available (HA) replication process. At Connectria we primarily use MIMIX replication software, although there are other alternatives, such as iTera®, that we use when a customer already owns the solution. Because MIMIX allows for constant replication to the failover site, it provides the quickest recovery times. That said, it is the costliest of the four options we’ll discuss here.

SAN-to-SAN Replication (Quick to Moderate Recovery Time) – SAN-to-SAN replication offers an RTO and RPO similar to those of logical replication, without the additional cost of logical replication software. In addition, the DR LPAR does not need to be “live” to receive the replicated data.

Note that this solution requires both the production and the disaster recovery IBM i servers to run on the same type of external storage array. In addition, recovery of dirty blocks relies on the OS/DB2 recovery routines built into the IBM i operating system: any blocks that need to be recovered or backed out are cleaned up by those routines. This results in an IPL that is slightly longer than a standard IPL but certainly within the scope of most DR plans.

Vault Archive: Periodic Daily Replication (Moderate Recovery Time) – Another option is a vault archive. In this scenario, the production iSeries is located in your data center, with daily changes sent periodically to a vault in Connectria’s data center. If you were to experience a disaster, we would take that vault and restore it to an iSeries at our data center. With this option the RPO is usually around 24 hours, while the RTO depends on how much data is in the vault and on the lag between the time the data left the production site and the time it was received at our vault.

Backup Tape: Local Replication (Cost-Effective Recovery) – For organizations that can afford a longer downtime in the event of a disaster, the most cost-efficient option is doing a straight tape backup. The RTO in this scenario typically depends on the amount of data you’re backing up and how long it would take to restore. If you’re storing your backup offsite (and you should), remember to factor in the time it will take to retrieve the backup from the storage facility as well.

In this scenario, RPO is defined by the time the last backup was made, and therein lies one of the greatest challenges with this solution. At best, you’re probably going to create a backup once every twenty-four hours, leading to as much as a day’s worth of data loss. Many organizations make backups less frequently, especially during their busy times. Unfortunately, it’s also at these times when they can least afford data loss.
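The tape arithmetic described above can be sketched in a few lines of Python. This is a rough estimate under simplifying assumptions: restore speed is treated as constant, and the function names, data size, throughput, and retrieval time are all hypothetical rather than measured values.

```python
def tape_rto_hours(data_gb, restore_gb_per_hour, retrieval_hours=0.0):
    """Estimate recovery time: offsite retrieval time plus restore time."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    return retrieval_hours + data_gb / restore_gb_per_hour


def worst_case_rpo_hours(hours_between_backups):
    """Worst case, everything written since the last backup is lost."""
    return hours_between_backups


# Hypothetical example: 2,000 GB restored at 400 GB per hour,
# plus 4 hours to retrieve the tapes from offsite storage.
print(tape_rto_hours(2000, 400, retrieval_hours=4))  # 9.0
print(worst_case_rpo_hours(24))                      # 24
```

Running your own numbers through an estimate like this shows quickly whether tape alone can meet the objectives you set for a given workload.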

Benefits of IBM i Disaster Recovery as a Service

Each of these options can be deployed using in-house resources, but the cost is often greater than using an outside resource. You’d have to either rent data center space somewhere or potentially even build a redundant disaster recovery site. Maintaining your own data center, even if it’s just a backup site, comes with the high overhead costs of equipment and facility maintenance. Plus, for many, the current shortage of skilled IT professionals presents a challenge when hiring and retaining the necessary staff.

Furthermore, while disaster recovery is seen as a priority, IT resources are stretched thin in many organizations, and things don’t always get done according to plan. As we noted earlier, backups may not be made according to schedule, and a majority of organizations fail to test their failover processes and systems to ensure that they will function when needed.

Disaster Recovery as a Service (DRaaS) – The replication and hosting of physical or virtual servers in the cloud (including the data and applications running on them) by a commercial hosting provider to enable failover of your primary systems, within a defined time period, in the event of a man-made or natural disaster.

In a recent Forrester study, 40% of enterprise respondents said they use DRaaS, and a further 24% indicated that they have plans to adopt it in the next 12 months.[3] With DRaaS, these organizations can avoid the high cost of maintaining and staffing a data center themselves and ensure that their carefully laid plans will work when they need them.

If your organization is thinking about different disaster recovery options, reach out to us. We’d be happy to discuss your options and how you can most cost-effectively meet your RTO and RPO objectives.

[1] Ponemon Institute, Cost of Data Center Outages, January 2016.

[2] Uptime Institute Global Data Center Survey, 2018.

[3] Forrester Data Global Business Technographics Infrastructure Survey, 2016.
