
The following is the fourth excerpt from my Norwich MSIA Seminar 4 final paper, in which I speculate on how IA will evolve.
This chapter focuses on Disaster Recovery and Business Continuity planning.
I would love to hear what you think, so leave a comment and let me know how you think things will turn out.
Risk Analysis and Risk Management
Up to this point, this report has focused on building customer trust by ensuring that systems are designed, configured, and programmed securely. Companies are also judged on how they handle disasters and other service interruptions. Customers are getting comfortable interacting with companies at their convenience and expect them to be available. Frankly, businesses do not want to give their customers an excuse to check out the competition. This is where risk analysis and risk management come into play.
Risk analysis is difficult to do well because it is highly subjective. The most common method today is probably the Annualized Loss Expectancy (ALE) method. ALE estimates the cost of an event and how often it is likely to occur, then computes the annual cost. For example, if an event is expected to cost $1 million and to occur once every 10 years, the ALE is $100,000. The thinking is that a company should not spend more than $100,000 per year to mitigate this type of event. The method has several problems, chiefly that there is rarely enough data available to estimate the frequency of occurrence accurately.
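As a rough illustration, the ALE arithmetic above can be sketched in a few lines of Python. The function and parameter names are my own and are not taken from any standard tool; the figures are the ones from the example.

```python
def annualized_loss_expectancy(cost_per_event, years_between_events):
    """Spread the expected cost of one event over its expected interval."""
    if years_between_events <= 0:
        raise ValueError("years_between_events must be positive")
    return cost_per_event / years_between_events


# The example from the text: a $1 million event expected once every 10 years.
ale = annualized_loss_expectancy(1_000_000, 10)
print(f"ALE: ${ale:,.0f} per year")
```

The calculation itself is trivial; the result is only as good as the frequency estimate that goes into it, which is exactly the weakness noted above.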
A better method is needed. There are at least three other possible methods from which to choose:
• The Gartner Group model focuses on the human threat: means, motive, opportunity
• Formal Analysis of Risk in Enterprise Systems (FARES) focuses on threats
• Miora Generalized Cost Containment (GCC) model focuses on the cost impact of an event
Each of these models has advantages and disadvantages:
• The Gartner Group model is easy to explain and produces an actionable threat matrix, but it is highly subjective and does not address environmental threats such as fire, accidents, and power loss.
• FARES can be fairly accurate over time, is comprehensive, and allows assumptions to be tested using simulation, but it is fairly complicated and expensive to model.
• GCC is easy to model, build, and explain and costs very little, but it still relies on subjective data.
The GCC model may be the best method for most businesses. It creates a cost estimate for each outage type as a function of time. The cost is applied only when the maximum downtime for that type is exceeded. First, the data is graphed as loss over time assuming no recovery plan. This is the red section in the graph below. A second dataset can be added to the graph showing loss over time with a recovery plan in place (yellow section) (Miora “Using the Generalized Cost Containment”). This makes it easy to demonstrate whether the recovery plan meets ROI, which may be the most important outcome of the analysis.
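The comparison described above can be sketched in Python. This is my own simplified illustration and not Miora's actual model: it assumes losses accrue at a constant hourly rate once the maximum tolerable downtime is exceeded, and that a recovery plan simply caps the length of the outage. All names and figures are hypothetical.

```python
def outage_loss(hours_down, max_tolerable_hours, loss_per_hour):
    """Loss from one outage; nothing accrues until the tolerable downtime is exceeded."""
    excess_hours = max(0.0, hours_down - max_tolerable_hours)
    return excess_hours * loss_per_hour


def recovery_plan_net_benefit(outage_hours, recovery_hours,
                              max_tolerable_hours, loss_per_hour, plan_cost):
    """Loss avoided by the plan minus what the plan costs."""
    loss_without_plan = outage_loss(outage_hours, max_tolerable_hours, loss_per_hour)
    # Assumption: with a plan in place, downtime ends once recovery completes.
    hours_with_plan = min(outage_hours, recovery_hours)
    loss_with_plan = outage_loss(hours_with_plan, max_tolerable_hours, loss_per_hour)
    return loss_without_plan - loss_with_plan - plan_cost


# Hypothetical figures, chosen only to illustrate the comparison.
benefit = recovery_plan_net_benefit(
    outage_hours=72, recovery_hours=8, max_tolerable_hours=4,
    loss_per_hour=5_000, plan_cost=100_000,
)
print(f"Net benefit of the recovery plan: ${benefit:,.0f}")
```

A positive result suggests the plan pays for itself for that outage type; a negative one suggests it does not. A real analysis would use a loss curve per outage type instead of a single flat rate.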

The Future of Risk Analysis and Risk Management
Risk analysis and management are critical components of any IA program. Both strategic and tactical decisions are based on the results of the risk analysis, and it can be extremely difficult to gain funding without demonstrating ROI. Unfortunately, there is currently no easy, low-cost way to develop this analysis accurately. Given the critical importance of this function, a better method will be developed, if for no other reason than that more data will become available.
The responsibility for risk analysis will probably sit with a centralized authority in large organizations, and will probably be outsourced in smaller organizations. The reasons for this prediction are that risk management is a strategic function and it requires specialized skill and experience to do well.
Disaster Recovery and Business Continuity Planning
The risk analysis should produce a list of threats and their potential costs. IA practitioners will prioritize these threats and create a plan to mitigate and recover from them. At this point in the process, the focus is on the consequences, not the cause. For example, it makes little difference to the plan whether a resource is lost to a flood or to a fire; it is still lost, and the recovery plan is the same. Many of these threats can be expected to result in the loss of hardware, data, and possibly entire facilities and people. The diagram below shows how DR, BC and Incident Response Planning relate to each other and to information resources (Miora “Incident Management and Response” 1).

The DR focus is on recovering from the incident or disaster, and a disaster does not have to be a malicious act on the part of man or Mother Nature. “No longer do we look at incidents as earthquakes or tornados, hackers or corporate espionage, terrorism or sabotage. Today, an incident can be any one or more of these, or can be something as simple as an accounting error that requires rebuilding and reestablishing financial baselines. It can be something as important as a breach of privacy that reveals private information about corporate customers. Any incident can cause corporate harm; every incident is less harmful if you see it coming” (Miora “Incident Management and Response” 2).
As illustrated in the above diagram, DR concentrates on restoring the data center, LAN, PCs and other infrastructure. These systems can be restored either by direct replacement or by using hot, warm, or cold sites. The decision as to which strategy to use is dependent on how long the business can afford to be down versus the cost of the recovery strategy.
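One simple way to frame the downtime-versus-cost decision is to pick the cheapest strategy that still recovers within the tolerable downtime. The sketch below assumes that framing; the class name, recovery times, and costs are hypothetical and are not drawn from the cited sources.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStrategy:
    name: str
    recovery_hours: float   # assumed time to restore service
    annual_cost: float      # assumed yearly cost of keeping the option ready


def cheapest_adequate_strategy(strategies, max_tolerable_hours):
    """Return the lowest-cost strategy that recovers in time, or None."""
    adequate = [s for s in strategies if s.recovery_hours <= max_tolerable_hours]
    return min(adequate, key=lambda s: s.annual_cost, default=None)


# Hypothetical recovery times and costs; real values come from quotes and testing.
options = [
    RecoveryStrategy("hot site", recovery_hours=4, annual_cost=500_000),
    RecoveryStrategy("warm site", recovery_hours=48, annual_cost=150_000),
    RecoveryStrategy("cold site", recovery_hours=168, annual_cost=40_000),
]

choice = cheapest_adequate_strategy(options, max_tolerable_hours=72)
print(choice.name if choice else "no strategy meets the downtime limit")
```

With these made-up numbers, a business that can tolerate 72 hours of downtime would land on the warm site. Tightening the limit pushes the choice toward the hot site and its higher cost.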
Questions that might need to be answered as part of DRP include:
• How long can the business tolerate the loss of the affected information systems?
• Which recovery strategy provides the best cost / performance ratio for our business?
• How much computing power do we need?
• How much storage capacity do we need?
• How much power do we need?
• How much air conditioning capacity do we need?
• How much bandwidth do we need?
• Which customers are affected by this incident and how will we notify them?
• What communications resources will we need?
• Is our network documented and where are the files stored?
The answers to these questions are necessary to produce the DRP. The DRP will consist of specific actions that the DR team will follow in response to an incident. The diagram below is an example of what a detailed DRP might look like (Miora “Chapter 43” 15). The first step is to evaluate the situation and decide whether to declare a disaster, continue normal operations, or disrupt normal operations for a short time. If a disaster is declared, the recovery team determines what type of disaster it is and notifies the DR team. The final step in this diagram is to “manage legal and related concerns.” Incidents need to be documented for several reasons, including legal and insurance requirements and post-event analysis.

Once the disaster is declared and the DR team is activated, the team follows the detailed plan for that type of incident. The specifics depend on the answers to our earlier questions. Eventually the facilities and hardware infrastructure will be restored, and the continuity plan can begin. A final restoration, rebuilding, or relocation phase may be required, depending on the severity of the incident.
BCP
BCP is about planning to restore business operations after an incident. Operations may or may not resume at the normal location, depending on the severity of the incident.
BCP development is one of seven steps recommended in the NIST publication, Contingency Planning Guide for Information Technology Systems (14):
1. Develop the contingency planning policy statement
2. Conduct the business impact analysis (BIA)
3. Identify preventive controls
4. Develop recovery strategies
5. Develop an IT contingency plan
6. Plan testing, training, and exercises
7. Plan maintenance

Step 4, Develop Recovery Strategies, is where recovery strategies are identified. Recovery strategies can overlap with the DRP, especially where hardware and facilities are concerned. However, BC recovery strategies typically center on backup files and backup methods. Other strategies include load balancing and mirroring of servers and databases, especially in high availability environments. NIST suggests, “The selected recovery strategy should address the potential impacts identified in the BIA and should be integrated into the system architecture during the design and implementation phases of the system life cycle. The strategy should include a combination of methods that complement one another to provide recovery capability over the full spectrum of incidents” (19).
Policies should define how often backups should be created, where they should be stored, how they should be encrypted, and how they should be transported. Backups should be stored offsite, far enough from the primary site that the same disaster cannot affect both. Options for storing backup files offsite include:
• Network Attached Storage (NAS)
• Commercial storage providers
• Tapes, removable hard drives, DVDs or other portable media
• Cloud computing providers
It is extremely important to remember that these backups contain critical data, including PII, and should be protected. There have been too many instances where backup media have been lost or stolen, unnecessarily exposing the enterprise to risk.
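The four things a backup policy should define can be captured in a small record and checked mechanically. The following is a minimal sketch under my own assumptions; the field names and the example values (a 24-hour interval, AES-256, and so on) are hypothetical and are not recommendations from NIST or the other sources.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    max_interval: timedelta   # how often backups must be created
    storage_location: str     # where they are stored
    encryption: str           # how they are encrypted
    transport: str            # how they are moved offsite


def backup_is_overdue(policy, last_backup, now):
    """True when the time since the last backup exceeds the policy interval."""
    return now - last_backup > policy.max_interval


# Hypothetical policy values for illustration only.
policy = BackupPolicy(
    max_interval=timedelta(hours=24),
    storage_location="offsite commercial storage provider",
    encryption="AES-256",
    transport="encrypted network transfer",
)

last_backup = datetime(2009, 8, 20, 2, 0)
overdue = backup_is_overdue(policy, last_backup, now=datetime(2009, 8, 21, 9, 0))
print("Backup overdue:", overdue)
```

Writing the policy down in a form like this makes it possible to alert on a missed backup instead of discovering the gap during a recovery.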
Step 5 is the development of the contingency plan. The goal is a plan clear enough that frontline employees can follow it. For example, people in the Systems Operation Center (SOC) should be able to pick up the plan and follow it until an incident commander relieves them. The NIST plan consists of five components (31):
1. Supporting information: project charter documentation
2. Notification / Activation Phase: documents that define notification procedures
3. Recovery Phase: recovery priority, a recovery timeline, and recovery procedures, preferably including step-by-step checklists
4. Reconstitution Phase: operations are returned to normal
5. Plan Appendices: vendor contact information including support contracts, hardware and network documentation, BIA, and other related documents.
Data Retention Responsibilities
Related responsibilities in this area include data retention requirements as dictated by law. The organization must have a data retention policy and must adhere to it or face fines and penalties. This is specific to eDiscovery laws; other laws may apply depending on the industry the business is in. Here are some common examples (Herold 2,3):
Sarbanes-Oxley Act of 2002:
• Fines and imprisonment of up to 20 years are prescribed for any person who corruptly alters, destroys, or conceals any records or documents to impair their use in any investigation.
• Failure to maintain audit/review work papers for at least 5 years can result in fines or imprisonment for up to 5 years.
• All audit and review information must be retained in a readily accessible and indelible format for 7 years.
Health Insurance Portability and Accountability Act (HIPAA):
• Covered entities (CEs) must ensure the security of, and appropriate access to, health information not only while it is in transit through networks but also while it is in storage.
• Such information must be maintained for 6 years from the date of its creation or 6 years from the date for which it was last in effect, whichever is later.
• Penalties include not only civil penalties, but also potentially large fines and/or prison time.
Gramm–Leach–Bliley Act (GLBA):
• Financial organizations with customers and consumers who are United States citizens must implement security programs governing the security and retention of non-public personal information (NPPI).
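The HIPAA retention rule listed above (6 years from creation or from the date last in effect, whichever is later) is easy to get wrong by hand, so here is a minimal Python sketch of the date arithmetic. The function names are my own, the example record is hypothetical, and this is an illustration of the rule as stated here, not a compliance tool.

```python
from datetime import date

HIPAA_RETENTION_YEARS = 6  # retention period as stated in the list above


def add_years(d, years):
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def hipaa_retain_until(created, last_in_effect):
    """Retention runs from creation or last effective date, whichever is later."""
    return add_years(max(created, last_in_effect), HIPAA_RETENTION_YEARS)


# Hypothetical record: created in 2005, last in effect in mid-2008.
retain_until = hipaa_retain_until(date(2005, 3, 1), date(2008, 6, 30))
print("Retain until:", retain_until.isoformat())
```

The same pattern would extend to other retention periods, such as the 7 years listed under Sarbanes-Oxley, by swapping the constant.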
The Future of DRP and BCP
Cloud computing will play a big role in DR and BC because it is going to play a major role in normal IT operations. Businesses just want IT to work so that they can focus on their core business, and cloud computing offers them that opportunity. It is a very flexible way for organizations to add or remove capacity as needed without spending capital on equipment or paying staff to operate it.
Here is what one IT professional has to say on the subject, “So now commercial IT is loaded to the gills with stuff designed originally at an entirely different time when there were entirely different issues of scarcity, and that will change.
Because of the advancements of technology, CPU power, capacity, bandwidth (the things that our entire $100B+ annual spend is based on) – all things once scarce – are now abundant in IT. … I HATE being in the IT business (and yes, I see the irony). We run VMware. We run Backup (CommVault). We run iSCSI and NAS (Dell and NetApp). We run HP dual-socket Quad-core Intel Xeon processors. We do all the same stuff everyone else does – just on a smaller scale.
I have zero desire, no offense, to have to pay people to keep this stuff working. It adds no value to my business. I am forced to be in the IT business. I would much rather spend the money focusing on adding value versus sucking value. I will be 100% in the cloud – as soon as it’s realistic for me to be. I will focus on the real scarcity issues of TIME and MONEY. I will let others run infrastructure, as it is not core to my existence. I will focus on Op-Ex, and ultimately eliminate the Cap-Ex considerations altogether” (Duplessie).
Mr. Duplessie wants to let someone else run the IT infrastructure and to lease capacity from them as he needs it. He believes that will free up his people to deliver better value to his organization. The responsibility for DRP and BCP, at least as it pertains to data centers and servers, will be transferred to the provider, freeing up even more time for his staff to work on other things.
Next time: The final chapter: Computer Incident Response & Forensics
Bibliography
Miora, Michael. 2002. Using the Generalized Cost Containment (GCC).
Miora, Michael. 2006. Incident Management and Response.
Miora, Michael. Chapter 43.
Swanson et al. 2002. Contingency Planning Guide for Information Technology Systems.
Herold, Rebecca. Data Retention Compliance.
Duplessie, Steve. “Steve’s IT Rants”. 8/22/2009. http://esgblogs.typepad.com/steves_it_rants/2009/08/scarcity-imbalances-why-the-smb-and-the-cloud-will-change-the-game-.html





