Foreword:
We created this document to help train IT and DR team members on the requirements a second data center must meet to be independent of the primary. The driver was questions such as, “How far apart should our data centers be?” Many regulations answer with roughly 300 miles; the problem is that distance alone may not make the sites independent. Distance is an indicator, and the large figures are often intended to preclude using the same staff to cover both locations. In governmental environments, extra people and a second staff may be an option, but for most others, we just can’t get any additional resources. In the end, this is a risk management issue. We may not be able to achieve independence in all areas, and that risk needs to be known, understood, and either accepted or rejected as company management decides.
We could (and maybe at some point will) write a book on these topics. What follows is the introduction and the first course in this subject.
What is required for a second Data Center location?
To ensure a highly available backup data center, we need to demonstrate complete independence across key factors. When the location can provide this, the secondary data center should not be affected when the primary is out of operation; at least, that is how it works in theory. The difficulty comes in determining whether the two sites are truly independent. The following are the major factors to consider:
1. Power
2. Telecommunications
3. Water
4. Gas (Natural {Earth} or LP)
5. People
6. Transportation (related to the people)
7. Construction of the buildings
A final consideration that is vital for the backup center is how many transactions can be lost during the switchover without significantly disrupting operations. This is not a trivial question. If complete transactional mirroring is needed (i.e., no lost transactions between the two sites), the largest separation currently advertised between the primary and secondary sites is 60 miles. If some transactional data may be lost and restored after failover, the distance limitation is removed and the options for a secondary site become much greater. More options often mean lower cost.
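Much of the distance limit on full mirroring comes from propagation delay: every mirrored write must wait for the remote site to confirm it. The minimal sketch below estimates that wait. It assumes a signal speed in optical fiber of roughly 200,000 km/s and a hypothetical route factor of 1.5, because cable paths are longer than the straight-line separation. Switching and storage overhead are ignored, so the results are lower bounds, not vendor figures.

```python
# Minimal sketch: lower-bound latency added to each synchronously mirrored write.
# Assumptions (not from the source): signal speed in fiber ~200,000 km/s, and a
# hypothetical route factor because cable paths are rarely straight lines.

KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed per millisecond


def sync_write_penalty_ms(separation_miles: float, route_factor: float = 1.5) -> float:
    """Round-trip propagation delay (ms) that one synchronous write must wait for."""
    fiber_km = separation_miles * KM_PER_MILE * route_factor
    return 2 * fiber_km / FIBER_KM_PER_MS  # out to the remote site and back


for miles in (60, 300):
    print(f"{miles:>3} miles: ~{sync_write_penalty_ms(miles):.2f} ms per write (propagation only)")
```

Because this penalty is paid on every committed write, its cost grows with both distance and transaction volume. Asynchronous replication avoids it by accepting that the most recent transactions may be lost, which is the trade-off described above.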
Power
Power must be independent. Neither a single supplier nor a single source shared by multiple suppliers is acceptable. Self-generation capability may be substituted for independent suppliers, so that the site keeps running when utility power is interrupted. For this to be acceptable, on-site fuel storage must be sufficient to power the location for at least 3 days; longer is, of course, preferred. Fuel shipments must also be able to reach the secondary location over a transportation grid that would not be affected by the same disaster as the primary location. (If natural-gas-powered generation equipment is used, the distribution system must be robust enough to survive the natural disasters likely in the area {earthquakes being a prime example of a disaster that can disrupt natural gas delivery}.)
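Checking on-site fuel against the 3-day minimum is simple arithmetic. The sketch below assumes a constant burn rate and a hypothetical 10% reserve; the 50 gallons per hour and the 3,000-gallon tank are example values only, and real sizing should use the generator manufacturer's consumption data at the expected load.

```python
# Minimal sketch: does on-site fuel cover the 3-day minimum?
# The burn rate, reserve fraction and tank size are hypothetical example values.

MIN_RUNTIME_HOURS = 3 * 24  # the 3-day minimum stated above


def required_fuel_gallons(burn_gal_per_hour: float,
                          runtime_hours: float = MIN_RUNTIME_HOURS,
                          reserve_fraction: float = 0.10) -> float:
    """Fuel needed to run for runtime_hours at a constant burn rate, plus a reserve."""
    return burn_gal_per_hour * runtime_hours * (1 + reserve_fraction)


def runtime_hours(tank_gallons: float, burn_gal_per_hour: float) -> float:
    """Hours an existing tank lasts at a constant burn rate."""
    return tank_gallons / burn_gal_per_hour


burn = 50.0   # hypothetical gallons per hour at site load
tank = 3000.0  # hypothetical existing tank, in gallons
hours = runtime_hours(tank, burn)
print(f"Need about {required_fuel_gallons(burn):,.0f} gallons for {MIN_RUNTIME_HOURS} h")
print(f"A {tank:,.0f}-gallon tank lasts {hours:.0f} h; meets 3-day minimum: {hours >= MIN_RUNTIME_HOURS}")
```

With these example numbers, the 3,000-gallon tank lasts 60 hours and does not meet the minimum.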
Telecommunications
Telecommunications paths must be independent. Using a separate carrier does not, by itself, meet this requirement. The carrier must document that the physical lines involved do not share the same routes and do not pass through a single central office or other switching location. An outage at any such shared juncture point would take both sites out of service.
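The carrier documentation described above reduces to a set comparison: list every physical facility each path passes through and look for overlap. In this sketch the facility names are hypothetical, and the two paths deliberately use different carriers to show that separate carriers alone do not guarantee independence.

```python
# Minimal sketch: find juncture points shared by two telecom paths.
# Each path lists the physical facilities the carrier documents for it
# (conduits, central offices, switching locations). All names are hypothetical.


def shared_junctures(path_a: list[str], path_b: list[str]) -> set[str]:
    """Facilities on both paths; an outage at any one of them affects both."""
    return set(path_a) & set(path_b)


primary_path = ["Conduit-12", "CO-Main-St", "Switch-East", "Carrier-A-POP"]
backup_path = ["Conduit-40", "CO-Main-St", "Switch-West", "Carrier-B-POP"]

common = shared_junctures(primary_path, backup_path)
if common:
    print("Not independent; shared:", ", ".join(sorted(common)))
else:
    print("No shared juncture points documented")
```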
Water
Although not normally considered a major factor, the water supply (through its loss or contamination) can affect the operation of a facility. Under local building codes, a lack of water will usually force a facility to close until service is restored. An interruption of water may also affect the facility's ability to cool the premises, which could also result in a shutdown.
Gas
Natural gas (also called earth gas in some locations) or LP gas (liquefied petroleum gas, usually propane) is often used to provide heating (and in some cases cooling) for facilities. While heating may not be needed in the equipment space, it is required in areas that house people. The gas lines need to be independent. Different natural gas providers often draw their gas from the same pipeline, so a single pipeline problem may affect numerous providers. The providers need to certify who supplies gas to them.
LP gas is delivered to the facility by truck, so its distribution route needs to be independent of the routes that may be affected by an outage at the primary location. For example, if the primary data center is in Hartford and the backup data center is 50 miles from Hartford, but all the LP gas comes from a single depot in Hartford, the two sites are not independent.
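Both gas examples above are questions about what sits upstream of each site: a pipeline shared by two providers, or an LP depot inside the primary's impact area. The minimal sketch below assumes a hand-built supplier map standing in for the providers' certifications; all supplier names and locations are hypothetical. It walks each site's supply chain and flags anything the backup shares with the primary or draws from the primary's area.

```python
# Minimal sketch: trace gas supply upstream and flag backup dependencies on the primary.
# Supplier names and locations are hypothetical stand-ins for provider certifications.

SUPPLIED_BY = {
    "Primary DC": ["Gas Utility X"],
    "Backup DC": ["Gas Utility W", "LP Dealer Y"],
    "Gas Utility X": ["Pipeline P"],
    "Gas Utility W": ["Pipeline P"],  # a different provider on the same pipeline
    "LP Dealer Y": ["LP Depot Z"],
}
LOCATION = {"LP Dealer Y": "New Haven", "LP Depot Z": "Hartford", "Pipeline P": "Regional"}


def upstream(site: str) -> set[str]:
    """Every supplier reachable upstream of a site (iterative depth-first search)."""
    seen: set[str] = set()
    stack = list(SUPPLIED_BY.get(site, []))
    while stack:
        supplier = stack.pop()
        if supplier not in seen:
            seen.add(supplier)
            stack.extend(SUPPLIED_BY.get(supplier, []))
    return seen


def dependencies_on_primary(primary: str, backup: str, primary_area: str) -> set[str]:
    """Backup suppliers shared with the primary or located in the primary's impact area."""
    backup_chain = upstream(backup)
    shared = backup_chain & upstream(primary)
    exposed = {s for s in backup_chain if LOCATION.get(s) == primary_area}
    return shared | exposed


print(sorted(dependencies_on_primary("Primary DC", "Backup DC", "Hartford")))
```

Here the shared pipeline and the Hartford depot are both flagged, even though each site buys from a different provider.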
People
Plans routinely call for people to move from the primary location to the backup location. An event that affects either the availability or the movement of those people will leave the backup location ineffective. This is a sensitive issue, but it must be addressed. Two sites that require the same personnel to operate them are not independent. While this requires the business area to operate at higher staffing levels, it is the only way to secure independence from a single outage.
Transportation
Transportation routes must be considered when evaluating locations. If supplies and materials must be transported to the backup site, delivery should not depend on a single means of transportation. For instance, if air traffic is canceled, land-based transportation must be available; if land-based transportation is used, alternate routes must be available.
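The transportation rule can be checked by removing one mode or corridor at a time and seeing whether any delivery route survives. The routes, modes and corridor names below are hypothetical examples, and this one-at-a-time check is an illustrative assumption rather than a method from the source.

```python
# Minimal sketch: does delivery to the backup site survive the loss of any one
# transportation mode or corridor? All routes and names are hypothetical.

ROUTES = [
    {"name": "Air freight", "mode": "air", "corridor": "Regional airport"},
    {"name": "Truck, north corridor", "mode": "road", "corridor": "North highway"},
    {"name": "Truck, south corridor", "mode": "road", "corridor": "South highway"},
]


def single_points_of_failure(routes: list[dict[str, str]]) -> list[str]:
    """Modes or corridors whose loss alone would leave no usable route."""
    failures = []
    for key in ("mode", "corridor"):
        for value in sorted({r[key] for r in routes}):
            survivors = [r for r in routes if r[key] != value]
            if not survivors:
                failures.append(f"{key}={value}")
    return failures


for label, routes in (("All routes", ROUTES), ("Road only", ROUTES[1:])):
    spofs = single_points_of_failure(routes)
    print(f"{label}: {', '.join(spofs) if spofs else 'no single loss cuts off delivery'}")
```

The road-only case shows the problem described above: two highways protect against a closed road, but not against losing land transport altogether.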
Building Construction
Building construction is technically not a factor in determining independence. It is, however, a major factor in determining the life of the facilities and their ability to withstand routine events. Both data centers should be designed to handle the more routine types of natural disasters. This implies construction from essentially non-flammable materials; roofs capable of handling snow loads and high winds; drainage that diverts water far enough from the site to minimize flooding potential; and back-flow prevention installed in drains and sewer systems. These are only a few of the building construction examples to consider; the details depend on the building's location and its most likely events.
These factors relate not to the distance between the primary and secondary sites but to minimizing the number of disasters that force a failover. Every failover avoided decreases the chance of a larger disaster resulting. By ensuring good construction at all locations, we also minimize common causes of failure, such as materials and equipment weakened by weather-related events (for example, water entering the systems and facility infrastructure).
The approach we use:
There are two approaches that work for this type of analysis. In the first, we examine the risk factors for two locations, choose the area, then proceed with construction planning. The second approach looks at the facilities and locations available and then conducts the risk assessment based on an actual facility rather than a planned one. The client selects the approach that works for them. If new construction is an option, we can select a site and the client can build based on the parameters in the first part of this document. If the client prefers an existing building, we narrow the search along the same parameters discussed earlier:
1. Power
2. Telecommunications
3. Gas supply
4. Transportation
5. Water Supply
6. Building Construction
7. People
This usually narrows the field to just a few candidate facilities, and the analysis is done against each of them.
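For each candidate facility, the outcome of the analysis can be recorded factor by factor so that any remaining gaps are explicit; as the foreword notes, those gaps are the risks management must knowingly accept or reject. This is a hypothetical record format, not a scoring method from the source, and the candidate names and findings are invented for illustration.

```python
# Minimal sketch: record per-factor independence findings for each candidate and
# list the gaps management must accept or reject. Candidates and findings are hypothetical.

FACTORS = ["Power", "Telecommunications", "Gas supply", "Transportation",
           "Water supply", "Building construction", "People"]

candidates = {
    "Building A": {f: True for f in FACTORS} | {"People": False},
    "Building B": {f: True for f in FACTORS} | {"Telecommunications": False, "Gas supply": False},
}


def independence_gaps(findings: dict[str, bool]) -> list[str]:
    """Factors not yet shown to be independent of the primary (a missing finding counts as a gap)."""
    return [f for f in FACTORS if not findings.get(f, False)]


for name, findings in candidates.items():
    gaps = independence_gaps(findings)
    status = "independent on all factors" if not gaps else "risks to accept or reject: " + ", ".join(gaps)
    print(f"{name}: {status}")
```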
1. Since we already know the primary data center location, the alternate location is assessed using the same factors used to narrow the search for an existing building. Discussions and interviews will need to be held with the power and telecommunications carriers first; they will establish the rings within which independence can be achieved. Subsequent interviews would be held with the gas and water suppliers. Transportation routes can be examined using available maps.
2. The next set of interviews will need to be with IT and business representatives, to establish whether the centers can stand to lose any transactional data in an outage. If not, the 60-mile limitation applies. If some data loss can be sustained without significant impact, we have a larger area to consider, and at this point we could look at client locations for some type of co-location plan. If distance and cost are not issues, we could select Phoenix, since it presents the lowest overall natural hazard risk within the US.
3. We then establish the location based on the previous factors before examining the people issue, to make sure the staff groups are independent.
As you have seen, there is no hard-and-fast “do this, then that” checklist for these tasks. The timeline can also vary depending on all the external factors we have identified. Tasks related to a secondary data center location can also involve Corporate Real Estate for any acquisitions, and facility costs often need to be factored into the decision.
This process is not easy, but it is important. There are myriad details that do not come easily to most, but they are our area of expertise. That's why you hire us to help you remain a going concern.