http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9029881
By Bert Latamore
August 09, 2007
Computerworld
Pushed in part by U.S. business regulations concerning data
preservation, financial and other high-end organizations are moving to a
three data center architecture for disaster recovery, says Wikibon.org
community member and data center consultant Josh Krischer.
In this architecture, two nearby data centers are linked synchronously
with a third, located farther away, linked asynchronously. However, he
warned, some data is always lost in a disaster, even when the remote
copy is done via a synchronous link. Keeping data losses to a minimum is
critical for some applications, but a more important issue is assuring
data consistency and integrity at the recovery site. Inconsistent data
at the recovery site usually requires time-consuming recovery processes,
which may take days.
Speaking at Wikibon.org's weekly Peer Incite teleconference, which is
open to all interested parties, Wikibon.org co-founder David Floyer
related his experience consulting with one such company that was
considering implementing very high-speed continuous asynchronous data
transfer from its U.S. to its European data centers to guard against a
potential major loss. "The company had two data centers, 15 miles apart,
synchronously connected so transactional data is written to both
simultaneously," he says. "If one goes down, it can recover from the
other, theoretically with very little loss of data."
The proximity of the two centers, determined in part by the distance
over which a synchronous link can be maintained, also avoided one of the
common errors in disaster recovery planning, putting the recovery site
too far from the main data center. "Putting them far apart may make you
feel safer," says Floyer, "but it actually makes recovery harder and
more expensive and may therefore decrease the plan's effectiveness."
However, he says, this company was concerned about the possibility of a
regionwide disaster that might bring down both data centers. The
organization was sending a 2TB incremental backup to its European data
center twice daily, but in a regional disaster that could result in the
loss of up to 20 hours of transactions. It wanted to invest in an
advanced network-based system to create an asynchronous link between the
U.S. and European data centers to reduce the maximum potential loss to a
few minutes. The implementation and operational costs for this upgrade
were estimated at about $25 million over three years.
The business leads
This might seem to be an extravagant solution to the problem, and Floyer
emphasizes that this isn't the answer for everyone. "I worked with a
retailer, for instance, who decided that a local backup site was
sufficient for their DR needs. If a regional disaster took out both data
centers and distribution centers, they expected their business would not
survive in any case."
IT organizations (ITO) can't make the basic decisions on disaster
recovery strategy, Floyer says. They must be based on business decisions
concerning how much data and time a company can afford to lose, how much
that loss will cost the organization and how best to mitigate that loss
(e.g., insurance, different technical solutions or accepting the problem
as a business risk). Only senior business executives, and in some cases,
the board of directors can make those decisions. So rather than going it
alone, IT needs to push the business to examine disaster recovery in
light of its financial and legal compliance situation.
"ITOs in organizations that talk about disaster recovery but fail to
develop a business-led plan should not be seduced by the opportunity to
buy more technology or experiment with new products," added Peter
Burris, Wikibon.org's co-founder and chief content officer. "Instead,
they must act as aggressively as possible to force the business to lead
the process."
Triangulating cost
The first responsibility of the business, Krischer says, is to develop a
business impact analysis to estimate the recovery cost of data lost and
damage caused by each minute the business is interrupted in a disaster.
This is based on the amount of business that will be lost, as well as
other business damages such as reputation loss, for example. This is
clearly a business rather than an IT calculation, and it's often
difficult to develop. One of the most common errors in disaster recovery
planning is misestimating the potential cost of a business outage to the
corporation.
Business impact can be hard to assess and has multiple aspects. Instead
of relying on just one estimate -- for example, an internal computation
of the cost per minute of a business interruption times the maximum
number of minutes before systems can be restored -- businesses often
seek multiple estimates from different experts who approach the issue
from differing perspectives, Burris says.
Some companies, for example, will ask their investment banker for an
estimate of the impact of a business interruption on the organization's
capitalization. Another alternative, Burris says, is to have a company
that specializes in investigating business disasters create an estimate
of potential loss. They also can usually provide a good estimate of the
probability of the disaster occurring.
In the case of Floyer's client, the disaster planning team calculated
the average dollar value of a transaction and the average number of
transactions per minute to arrive at a basic potential loss per minute
of lost data.
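The per-minute figure described above is a simple product of two averages.
A minimal sketch in Python follows; the input values are hypothetical,
chosen only to show the arithmetic, since the client's actual transaction
value and transaction rate were not disclosed.

```python
def loss_per_minute(avg_transaction_value, transactions_per_minute):
    """Basic potential loss for each minute of lost transaction data."""
    return avg_transaction_value * transactions_per_minute


# Hypothetical inputs (not from the client): $500 per transaction,
# 40 transactions per minute.
example = loss_per_minute(avg_transaction_value=500.0, transactions_per_minute=40)
print(f"${example:,.0f} per minute of lost data")
```

Multiplying this figure by the number of minutes of data at risk gives a
first estimate of the loss from a single event.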
They also needed to calculate the probability of a regional disaster
that would take both the local data centers down. Probabilities of
various disasters are usually based on historical information -- how
often these events have happened in the past -- and often are publicly
available.
Based on these calculations and the average amount of data that would be
lost under its existing daily backup schedule, they estimated that the
company could expect one regional disaster taking down both data centers
every decade, for a staggering loss of $2.5 billion a decade, or $250
million per year. The best they could do by improving their disaster
recovery processes would reduce this to about $1 billion, or $100
million per year.
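The annual figures above follow from spreading the expected loss per
decade evenly over 10 years. A short sketch of that arithmetic in Python,
using the dollar amounts quoted above; the function name and the
even-spreading simplification are assumptions made for illustration.

```python
def annualized_loss(loss_per_event, events_per_decade=1.0):
    """Expected loss per year, spreading a decade's expected loss evenly."""
    return loss_per_event * events_per_decade / 10.0


current = annualized_loss(2.5e9)   # existing backup schedule
improved = annualized_loss(1.0e9)  # with improved recovery processes

print(f"Current:  ${current / 1e6:,.0f} million per year")
print(f"Improved: ${improved / 1e6:,.0f} million per year")
```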
The team then approached an insurer and found that the annual premium to
insure against a regional disaster would be at least $100 million. The
team then looked at the annual interest the firm would lose if it
self-insured by posting a reserve as required by the International
Convergence of Capital Measurement and Capital Standards (Basel II). The
annual loss of income from the reserve was well over $100 million.
Given these alternatives, the three data center solution was the obvious
choice. The payback period was seven months, with a net present value
over three years of over $150 million.
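The comparison can be sketched roughly in Python from the annual costs
quoted in this article. This is only an illustration: the insurance and
self-insurance figures are lower bounds, spreading the $25 million
upgrade cost evenly over three years is an assumption, and the sketch
does not attempt to reproduce the team's payback or net present value
calculations, whose inputs were not published.

```python
# Annual cost of each option, in millions of U.S. dollars.
# The first two values are lower bounds ("at least", "well over").
annual_cost_musd = {
    "insurance premium": 100.0,
    "self-insurance (lost income on reserve)": 100.0,
    # Assumption: $25 million spread evenly over three years.
    "three data center upgrade": 25.0 / 3,
}

cheapest = min(annual_cost_musd, key=annual_cost_musd.get)

for option, cost in sorted(annual_cost_musd.items(), key=lambda kv: kv[1]):
    print(f"{option}: about ${cost:.1f} million per year")
print("Lowest annual cost:", cheapest)
```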
Invitations to the table
Burris and Floyer suggested that at least four and possibly seven groups
need to be represented at disaster recovery planning sessions:
1. CXO-level corporate management and possibly corporate directors
who must make the final strategic and financial decisions.
2. The head of the line(s)-of-business the disaster recovery solution
will serve.
3. Facilities or operations management, which must provide an
assessment of relevant external factors that increase risk, such
as the data center's proximity to earthquake fault lines or
chemical or nuclear power plants.
4. IT, which must quantify the potential risks and present the
technical disaster recovery options for mitigating that risk.
5. Corporate auditors to ensure that auditing procedures are included
in the recovery plan.
6. The corporate compliance officer or legal counsel to discuss
regulatory and other potential legal exposure, depending on the
nature of the organization's business.
7. Outside consultants to aid the planning process and ensure that
nothing important is missed, particularly if the organization lacks
depth of internal experience in disaster recovery planning.
Keep it simple
"In a disaster, nothing will work as planned," says Krischer. "So you
have to improvise." To allow that, companies need to keep their plans as
simple and flexible as possible. One of his clients focused much of its
planning effort on ensuring that key business executives would be
reachable in emergencies to make the business decisions on what to do.
Discussion focused on what constituted adequate emergency
communications and whether, for example, the disaster recovery budget
should include satellite phones for those executives, and whether they
would keep those phones charged and with them at all times if it did.
Also, he says, "Users will accept lower service levels in a disaster,"
so IT doesn't have to recover all systems immediately to normal service
levels.
Practice, practice, practice
Floyer's IT client had a second item on its agenda. IT was testing its
disaster recovery plan twice a year, but the CIO had less than complete
faith that it would work in a real event. "They were testing an ideal
scenario with historical data, and when real disasters happen, a lot of
other things go wrong," Floyer says. "The overall testing strategy is
one of the most important things that you have to get right." The
literature is replete with stories of disaster plan failures. "They
wanted to move operations from one center to another regularly, to make
what is essentially a disaster recovery from center A to B or C part of
the normal way activities were scheduled." That required an expenditure
of time and money but is the best way to reduce the risk that they would
suffer major complications in a real disaster.
Budget and time
Finally, Burris says, "Business management must commit to supporting the
plan, not just talking about it. The level of that commitment is
expressed in how close the level of funding they authorize approaches
the ideal funding level and in their willingness to commit their own
time to planning, testing and other activities that will prepare the
organization for the eventual disaster."
Without that level of commitment, he says, IT can't hope to develop an
adequate disaster response.
-=-
Bert Latamore is a journalist with 10 years' experience in daily
newspapers and 25 in the computer industry. He has written for several
computer industry and consumer publications. He lives in Linden, Va.,
with his wife, two parrots and a cat.