Business Continuity Planning (BCP), also called Operational Risk Management, Disaster Recovery Planning, or, more colloquially, CYA, is, in a nutshell, a business strategy for managing Murphy's Law. If it can go wrong, it will go wrong, so when it does go wrong you need a plan in place to make sure the situation does not take out your project or the business.
BCP can be a set of requirements and objectives specified as part of an overall project deliverable; sometimes the successful implementation of a BCP strategy is itself the primary project deliverable. Of course, BCP is an ongoing practice and needs to change and adapt as priorities, business conditions, risks, and legal requirements change. An important component of the process is analyzing events when they occur to understand the underlying causes and the effectiveness of the response, and then amending the BCP process to improve the response to future events. From a project perspective, just assume that Murphy is one of the stakeholders.
I was particularly proud of an e-commerce portal project that I worked on. From planning through deployment, the project took a little over a year and involved everything from designing and setting up the server farm within an existing company data center, to developing the e-commerce portal itself, to working with third-party vendors ranging from telcos to graphic designers. The portal ran on what was essentially a mini data center contained within a much larger facility. One of the steps taken to ensure that the portal would meet its 99.9% availability requirement was to obtain internet connectivity as paired communication lines from two different telcos, giving us doubly redundant communications. Backups were taken both online to another facility and locally via standard disk and tape backups. We also had an arrangement with an offsite facility where we could declare a BCP event and generally have our environment up and running within about 4 hours, which was the stated business requirement. Who knew that within 3 months of the portal's grand-opening go-live event we would be operating out of that backup facility?
What happened? Earthquake? Tornado? Neither. The infrastructure design had carefully brought in redundant telecommunications from two different telcos, and we had even routed the telco wiring through two different connections into the building. What we didn't know was that the telcos themselves were sharing the same physical conduit under the street, about 500 feet from the building. A backhoe ripped right through that conduit three weeks after we went live, and we had no outside communications at our brand new facility for about two days.
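For a sense of scale, a 99.9% availability target leaves a downtime budget of only about 8.8 hours per year, so a roughly two-day outage blows through that budget several times over. Here is a minimal back-of-the-envelope sketch; the 99.9% target and the two-day outage come from the story above, and the rest is just arithmetic:

```python
# Back-of-the-envelope availability math for the outage described above.
HOURS_PER_YEAR = 365 * 24          # 8760 hours

availability_target = 0.999        # the portal's 99.9% availability requirement
outage_hours = 2 * 24              # roughly two days without outside communications

downtime_budget = HOURS_PER_YEAR * (1 - availability_target)
print(f"Annual downtime budget at 99.9%: {downtime_budget:.1f} hours")               # ~8.8 hours
print(f"Outage as a multiple of the budget: {outage_hours / downtime_budget:.1f}x")  # ~5.5x
```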
A Wired article that same year called the backhoe "the real cyberthreat," noting that backhoes had taken out nearly 200,000 telecom lines within the U.S. alone the previous year.
Lessons learned? Trust but verify -- vendors will not necessarily answer questions that you don't ask. While we internally specified separate communications links, we never thought to ask the telcos themselves whether they had redundant (and separate) paths along their own routes into our facility.
Murphy was an optimist - something will go wrong; it's how you respond that makes the difference.