What are some best practices for disaster recovery approaches?

Security, Governance, Risk, and Compliance

6 comments

https://www.pulse.qa

Pulse User

Is this strictly DR, or DR plus BCP? Understand and clearly document the business RTO/RPO before you start. I would also recommend a BIA and a risk assessment.
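To make the RTO/RPO point concrete, here is a minimal sketch of checking a backup schedule against documented targets. The names `RecoveryTarget` and `meets_targets` and the example figures are hypothetical, and it assumes worst-case data loss is roughly equal to the backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTarget:
    """Documented business targets for one system (hypothetical structure)."""
    system: str
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of lost data


def meets_targets(target, backup_interval, measured_restore_time):
    """Compare what the current setup delivers against the documented targets.

    Assumption: worst-case data loss is about one backup interval, and
    measured_restore_time comes from an actual restore test.
    """
    return {
        "rpo_ok": backup_interval <= target.rpo,
        "rto_ok": measured_restore_time <= target.rto,
    }


# Example with made-up numbers: hourly backups against a 15-minute RPO.
orders_db = RecoveryTarget("orders-db", rto=timedelta(hours=4), rpo=timedelta(minutes=15))
result = meets_targets(
    orders_db,
    backup_interval=timedelta(hours=1),
    measured_restore_time=timedelta(hours=2),
)
print(result)
```

In this made-up case the restore time fits the RTO, but hourly backups cannot satisfy a 15-minute RPO, which is the kind of mismatch worth finding before you start building.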

Pulse User

Continuously test restores to a staging environment to ensure recoverability. The last thing you want is to discover a flaw when you need to restore to production.
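One way to automate part of that check is to compare checksums after each restore to staging. This is only a sketch: `verify_restore` is a hypothetical helper, and it assumes you have a point-in-time copy of the source (for example a snapshot) to compare against, since a live source will drift after the backup is taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return a list of problems found in the restored copy (empty means OK)."""
    source_root, restored_root = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        restored = restored_root / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Run on a schedule after each staging restore, a non-empty result is an early warning. It only checks files, so it complements an application-level check (can the restored system actually start and serve requests?) rather than replacing it.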

Pulse User

First, you need to do a BIA and make sure you understand which systems are critical and which ones can wait. Then you build out a plan for what to do in case of a failure. This includes backups, how to rebuild the environment, testing, network, etc. Lastly, you need to do a paper walkthrough as if you have had a disaster and validate that you can rebuild the environment(s). My teams used to do a paper walkthrough at least once a year, and I asked for a real one as well. We rebuilt it in a dev/test environment to ensure backups were accurate etc. You should have a formal DR document that describes the process, which systems come up first, etc. I can say that if you do these things and do the testing, it will pay off if you have a disaster.
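The "which systems come up first" part of a DR document can be derived from the BIA rather than written by hand. A rough sketch, assuming you record each system's dependencies and a criticality tier (1 = most critical); the system names, tiers, and the `recovery_order` helper are all made up for illustration. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def recovery_order(depends_on, tier):
    """Order systems so dependencies come up first.

    depends_on: maps each system to the set of systems it needs running.
    tier: maps each system to its BIA criticality (lower = more critical).
    Among systems that are ready at the same time, lower tiers go first.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda s: (tier.get(s, 99), s))
        for system in ready:
            order.append(system)
            sorter.done(system)
    return order


# Hypothetical inventory from a BIA.
depends_on = {
    "web": {"app"},
    "app": {"db", "auth"},
    "db": {"network"},
    "auth": {"network"},
    "reporting": {"db"},
}
tier = {"network": 1, "db": 1, "auth": 1, "app": 1, "web": 1, "reporting": 3}

for step, system in enumerate(recovery_order(depends_on, tier), start=1):
    print(step, system)
```

The generated list is a starting point for the paper walkthrough, not a substitute for it. The walkthrough and the real rebuild are where you find the dependencies nobody wrote down.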

Pulse User

The only certain way to know whether your DR environment can function seamlessly is to run the DR site as the production site once or twice a year. One company I worked for would shut down its primary site and run the market from the backup site, twice a year for two weeks. I know most companies cannot afford this cost, but it is one way to get almost absolute assurance that your DR site will run and function as intended when you need it. This also depends on the criticality of your business, the impact on your customers, and your reputation in the marketplace for being reliable.

Pulse User

You absolutely need to run full tests. I agree with the other posts on BIAs etc., and once that is done, tabletop exercises help you be successful, but unless you start a recovery from scratch, you won't uncover what you are truly missing. And don't use your top talent on the test. Top engineers can figure out the problems and make the leaps between gaps in the documentation, which hides the very gaps the test is meant to expose.

Pulse User

Backups, redundancy, replication, contingency plan, all off-site