Incident responders may not always keep the business continuity planning (BCP) or management (BCM) team on their speed dial, but I can tell you it’s worthwhile to do so in consideration of Critical Control 19: Data Recovery Capability.
Successful data recovery is as much a part of reliability as it is of security, so embrace the process as paramount to successful response. Whether it is a significant outage from operational data loss (the SQL server ate that data) or that moment that leaves us all shuddering and queasy (attackers have tweaked our data and it is no longer reliable), you have to know you can recover.
This control does mention testing restorations from backups twice, once in the measurements section and once in the procedures and tools section, but I humbly submit that every possible measurement and procedure should be tested quarterly at a minimum.
Much as one might with incident response, drilling the recovery/restoration process is critical. And not tabletop exercises; I mean real data to real systems in real scenarios that mimic your production environment. Clearly, testing the process directly in production may be difficult, but a staging (or dev/test) environment is ideal for this testing.
Unfortunately for them, you need someone expert in the restoration/recovery process on-call as part of your incident response planning.
Here’s a scenario to chew on. Imagine responding to a reported incident where critical system configurations have gone missing (operational snafu, not malicious). The next day, you respond to another incident where a particular configuration has put an environment at risk and the extent of exposure needs to be identified. As a result, you ask for the offline configuration only to learn that it went missing in the incident from the day before, and that restoration was not immediately possible due to another unrelated systemic shortcoming. Aargh!
How to avoid this? Short answer: test, drill, validate. Regularly. More than regularly on critical systems.
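To make "validate" a little more concrete, here is a minimal Python sketch of one way to check a test restore: record SHA-256 hashes of the source files when the backup is taken, then compare the restored tree against that manifest. The function names (build_manifest, verify_restore) are my own for illustration, and it assumes plain files on a filesystem; your backup tooling may well offer its own verification that you should prefer.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest, restored_root):
    """Return (missing, changed) file lists for a restored tree.

    manifest is the hypothetical record taken at backup time;
    restored_root is where the drill restored the data.
    """
    restored = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(restored))
    changed = sorted(
        name
        for name in manifest.keys() & restored.keys()
        if manifest[name] != restored[name]
    )
    return missing, changed
```

A drill passes this particular check only when both lists come back empty. Anything in missing or changed is worth chasing down before you need that restore for real.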
Another ugly problem that comes out of incident response, and one that depends directly on data recovery practices, is the "when did we get pwned?" scenario. This is where backup design is so important.
As the control mentions, you have to factor in operating system, application software, and data recovery. Each of these three is in turn shaped by your choice of full, differential, and incremental backups, depending on need, scheduling, and planning, as well as the retention period.
Can you conduct a successful, relatively painless recovery today if you found out you were compromised two weeks ago and all data since is suspect? If no, keep working towards that goal. There is a light at the end of that tunnel, and it may not be a train. ;-)
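To see why backup design and retention matter for that question, here is a rough Python sketch that works out which backups you would need to restore, in order, to reach the last state before a known compromise time. The Backup class and restore_chain function are hypothetical names of mine. It also assumes a common scheme in which a differential captures changes since the last full and an incremental captures changes since the last backup of any kind; products differ on this, so check how yours defines the terms.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str           # "full", "differential", or "incremental"
    taken_at: datetime  # when the backup completed


def restore_chain(backups, compromised_at):
    """Return the backups to restore, in order, to reach the latest
    state that predates compromised_at."""
    # Anything taken at or after the compromise is suspect.
    clean = sorted(
        (b for b in backups if b.taken_at < compromised_at),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in clean if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup predates the compromise")
    base = fulls[-1]
    after_base = [b for b in clean if b.taken_at > base.taken_at]

    chain = [base]
    anchor = base
    # Assumption: the latest differential supersedes everything
    # between it and the base full.
    diffs = [b for b in after_base if b.kind == "differential"]
    if diffs:
        anchor = diffs[-1]
        chain.append(anchor)
    # Then replay every incremental taken after that anchor.
    chain.extend(
        b for b in after_base
        if b.kind == "incremental" and b.taken_at > anchor.taken_at
    )
    return chain
```

If the function raises because no full backup predates the compromise, your retention period is shorter than your detection time, and that is exactly the gap to close.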
Been through this? Succeeded? Failed? Let us know via the comment form.
Russ McRee - russ at holisticinfosec dot org - http://holisticinfosec.blogspot.com - Twitter: @holisticinfosec
Oct 28th 2011