
SANS ISC: How McAfee turned a Disaster Exercise Into a REAL Learning Experience for Our Community Disaster Team - SANS Internet Storm Center SANS ISC InfoSec Forums


How McAfee turned a Disaster Exercise Into a REAL Learning Experience for Our Community Disaster Team

Our community has a unified disaster system. Several organizations are involved in disaster planning and response: local, county, and city government, hospitals, the school district, and businesses. Because we are in the northwest corner of Iowa, with border neighbors in Nebraska and South Dakota, we often have regional exercises. Several times a year we hold disaster exercises where all of our teams "play together".

Today was one of those days. At 8 AM the team started to gather at the local event center to prepare for the arrival of the exercise "victims". The victims were students from local high schools and colleges, plus a few "adult chaperone" victims. The scenario was a bioterrorist event at a sold-out concert at the local event center. All of the players arrived and were briefed on the activities of the day. At precisely 9 AM the exercise began. The first call went out to our 911 Center to notify them that an event was unfolding at the local event center. Information was relayed to the 911 operator that something was going on at the event center, with approximately 130 victims exhibiting various breathing/respiratory symptoms. The 911 operator was going through her normal fact-finding questions when, about 3 minutes into the call, she indicated that her computer had just quit. She was about to transfer the call to another dispatcher when all of the computers in the 911 Center began to power down. At this point they knew something was going on, but were not sure what.

Our on-scene team at first thought that this was someone's idea of adding a little twist to the exercise. The 911 operator assured us that it was not. A call was made to the IT department, and the 911 Center soon discovered that the problem was not limited to their computers: computers all over the system were shutting down. The local county and city governments share the network, resources, and support staff for the computer systems. They began getting calls from city and county employees in all areas: police, fire, emergency management, financial, HR, etc. The first thing that came to mind was that a worm/virus was wreaking havoc on the City/County network. They began an emergency shutdown of all equipment on the network to prevent spread and additional damage.

About an hour into their investigation they discovered that the culprit for the shutdown was not a worm/virus but an update being pushed out for the McAfee antivirus program. The IT staff will have a long night tonight getting all of the damaged machines repaired and ready for the morning startup. They expect to have 80% of the machines back up by tomorrow morning and 99% back up by lunchtime tomorrow.

So you may assume that the loss of the 911 Center caused the disaster exercise to be called off. After all, how can you have a disaster without your 911 operators, right? Not us. When the 911 Center went offline at 9:05 AM we had to decide whether to continue the exercise or call it off due to the loss of 911. Our EMS Director for the County decided to continue the exercise. He began handling dispatch and communication using our 800 MHz shared radio system. We continued the exercise, and decontaminated and transported roughly 120 people to the local hospitals. We successfully completed the exercise at 11 AM.

While we were in the Hot Wash debriefing we received a call letting us know that it was not a worm/virus but the McAfee update that had brought the entire City/County to a screeching halt. Many of the individuals in the debriefing grabbed cell phones to call back to the office with the news of what had happened. For a few it was too late: the updates had already run and their organizations were experiencing the same problems. For those that hadn't updated yet, the updates were turned off. Others were relieved to find out that they were using a competitor's AV and were not in any danger.

Thanks to McAfee we were forced to test our response to a disaster while in the midst of a real "disaster". The positive that came out of it is that we had a successful exercise while using our "backup" communication system. It was a true test of our ability to adjust to and respond to a disaster in less-than-perfect circumstances. Isn't that really what our goal was? We all know that many "disasters" have multiple components, and today we saw firsthand how true that is.

 

Deb Hale Long Lines, LLC

Deborah

272 Posts
ISC Handler
Deb, great article. I work for a large utility on the east coast and even though we had the disaster contained within 45 minutes, it will take a significant part of the night to get all the hosts back up. Our primary lesson learned is that we escalate way too quickly, bringing IT into the mix, which slows down root cause analysis to a crawl.
Anonymous

It was a good call to continue. I know that it is important to be able to cope with an emergency when your normal command and control communications are cut off. I have been involved with Amateur Radio emergency communication in Civil Defense, and our job is often to re-establish C3 in such cases. It is a lesson relevant to the ISC community: how many organisations have a backup communications plan if their Exchange (or other e-mail system) infrastructure is down?
Anonymous

News from Boston... http://www.boston.com/news/local/rhode_island/articles/2010/04/21/mcafee_glitch_causes_trouble_in_ri_emergency_rooms/
Anonymous

An underlying conversation seems to come to light with regard to the McAfee event - Where is the balance between keeping up to date on IPS / antivirus / other security products, versus leaving some buffer before deploying?

I can think of a handful of events where IPS or email virus scanning products caused disruptions for our agency, but they are not very frequent. Maybe 1 day every 5 years is an acceptable ratio?
Anonymous

I wanted to copy the above referenced URL "News from Boston," but clicking anywhere in the comments section skips you down to the comments box. Will the handler pls check this out and fix it? It's highly annoying when people post links and you can't even copy them because the comments section is hypersensitive to a pointer.
Peyton

5 Posts
Many of our clients are using Application Whitelisting rather than just Anti-Virus, testing the .DAT files more fully, and sometimes updating far less frequently. In the meantime, no new unapproved applications will run, and they have time to be sure that the new .DAT won't do more harm than good. Others are phasing out Anti-Virus altogether over time.
Anonymous

Be aware that Symantec did this to Chinese-language PCs a couple of years ago - we didn't hear as much about it in the West for obvious reasons.

Any signature-based system has the potential to do this when QC goes wrong.

Unfortunately, behavior-based detection alone requires a very agile team and monitoring system to handle.

It is too bad, though, that the logic of the old Mac a/v program Gatekeeper wasn't something that (apparently) scaled or ported well. Host-based behavioral protection, in essence, and, what, 20 years ago?
peter

17 Posts
Good job! Continuing the exercise was definitely the right thing to do; you should practice as realistically as possible. You couldn't scrub a real disaster just because of a bad computer patch cycle.

I'm a little surprised that discontinuing the exercise would even be seriously considered. Personally I would favor continuing the exercise in the face of almost anything other than the occurrence of a real disaster, and even then it might be justifiable to continue - again, if you have a real disaster you can't cancel it just because another one occurs. The one compelling argument for scrubbing would be if the practice event might compromise response to the real event. If not, carry on!

I'm sure there will be many valuable lessons learned. Good work, and thanks for sharing the account of your experience!
Anonymous
