The Planet outage - what can we all learn from it?

Published: 2008-06-01. Last Updated: 2008-06-01 21:37:06 UTC
by Swa Frantzen (Version: 1)

The Planet, a popular hosting provider, had a fire and explosion earlier this weekend, resulting in an outage of their H1 data center.

Reading through the announcements and the usual techno-press reports on it, a few things struck me. While the last word on this has by far not been said, here is what stood out from a BCP/DRP viewpoint:

First I'd like to mention that I'm actually impressed by the frequent communication and the calmness of those messages from The Planet: http://forums.theplanet.com/index.php?showtopic=90185.
I think it's important to teach those dealing with (major) incidents to remain calm. Not just when dealing with the public or the press, but also internally. Think through your decisions before you act, as doing things in a panic leads to wrong choices.
Communicating the right way can also be critical, and planning ahead helps a lot.
Next, I saw they were "requiring us to take down all generators as instructed by the fire department". I have seen BCP/DRP plans derail before because officials stepped in and responded to an emergency their way, not the way the organization itself had planned.
I think it would be interesting for most of us to actually talk to the fire department and/or police about what their normal responses are, and to take those into account in our plans. When you build a BCP you basically spend money on making sure you don't lose a site. One of the things you foresee is redundant power, but if you're not going to be allowed to use it, perhaps your priorities should shift to spending that money elsewhere and fixing the problem at another layer?
The reason they went down seems to have been: "electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room". While it doesn't say as much, knocking down 3 walls is violent. An explosion can do that, and transformers can indeed explode, but there's another thing that can knock down walls: violently expanding gases from fire suppression systems. That's why you have those automatic vents in the walls. Please note: I'm not saying I know what happened, because I don't. But there's one thing I'd do as a precaution: I'd make sure that my facilities processes include a regular check that those vents are still OK. Knocking over walls is just too scary an idea.
The Planet got vendors involved during the weekend itself: "As you know, we have vendors onsite at the H1 data center. With their help, we’ve created a list of equipment that will be required, and we’re already dealing with those manufacturers to find the gear. Since it’s Saturday night, we do have a few challenges".
What have you foreseen so that, within hours of a fire or explosion, vendors are helping you assess what equipment you need to get back on-line? Can you even reach them during a weekend? Every bit of time you put into collecting and updating this information up front in your BCP/DRP will pay back many times over when getting back on-line.
It's good to see they made a list of priorities public.
Your plans could include such lists pre-made. It's easier to cross off items you still have than to think up the list yourself during the emergency.
There is talk from both The Planet and some of their customers about DNS and redundancy. I'm pretty sure it's not entirely The Planet's fault: customers putting all their eggs in one basket exist all too often.
Still, I find this strange: DNS is, in my opinion, about the most redundant system you can get. You can easily add another server anywhere in the world, and there is hardly any penalty for not having all of them in the same spot. So why would you even consider having them in the same spot? Yet I've more than once seen setups where all the NS records entered in a TLD are on adjacent IP addresses, and where a traceroute shows they route exactly the same. This isn't using DNS for what it can do for you: it'll protect you from a server outage, but not much more than that, while if you had a handful of DNS servers out there, you'd be next to impossible to get off the air DNS-wise.
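
A rough way to spot the "adjacent IP addresses" pattern is to group a domain's name-server addresses by network prefix. The sketch below is only an illustration: the function names, the /24 cut-off, and the sample addresses (taken from the documentation ranges) are assumptions, and sharing a prefix is a hint rather than proof of a shared location. It does not look up NS records or compare traceroutes; you would feed it addresses you resolved yourself.

```python
import ipaddress
from collections import defaultdict


def group_by_network(addresses, prefix_len=24):
    """Group IPv4 name-server addresses by the network prefix they fall in."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[net].append(addr)
    return dict(groups)


def looks_colocated(addresses, prefix_len=24):
    """True when all addresses share one prefix (a hint, not proof)."""
    return len(group_by_network(addresses, prefix_len)) <= 1


# Hypothetical name-server addresses, for illustration only
adjacent = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
spread = ["192.0.2.10", "198.51.100.53", "203.0.113.53"]

print(looks_colocated(adjacent))
print(looks_colocated(spread))
```

Servers in different prefixes can still sit in the same building, so a check like this only catches the most obvious case.
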
The Planet is slowly getting back on the air, so that's good.
I think it would be a good idea for a next BCP/DRP exercise to replay an existing incident and measure how you do against how they did in real life.
Lastly The Planet seems to be suffering from a /. effect on their forum. I think this is about the worst moment to get on slashdot you can imagine. But it's a likely result of the incident that those things you still have will attract more visitors than ever before.
Again something to plan for, although we here at the ISC needed to be featured on /. a few times before we nailed it ourselves as well.
Basically, make sure your emergency communication is as solid as you can make it, as static as possible, and as lightweight on the server(s) as you can imagine. The last thing you want to do during an emergency is to have to survive a DDoS from curious people, as we all are ourselves.
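
One way to keep emergency communication static is to generate a single plain HTML page from a short list of updates and serve it as a file, with no forum software or database behind it. The following is a minimal sketch under that assumption; the function name, page layout, and sample updates are all hypothetical.

```python
import html


def render_status_page(title, updates):
    """Render (label, message) pairs as one small, static HTML page."""
    items = "\n".join(
        f"<li><strong>{html.escape(label)}</strong> {html.escape(message)}</li>"
        for label, message in updates
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


# Hypothetical updates, newest first
updates = [
    ("Update 2", "Power restored to part of the facility."),
    ("Update 1", "Vendors on site <assessing> equipment."),
]
page = render_status_page("Data center status", updates)
print(page)
```

Writing the returned string to a file and serving it from a plain web server, or from a host outside the affected site, keeps the load per visitor about as low as it can get.
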

My best wishes to the folks at The Planet and glad to read nobody got hurt.

To the thousands of customers affected, well there's the SLA, but there's also some pretty decent work in recovering going on. And I can only hope those companies where I host servers would be able to do equally well and be as open about it as these folks have been so far.

--
Swa Frantzen -- Gorilla Security

Keywords: BCP DRP outage

1 comment(s)

Updates to VMware resolve critical security issues

Published: 2008-06-01. Last Updated: 2008-06-01 13:56:42 UTC
by Mari Nichols (Version: 1)

I don't know how many of you work with VMware, but I have to thank Ed Skoudis for turning me on to virtualization in one of his classes long ago. Since that time, I have been using it as an invaluable tool for incident handling and testing patches and vulnerabilities. So, I found it interesting to see the VMware security advisory VMSA-2008-0008 sent from fellow handler Jim Clausing. Security Focus is reporting that there are no exploits in the wild at this time. These security vulnerabilities have been addressed in the newest releases of VMware's hosted product line. The advisory affects the following products:

VMware Workstation 6.0.3 and earlier
VMware Player 2.0.3 and earlier
VMware ACE 2.0.3 and earlier
VMware Fusion 1.1.1 and earlier
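
To check an inventory against this list, a simple version comparison is enough for a first pass. The sketch below encodes only the four product lines listed above. The function names are illustrative, and treating every lower version number as affected is an assumption that follows the list as written; products maintained on several branches would need per-branch handling.

```python
# Last affected release per product, taken from the list above
LAST_AFFECTED = {
    "Workstation": (6, 0, 3),
    "Player": (2, 0, 3),
    "ACE": (2, 0, 3),
    "Fusion": (1, 1, 1),
}


def parse_version(text):
    """Turn a dotted version string such as '6.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(product, version):
    """True if `version` is at or below the last affected release."""
    last = LAST_AFFECTED.get(product)
    if last is None:
        raise KeyError(f"product not covered by this list: {product}")
    return parse_version(version) <= last


print(is_affected("Workstation", "6.0.3"))
print(is_affected("Workstation", "6.0.4"))
```
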

Windows based VMCI arbitrary code execution vulnerability

VMware says that VMCI was introduced in VMware Workstation 6.0, VMware Player 2.0, and VMware ACE 2.0. It is an experimental, optional feature that allows virtual machines to communicate with one another. With VMCI enabled, a guest may execute arbitrary code in the context of the vmx process on the host. This is a compiler-dependent vulnerability and only affects systems running on Windows hosts. An attacker can exploit this issue to execute arbitrary code with SYSTEM-level privileges. Successfully exploiting this issue can completely compromise affected computers. Failed exploit attempts will result in a denial-of-service condition.

VMware Host Guest File System (HGFS) shared folders

Secondly, the shared folders feature allows users to transfer data between a guest operating system and the non-virtualized host operating system that contains it. The vulnerability is a heap buffer overflow. Exploitation of this flaw might allow an unprivileged guest process to execute code in the context of the vmx process on the host. In order to exploit this vulnerability, the VMware system must have at least one folder shared. One good thing about this vulnerability is that if you are using the default setting, you are not vulnerable. The vulnerability only applies if you have changed the settings to share folders. VMware Server, ESX and ESXi do not provide the shared folders feature, so they are not vulnerable.

Fair Winds,
Mari Nichols

Keywords: VMware Advisory

0 comment(s)

Free Yahoo email account! Sign me up, Ok well maybe not.

Published: 2008-06-01. Last Updated: 2008-06-01 01:01:27 UTC
by Mark Hofman (Version: 1)

Hello , !
Your friend invited you to use the BETA email Service from YAHOO join YAHOO and Create your Free Email Account

Just click here to receive your FREE YAHOO EMAIL Account!

OK, so it is just a small variation on the greeting card theme (although they haven’t bothered to change the file being downloaded). The main difference is the message, and rather than using HTTP to deliver the file, the link is an FTP link along these lines: ftp://username:password@82.bbb.ccc.ddd/private/postcard.pif

Connecting to 82 .bbb.ccc.ddd:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /private ... done.
==> PASV ... done. ==> RETR postcard.pif ...

Corporate networks typically block outbound FTP, so most of you should be OK at work. Home users, however, may end up with a little surprise. The file downloaded should be reasonably well detected by most AV products. The few sites I checked already had the file pulled (or not yet placed there).
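
Where FTP can't simply be blocked, a mail or proxy filter could flag links of this shape: an ftp:// URL that carries embedded credentials and points at an executable file type. The sketch below is a hypothetical illustration; the function name and the extension list are assumptions, and the sample URLs use documentation addresses rather than anything seen in the wild.

```python
from urllib.parse import urlsplit

# Illustrative list of file types that execute on Windows; adjust to taste
RISKY_EXTENSIONS = (".pif", ".scr", ".exe", ".com", ".bat")


def suspicious_ftp_link(url):
    """Flag ftp:// links with embedded credentials and a risky file type."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return False
    has_credentials = parts.username is not None
    risky_file = parts.path.lower().endswith(RISKY_EXTENSIONS)
    return has_credentials and risky_file


# Hypothetical links, for illustration only
print(suspicious_ftp_link("ftp://user:secret@192.0.2.10/private/postcard.pif"))
print(suspicious_ftp_link("ftp://ftp.example.org/pub/readme.txt"))
```
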

It is a fairly trivial thing. The only reason I mention it is because, like no doubt a fair number of you, I looked at it and went “mmm, interesting that Yahoo is going down the invite path, just like Google” and I opened the message to have a look. So the message is reasonably effective at first glance.

From a broader perspective, there seems to be no lack of FTP servers connected to the internet that have been or are being compromised. If you run an internet facing FTP server, when was the last time you checked the logs and the users defined?

Mark H - Shearwater

Keywords: FTP Greeting Cards Malware

0 comment(s)