Five major data center outages reported last week

The website Data Center Knowledge recently published an alarming report about five major data center outages that occurred in the past week. Here is a brief breakdown from the article.

  • “On Monday June 29, Rackspace Hosting (RAX) experienced a power outage at its Dallas data center that left several areas of the facility without power for about 45 minutes, knocking many popular customer web sites offline.
  • “Early Thursday Equinix Inc. (EQIX) data centers in Sydney, Australia and Paris each experienced power failures. While the power outages were brief – Equinix said the Sydney event lasted 12 minutes while power was restored in Paris in just one minute – many key customer sites took considerably longer to recover their systems. The Sydney event led to disruptions for VoIP service in parts of Australia, while the Paris outage caused downtime for the popular video site DailyMotion and the French portal for hosting firm ClaraNet.
  • “Google App Engine, the company’s cloud computing platform, had lengthy performance problems on Thursday, experiencing high latency and data loss.
  • “A fire at Fisher Plaza in Seattle late Thursday night left many of the building’s data centers without power. The fire in a basement-level electrical room triggered sprinklers and caused extensive damage to generators and electrical equipment. The damage left tenants with backup plans offline for hours, and those without backup sites down until temporary generators restored power early Saturday morning. The biggest impact was at payment gateway Authorize.net, which was offline for more than 12 hours, leaving its merchant customers unable to process credit card sales. Other sites experiencing lengthy downtime included AdHost, GeoCaching and Microsoft’s Bing Travel.
  • “Early Sunday, July 5, a fire at 151 Front Street, the major carrier hotel in Toronto, knocked out power on several floors of the facility used by Peer 1 networks. Power was restored in about 3 hours, after a damaged UPS unit was bypassed.”

The author, Rich Miller, then goes on to raise some tough questions and discuss the lessons learned from these outages.

Although it is surprising that data centers of this size can experience outages like these, what is even more surprising is that they all happened within a single week. I wonder, is the National Security Agency going to look into this?

The equipment used to monitor a data center of this size is monumental, but even the smallest IT department can obtain economically priced sensor equipment, like the Bitsight8, combined with Intelligent Sensors, like the AC Voltage Detector and the Digital Voltmeter.

My Ravica sensorProbe woke me up! Time for some coffee.

April 1, 2009 by JimmyD · Comment
Filed under: Data Center, Intelligent Sensors, SensorProbes 

What a morning here at our network operations center. My cell phone paged me at 2:00 am, letting me know that server room 4 was overheating. After I grumbled a few choice words, I got out of bed to see what the issue might be. I also received another page from the air flow probe.

I logged into Denika and then clicked on the SvrRoom4 report group. I patted myself on the back for being super smart. When we set up this server room, I made sure to set up reports for the various Ravica probes and complemented them with other related SNMP reports. I have quite a few: port utilization, memory, CPU utilization and, most importantly, system temperature.

So I looked at the reports. Drilling down into the historical graph, I could see that the air flow sensor showed a steady decline starting a little after 1:30 am. I then went over to the temperature sensor and saw the temperature start to climb around 1:45 am. It reached the threshold at 1:55 am.
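The same timeline check can be sketched in a few lines of Python. This is only an illustration: the readings, the thresholds (90 for air flow, 28.0 for temperature) and the helper names `first_crossing` and `minutes_between` are all made up for the example and are not taken from Denika or the Ravica probes.

```python
from datetime import datetime

# Hypothetical sample readings as (time, value). The numbers are invented
# for illustration and do not come from the actual reports.
AIRFLOW = [("01:20", 98), ("01:30", 97), ("01:35", 85), ("01:45", 60), ("01:55", 35)]
TEMP_C = [("01:20", 21.0), ("01:30", 21.0), ("01:45", 23.5), ("01:50", 26.0), ("01:55", 29.0)]


def first_crossing(samples, threshold, rising=True):
    """Return the time of the first sample at or past the threshold, else None."""
    for stamp, value in samples:
        if (rising and value >= threshold) or (not rising and value <= threshold):
            return stamp
    return None


def minutes_between(start, end):
    """Minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Assumed thresholds: air flow below 90 counts as degraded, 28.0 C is the alarm.
airflow_drop = first_crossing(AIRFLOW, threshold=90, rising=False)
temp_alarm = first_crossing(TEMP_C, threshold=28.0, rising=True)
lead_time = minutes_between(airflow_drop, temp_alarm)

print(f"Air flow degraded at {airflow_drop}, temperature alarm at {temp_alarm}")
print(f"Air flow gave {lead_time} minutes of warning before the temperature alarm")
```

The point of comparing the two series is that the air flow drop shows up before the temperature alarm, so an alert on air flow buys extra reaction time.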

At this point I was a bit puzzled. We had placed the air flow sensor by the cooling unit, but the AC voltage detector was reporting fine. That meant the environmental fan was running.

I’m lucky: I was the designer of this server room and was adamant about having a security light that I could turn on or off remotely. So I sent the command to turn on the light and then logged into the webcam. The good news is that I could see what had happened. We had stacked some cardboard boxes against that wall and one had fallen in front of the vent. That meant the fan was running but the air couldn’t get out.
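The reasoning that led here, combining the air flow, AC voltage and temperature readings, can be written down as a small decision rule. The sketch below is a hypothetical helper, not a feature of the sensorProbe or Denika. It assumes each sensor has already been reduced to a simple OK / not OK flag.

```python
def diagnose(airflow_ok: bool, fan_power_ok: bool, temp_ok: bool) -> str:
    """Suggest a likely cause from three sensor states (illustrative rule only)."""
    if airflow_ok and temp_ok:
        return "normal"
    if not fan_power_ok:
        # No AC voltage at the cooling unit: the fan itself is not running.
        return "cooling unit has lost power"
    if not airflow_ok:
        # Power is present but air is not moving, which was the case that night.
        return "fan powered but airflow restricted: check for an obstruction or a failed fan"
    return "airflow normal but room is hot: check cooling capacity or heat load"


# The 2:00 am situation: low air flow, AC voltage fine, temperature over threshold.
print(diagnose(airflow_ok=False, fan_power_ok=True, temp_ok=False))
```

A rule like this cannot tell a fallen box from a seized fan, which is why the webcam and the remote light were still needed to confirm the cause.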

The good news is that I was able to find and remedy the problem quickly. The bad news is that I had to get dressed and drive over to the office and move the boxes. I did make sure to stack all the boxes on top of the desk of the person who was supposed to get rid of them in the first place!

____________________________________
Jim Dougherty aka “Jimmy D”
Lead PreSales Support Engineer and
Netflow Evangelist for Plixer International!

Follow me on Twitter
http://twitter.com/jimmydnet

____________________________________

Why is that iPhone on my network?

March 30, 2009 by JimmyD · Comment
Filed under: General 

Using Ravica environmental monitoring products can help protect your network from the physical world, but what about the new mobile world? Can you protect your network from the smartphone cloud? Smartphones are all around us. The advent of the iPhone has brought their use to the forefront of the IT department. As a result, their use has burdened the corporate network and become a big security risk. You can’t ignore the growth: recent surveys show that smartphone use is rising and should grow by 25% in the next three years.

So what do you do?

The influx of smartphones creates a host of challenges for any IT pro seeking to manage that rapidly growing portion of the enterprise. Armed with the right information and tools, though, you can make sure that the true potential of a highly mobile workforce is realized.

Don the correct armor:

A smartphone can operate inside and outside of your firewall, similar to a laptop. Because smartphones run smaller, and in some cases unique, operating systems, your job becomes a little more difficult. That makes securing your smartphone connections priority number one. I found a great article that explains how to secure your smartphones and the data that they access. Here is a similar white paper from ZDNet.

Manage your army:

So how do we manage smartphones when they are on and off your network? Matt Bancroft from Smartphone Security Magazine tells us that “like the laptops of remote workers, smartphones need to be catered to as a part of the network and subject to corporate management and security measures. It is essential that companies have a corporate IT management policy in place that takes these smart mobile devices into account.”

Three things IT departments must consider when smartphones are running enterprise applications are:

  • Operational Continuity: Once employees are trained and start to rely on the applications on their phones, you need to make sure that those applications are running all the time. This means controlling the phone’s firmware and the other applications that run on it to ensure that it has 100% uptime.
  • Reducing Support Costs: You need to be able to take control of phones remotely or push files to them when needed, so support staff can fix problems without having the device in hand.
  • Security and Compliance: This includes backups to ensure data can’t be lost, and encryption or remote device wiping to protect data when a device is stolen or misplaced. It may also include communications controls, such as archiving SMS messages or preventing them altogether.

Management tools include Sybase with iAnywhere (for Windows Mobile, BlackBerry, Palm OS and Symbian), LogMeIn (for Windows Mobile, Symbian and BlackBerry shortly), and Microsoft with its Mobile Device Manager 2008 module, which is part of its System Center family of management products, for devices running Windows Mobile 6.1.

Here are some items that you want to look for in your management application.

  • Active Directory/Group Policy domain join
  • Mobile VPN with dual-factor authenticated access
  • Application allow and deny
  • SMS, Bluetooth and camera disablement with Active Directory Group Policy-based targeting
  • Over the air device provisioning and software deployment
  • Device inventory and reporting
  • Help desk console and role-based administration
  • Device wipe

It’s clear that smartphones are becoming a more integral part of most enterprises. Today’s technology workers are more tech-savvy than ever. The influx of smartphones also creates a host of challenges for any IT pro seeking to manage that rapidly growing portion of the enterprise. But armed with the right information and tools, you can make sure that the true potential of a highly mobile workforce is realized.
