The not so funny side of Network Management

Something odd happened today.

I was in a planning meeting with my manager when my AT&T Tilt started to vibrate. I find this very annoying. Of all things to happen during this super exciting meeting, this had to take the cake. Yes, I am being sarcastic and a bit overdramatic. The issue is still the same: I hate being annoyed.

The real point of this rant is the subject of the alert. It was an SMS page from my Ravica temperature probe. Its temperature threshold had been violated.

This raised multiple questions. The first, and most important, was how to politely excuse myself from the meeting. In general, this wouldn’t be a hard thing to do, but my manager was excited about the new data room expansion project. By excited, I mean elated, and by elated, I mean that he expected everyone to have the same level of passion or face the harshest punishment ever executed on mere mortals.

Knowing that I could lose my admin privileges, I forged on. I told Jon that something was happening in the server room and that I had to leave. He said “ok” and went on with his conversation.

Puzzled, I quickly went into the server room and found the issue. Brandon, our new, green intern, had placed his super hot cup of coffee next to the temp sensor. He was in the process of cleaning up cables and listening to the Ramones on his headset.

The heat from his coffee cup quickly raised the temp around the sensor.

From this I have learned two things. The first is that you should never leave your interns unattended. They can cause way too much damage. The second is to not locate your temperature sensor where someone can obstruct it in any way.

Now I have to explain this to my boss. Wish me luck.

- Jimmy D

5 tips to protect your data center hardware

I work with network administrators every day and I hear one common story: they are not buying servers. IDC’s Worldwide Quarterly Server Tracker supports me on this. Server vendors are reporting that their business is off 24.5 percent from last year, falling to $9.9 billion in the first quarter of 2009.

They don’t have the budget or can’t get the budget to update their aging equipment. So, what can these admins do to extend the life of their equipment?

Just like an older automobile, your equipment will last longer with maintenance and proper care. Constant monitoring of your systems inside and out can save you from losing mission-critical servers. Environmental monitors such as temperature and humidity, airflow, smoke and water sensors can be used to make sure that the environment your systems operate in is the best it can be.

Now, I know what you are saying: “Jim’s just a salesman and wants you to buy something.” Although I might get excited about a product and preach its goodness, I am not a salesperson. I might dream of being the star of a Shamtastic infomercial, but trust me, I’m not. What I do want you to know is that there are some simple things that you can do to help save your hardware.

  • Your server room is not a storage area! If it is, it really shouldn’t be. Extra stuff in the room can cause heat issues and possibly be a fire hazard. Keep the area clean and free of obstacles.
  • Manage your cables properly. I had a boss who had a hang-up about making sure all the cables were organized correctly. You guessed it, I didn’t think it was all that important. In hindsight, I was wrong. Keeping them organized is great for physical management, but more importantly, it makes it easier to manage airflow. Be it a cable tunnel or just pulling them together with a zip tie, making sure your servers get the correct ventilation is important.
  • Have a physical maintenance window for your machines. It might bring you back to your youthful A+ days, but making sure the servers are dust free and the cables are in the correct place can help immensely.
  • Monitor the server’s health with an SNMP trending app. Most servers can give you CPU, server temp, fan info and other valuable information via a simple SNMP walk. An SNMP trending application, like Denika, will allow you to gather historical information on this data and alert on it.
  • Monitor your environment. As I mentioned before, monitoring the room temp, humidity and airflow will make your admin life a lot easier. If you have some room in your budget, this might be the best way to spend it.
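
The last two tips come down to collecting readings over time and alerting when they drift. Here is a minimal Python sketch of that idea. It assumes you already have temperature readings in a list (fetching them over SNMP is left out), and the function names, the 3-sample window and the 28°C limit are all made up for illustration. This is not how Denika works internally, just the general shape of trending plus alerting.

```python
from collections import deque
from statistics import mean


def rolling_average(readings, window=3):
    """Yield the average of the last `window` readings after each new sample."""
    recent = deque(maxlen=window)
    for value in readings:
        recent.append(value)
        yield mean(recent)


def find_alerts(readings, limit_c=28.0, window=3):
    """Return (index, average) pairs where the rolling average exceeds limit_c."""
    return [
        (i, avg)
        for i, avg in enumerate(rolling_average(readings, window))
        if avg > limit_c
    ]


# Hypothetical samples in degrees Celsius, one per polling interval.
samples = [21.0, 21.5, 22.0, 27.0, 30.0, 33.0, 24.0]
alerts = find_alerts(samples)
for index, avg in alerts:
    print(f"sample {index}: rolling average {avg:.1f} C is over the limit")
```

Averaging over a few samples instead of alerting on a single reading is a design choice: it ignores a one-off spike but still catches a sustained rise.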

So there it is.  I guess the old saying, “An ounce of prevention is worth a pound of cure,” is correct. Even in today’s super digital world!

- Jimmy D

Five major data center outages reported last week

The website Data Center Knowledge recently published an alarming report about five major data center outages that occurred in the past week. Here is a brief breakdown from the article.

  • “On Monday June 29, Rackspace Hosting (RAX) experienced a power outage at its Dallas data center that left several areas of the facility without power for about 45 minutes, knocking many popular customer web sites offline.
  • “Early Thursday Equinix Inc. (EQIX) data centers in Sydney, Australia and Paris each experienced power failures. While the power outages were brief – Equinix said the Sydney event lasted 12 minutes while power was restored in Paris in just one minute – many key customer sites took considerably longer to recover their systems. The Sydney event led to disruptions for VoIP service in parts of Australia, while the Paris outage caused downtime for the popular video site DailyMotion and the French portal for hosting firm ClaraNet.
  • “Google App Engine, the company’s cloud computing platform, had lengthy performance problems on Thursday, experiencing high latency and data loss.
  • “A fire at Fisher Plaza in Seattle late Thursday night left many of the building’s data centers without power. The fire in an basement-level electrical room triggered sprinklers and caused extensive damage to generators and electrical equipment. The damage left tenants with backup plans offline for hours, and those without backup sites down until temporary generators restored power early Saturday morning. The biggest impact was at payment gateway Authorize.net, which was offline for more than 12 hours, leaving its merchant customers unable to process credit card sales. Other sites experiencing lengthy downtime included AdHost, GeoCaching and Microsoft’s Bing Travel.
  • “Early Sunday, July 5, a fire at 151 Front Street, the major carrier hotel in Toronto, knocked out power on several floors of the facility used by Peer 1 networks. Power was restored in about 3 hours, after a damaged UPS unit was bypassed.”

The author, Rich Miller, then goes on to pose some tough questions and draw lessons from these outages.

Although it is surprising that data centers of this size can experience an outage like this, what is even more surprising is that it all happened in a week’s span. I wonder, is the National Security Agency going to look into this?

Monitoring a data center of this size takes a monumental amount of equipment, but even the smallest IT department can obtain economically priced sensor equipment, like the Bitsight8, combined with Intelligent Sensors, like the AC Voltage Detector and the Digital Voltmeter.

The future of data center design

I just read that the NSA is going to build a 20-acre data center in Utah. This one-million-square-foot center will allow the NSA to decentralize its efforts and provide better security. Just imagine the amount of power it will take to operate a data center of this size. This Slashdot article points out that one of the biggest reasons why the government is building this compound is its power consumption and the current location’s inability to provide the needed electricity. The government estimates that it will use at least 65 megawatts of power, or about the same amount that Salt Lake City consumes.

“The agency got a taste of the potential for trouble January 24, 2000, when an information overload, rather than a power shortage, caused the NSA’s first-ever network crash, taking the agency 3 1/2 days to resume operations. The new data center in Utah will require at least 65 megawatts of power” - Salt Lake Tribune

Another cool data center design is the one that Google is planning to build. The entire center will be built on a floating barge, and will use the waves of the ocean to help power the facility. It will also use ocean water to cool the equipment.

Last, but not least, is the underground data center in Sweden. This has to be the coolest data center ever! It is located underground, can withstand a hydrogen bomb attack, and has a waterfall and a greenhouse. It can generate its own power, and is equipped with triple-redundant Internet backbone access.

I wonder, what type of environmental monitoring sensors do they have? What type of redundancy and fail-safes? Designing a data center like these is a monumental task. I can’t wait to see what the future holds for Data Center Design.

- Jimmy D

The SensorProbe can Tweet!

I don’t want to ride on the coattails of Jon’s post about being able to send temperature alerts via Skype, but I guess I have no choice. I quickly wanted to point out that you can also send Twitter alerts from your SensorProbe. I imagined this as a second-wave alert. Kinda that last-ditch effort before the ship goes down. Ok, maybe I am being a bit overdramatic, but in reality, this can be a great way to do a broadcast alert.

The process is easy. Browse to TwitterMail and enter your Twitter username and password to get your TwitterMail address instantly. Then go to your SensorProbe and create an email alert that sends to that address. Alert goes off, email is sent and Twitter is fed. Make sure that everyone who is supposed to receive these messages is a follower of your Twitter account.
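
To make the email step concrete, here is a rough Python sketch of what such an alert message could look like if you built it yourself. The SensorProbe creates its own email, so this is only an illustration. The sensor name and both addresses are placeholders, and I am assuming a short subject line is what you would want to end up as the tweet, which I have not verified against TwitterMail.

```python
from email.message import EmailMessage


def build_alert_email(sensor, reading_c, to_addr, from_addr):
    """Build a short alert email with a subject brief enough for a tweet."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"ALERT {sensor}: {reading_c:.1f} C"
    msg.set_content(f"{sensor} reported {reading_c:.1f} C.")
    return msg


# Placeholder values; use the address TwitterMail gives you.
alert = build_alert_email(
    sensor="server-room-temp",
    reading_c=31.5,
    to_addr="your-twittermail-address@example.com",
    from_addr="sensorprobe@example.com",
)
print(alert["Subject"])
```

To actually send a message like this from a script, you would hand it to `smtplib.SMTP(...).send_message(alert)` using your own mail server.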

Humidity Monitoring – Unforeseen danger in your server room

I was working with a client who had to replace multiple motherboards in his server room. I was surprised that he had to replace so many, so I gently asked, “What happened?”

He said he knew I would ask him that question. Over the weekend, the air conditioning unit for their server room failed. It didn’t shut off; it just stopped pushing out cold air. The room didn’t get too hot (thank goodness), but the unit produced a lot of moisture.

Apparently, it produced too much moisture, which caused condensation on the server rack that was closest to the air conditioning unit. The end result was multiple motherboards failing.

We spent the next few minutes going over the cost of the replacement boards and drives. I then let him know that we had a humidity probe that would alert him when humidity reaches a certain level. I suggested that he add it to his order and not take the risk of losing another segment of his server room. He thought that it was a good idea and bought two!

“Relative humidity should be maintained at a level between 30%-50%. Failure to adhere to these particular specifications could result in serious corrosion of the copper wires that are contained within the UTP and STP. Such corrosion would deter efficient functioning of the network.” – Excerpt from Cisco Networking Academy book material.

So I guess the old saying, “An ounce of prevention is worth a pound of cure,” holds true. Take the time to monitor for humidity. It could help save your equipment.
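
The 30%-50% band from the quote is easy to turn into a check. This is only a sketch: the function name and the status labels are my own, and a real humidity probe is configured through its own interface rather than with code like this.

```python
def humidity_status(relative_humidity, low=30.0, high=50.0):
    """Classify a relative humidity reading (percent) against a target band."""
    if not 0.0 <= relative_humidity <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100 percent")
    if relative_humidity < low:
        return "too dry"
    if relative_humidity > high:
        return "too humid"
    return "ok"


# Hypothetical readings, in percent relative humidity.
for reading in (25.0, 42.0, 68.0):
    print(f"{reading:.0f}% -> {humidity_status(reading)}")
```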

- Jimmy D

Running a computer in a sub-zero environment

I just saw a post on Slashdot that was asking the question, “How to Run a Computer in a Sub-Zero Environment?” Since network design is a passion of mine, this exercise interested me. It was a simple question, but definitely not a common one. Even here in Maine, we are always trying to cool down our servers. I never considered someone might need to do the opposite.

Anonymous Coward (7548) gave us a real world answer. “Putting heaters (computers) in an environment meant to be cold is just adding to the cooling workload. If the computer is at any decent operating temperature, it’s going to be heating up the immediate surrounding area, and you don’t want that.”  He advised people to mount the computer outside of the cold environment and put the sensor probes inside.

Although this appeared to be a sensible idea, the discussion added a new variable – building size. People were quick to point out that the original poster needed monitoring for a warehouse. Most warehouses are large, some larger than a football field. Clearly the above approach would not be applicable.

Embedding the PC inside of the sub-zero environment presents another issue – condensation. How are we going to protect the electronics?

BobPaul (710574) points out:

“Since cold air has a lower capacity to hold water, warming the air should decrease the relative humidity of the air, bringing you farther from the dew point and make condensation less likely. Just let everything sit in the cooler to get nice and cold before you turn anything on and I think it should be just fine.”
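
The relative humidity part of that argument can be checked with a little arithmetic. The sketch below uses the Magnus approximation for saturation vapor pressure, with commonly quoted constants for air over liquid water, so treat the sub-zero numbers as rough. The -10°C, 80% starting point is made up. It shows only that warming the same air lowers its relative humidity while the dew point stays put. It says nothing about what happens when cold hardware meets warmer, wetter air.

```python
import math

# Magnus approximation constants (over liquid water); an assumption of this sketch.
A = 17.62
B = 243.12  # degrees Celsius


def saturation_vapor_pressure(temp_c):
    """Approximate saturation vapor pressure in hPa at temp_c degrees Celsius."""
    return 6.112 * math.exp(A * temp_c / (B + temp_c))


def rh_after_warming(rh_percent, from_c, to_c):
    """Relative humidity after changing temperature with no moisture added."""
    ratio = saturation_vapor_pressure(from_c) / saturation_vapor_pressure(to_c)
    return rh_percent * ratio


def dew_point(temp_c, rh_percent):
    """Approximate dew point in degrees Celsius."""
    gamma = math.log(rh_percent / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


# Hypothetical case: freezer air at -10 C and 80% RH warmed to 20 C.
warm_rh = rh_after_warming(80.0, -10.0, 20.0)
print(f"RH after warming: {warm_rh:.1f}%")
print(f"Dew point before: {dew_point(-10.0, 80.0):.1f} C")
print(f"Dew point after:  {dew_point(20.0, warm_rh):.1f} C")
```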

This response produced quite a bit of traffic. Quite a few people disagreed with BobPaul’s theory. Although this could be an answer, its validity is still in question.

The best solution to humidity came from Detritus (11846), who points out that “Military equipment often uses conformal coating, which is a spray-on plastic coating that protects the components from the environment.” This method encases the electronics, protecting them from moisture. To the best of my knowledge, it doesn’t provide any thermal benefit, nor is it a lifelong solution. I would make sure to have a humidity sensor in the enclosure to ensure longevity.

The last post that I read made the solution clear. munpfazy (694689) writes, “For what it’s worth, we’ve always built room-temperature enclosures to house electronics gear and PCs for the work we do in Antarctica.” You can’t get much colder than that.

My conclusion: build small enclosures for the computers that include the required environmental conditioning and monitoring equipment.

Let me know what you think . . .

- Jimmy D

Recommended Server Room Temperature and your Ravica SensorProbes

Today’s network meeting’s subject was “Recommended Server Room Temperature”. It appears that our new goal is to make sure that the server farm keeps its temperature at a constant level. They used the network operations policy for the University of California, San Diego as an example. It’s funny, but nobody knew what that temperature should be.

After some research, I found that general recommendations say you should not go below 10°C (50°F) or above 28°C (82°F). This is a wide range, but remember that these are the extremes. It is far more common for server rooms to maintain a temperature around 20-21°C (68-70°F). Keeping it at that temperature can be difficult because there are many variables to address.
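
For anyone who wants to check a reading against those numbers, here is a small Python sketch. The conversion is the standard Celsius-to-Fahrenheit formula. The function names and status labels are mine, and the limits are simply the figures from my research above, not an official standard.

```python
def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def classify_room_temp(temp_c, hard_min=10.0, hard_max=28.0,
                       target_low=20.0, target_high=21.0):
    """Classify a server room temperature against the ranges discussed above."""
    if temp_c < hard_min:
        return "too cold"
    if temp_c > hard_max:
        return "too hot"
    if target_low <= temp_c <= target_high:
        return "on target"
    return "acceptable"


# Hypothetical readings in degrees Celsius.
for reading in (8.0, 20.5, 25.0, 30.0):
    status = classify_room_temp(reading)
    print(f"{reading:.1f} C ({c_to_f(reading):.1f} F): {status}")
```

Run through the same formula, 21°C works out to 69.8°F and 28°C to 82.4°F.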

I am going to set the thermostat at 55°F and monitor its status throughout the day with our Bitsight8 and multiple temperature and humidity probes. I have 20 days to gather this data and report on it. My guess is that we will have to adjust the set temperature a bit before we make the network policy.

~ Jimmy D

Should we Recycle Server Room Heat?

May 22, 2009
Filed under: Data Center, environmental monitoring, General 

Over the weekend, I was watching System, which is one of my favorite shows on Revision3.com. One of the questions was about how to use the excess heat generated by computers to heat a room. I am a big supporter of the “Reduce, Reuse and Recycle” philosophy, and thought that was a great idea.

After some hard and heavy Googling, I was excited to find other ways people have used this wasted server room energy. The story of a Midwestern college saving a greenhouse caught my eye.

“The University of Notre Dame’s high-performance computing (HPC) department has taken things a step further. It now reuses the heat generated by its servers to warm up a historic greenhouse that the city of South Bend, Ind., has threatened to shut down.”

By using the heat from the servers, they are saving the university $100,000 on cooling costs and the owner of the botanical garden, the City of South Bend, Indiana, another $70,000 on heating costs. It’s a win-win for everyone.

Now I am trying to figure out ways to implement this type of thinking here at work and at home. I think that my first step would be to add another temperature probe to the back of the server rack. This should give me the data that I need.

Maybe I can use the excess heat to warm up my cube. I’ll update you with what I find out.

Using SensorProbes to prove the office temperature is too low.

I was angry. Well, maybe just a little mad, but no matter what, I was still upset. The office was unbelievably cold, and it had been going on for way too long. By cold, I mean goose bumps and jackets every afternoon. When I would go for a lunchtime walk (as I often do), my muscles would cramp due to the drastic temperature change. As I said, it was cold.

Needless to say, something had to be done. I complained to the powers that be, but their first response was less than rewarding. Answers like “It’s in your head,” or “You are right under the vent,” were offered, but I knew that they were wrong.

Luckily, things got worse. The temperature was getting colder and for longer periods of time. More people were saying things like “Gee it’s cold,” or “Hey, are you cold?” and “Turn up the heat or I am breaking up the conference room table and building the biggest bonfire this side of Boston.” In one remote section of the office there was a group of dissidents that wanted to change the company dress code so that it included an L.L. Bean Arctic Parka.

At this point I decided to throw on my Jimmy D detective hat and get some proof. Since I am an Uber Geek, I decided to use the tools of my trade. I would need technology!

I integrated a high-scale, super-conductive data collection station into my work environment. To be honest with you, it surpasses the one that I once viewed at MIT that is currently used to monitor global warming. In reality, I secretly moved my BitSight2 temperature probe from the server room over to my desk. I then set up Denika to trend its SNMP data. With Denika I was able to set a minimum temperature threshold, which would alert me when the temp fell below it.

I diligently collected data for two weeks. I even adjusted the threshold, as I saw the temp get lower and lower. The data was conclusive. I had my answer. I could now confidently register my complaint to the powers that be and demand change! At the same time, I now had the ability to defuse the previously described uprising. Viva Data! Viva Jimmy D! Maybe I need to get out of my cube more often?

Again, in reality, I took this data to my boss and quickly showed him that in the afternoons we were seeing an average of 54 degrees, while mornings were a bit higher. He took this to building management and they are currently in the process of finding out what the issue is.
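
If you want to build the same kind of case from your own logs, the morning and afternoon averages are simple to compute. This is a hedged sketch, not what Denika does: the samples below are invented, I am assuming readings in degrees Fahrenheit, and “afternoon” just means noon or later.

```python
from datetime import datetime
from statistics import mean


def average_by_period(samples):
    """Average (timestamp, temp_f) samples into morning and afternoon buckets."""
    buckets = {"morning": [], "afternoon": []}
    for timestamp, temp_f in samples:
        period = "morning" if timestamp.hour < 12 else "afternoon"
        buckets[period].append(temp_f)
    # Skip any period that has no samples so mean() never sees an empty list.
    return {period: mean(temps) for period, temps in buckets.items() if temps}


# Invented readings, assumed to be in degrees Fahrenheit.
samples = [
    (datetime(2009, 5, 4, 9, 0), 60.0),
    (datetime(2009, 5, 4, 11, 0), 62.0),
    (datetime(2009, 5, 4, 14, 0), 55.0),
    (datetime(2009, 5, 4, 16, 0), 53.0),
]
averages = average_by_period(samples)
for period, avg in averages.items():
    print(f"{period}: {avg:.1f} F")
```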

The good news is that my cube is getting warmer. The bad news is that I got my BitSight taken away!
