Archive for the 'Geeky' Category

Cisco 4506 POE, VOIP and What You Need to Know

If you are deploying VOIP using POE in your enterprise, stop now and dedicate a significant amount of time to planning power. In our case we have Cisco 4506 switches in our wiring closets, all equipped identically because they were purchased at the same time as part of our future-proofing in anticipation of VOIP. Now VOIP is a reality, and I have learned a couple of lessons that I want to both pass along and remember myself. This article contains details specific to the 4506 platform, but the concepts apply to any POE deployment.

What is your input power source?

The first thing you need to evaluate is how your switches are powered. My suggestion? Go ahead and order power strips with built-in amp gauges. Tripp Lite has some really nice units. If you are running 110V circuits, they have a sub-$200 horizontal rack-mount unit, part number PD6974, that I recommend. Install these in each of your wiring closets and document the power circuit numbers and which power supply in each switch each circuit is connected to. This will come in handy later on if you have to coordinate changes with your facilities operations. Hopefully you will end up with a consistent configuration that is not only well documented but can also be easily scaled.
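Once those circuits are documented, a few lines of code can catch a common mistake: one circuit feeding both power supplies in a chassis, which quietly makes the second supply useless for redundancy. Below is a minimal Python sketch, assuming a hypothetical inventory record (closet, switch, power supply slot, circuit number) that you would fill in from your own closet documentation; the names and circuit numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerFeed:
    """One documented power input (hypothetical inventory record)."""
    closet: str
    switch: str
    supply: int    # power supply slot in the chassis, e.g. 1 or 2
    circuit: str   # facilities circuit number, as labeled in the closet


def shared_circuits(feeds):
    """Return {(closet, switch): [circuits]} for circuits feeding more than one supply.

    If a single circuit feeds both supplies in a chassis, losing that circuit
    takes out both supplies at once, so the second supply adds no redundancy.
    """
    supplies_per_circuit = defaultdict(set)
    for feed in feeds:
        supplies_per_circuit[(feed.closet, feed.switch, feed.circuit)].add(feed.supply)

    problems = defaultdict(list)
    for (closet, switch, circuit), supplies in supplies_per_circuit.items():
        if len(supplies) > 1:
            problems[(closet, switch)].append(circuit)
    return dict(problems)


# Hypothetical closet where supply 2's second input landed on circuit 14.
inventory = [
    PowerFeed("3rd floor", "sw3-1", 1, "14"),
    PowerFeed("3rd floor", "sw3-1", 1, "16"),
    PowerFeed("3rd floor", "sw3-1", 2, "18"),
    PowerFeed("3rd floor", "sw3-1", 2, "14"),
]
print(shared_circuits(inventory))  # {('3rd floor', 'sw3-1'): ['14']}
```

Keeping the inventory as data rather than a spreadsheet note means the same check can be rerun every time facilities moves a cord.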

How many phones per switch can I support given my current configuration?

This might seem obvious, but it is amazing to me how counterintuitive it was at first. We always talk about the larger project and how many phones we’re going to deploy, but you need to break that number down per switch and per configuration. Concentrating on power, I know that I have one or two switches per floor and that they are all identical in configuration. Cisco provides a very good tool for calculating power requirements called the Cisco Power Calculator. Use it to enter your exact hardware and the number of each type of phone you plan to deploy. We pretty much standardized on the 7945G or 7965G depending on the individual user, which made it very easy to plug in the numbers and see what I could do. In my case, I configured a 4506 with a 4013+ supervisor, five 4548G RJ45V blades and two 4200W power supplies with 100-120V input power. What I found by trial and error is that I could run up to 120 phones on a single power supply with dual 100-120V power inputs.
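The per-switch phone count itself is simple division once you know the usable POE budget, ideally the budget that remains in the failure case you plan to survive. A minimal Python sketch, assuming you take the budget from the Cisco Power Calculator (or the switch’s `show power` output) and a per-phone draw for each model; the wattages below are placeholders, not Cisco’s published figures.

```python
import math

# PLACEHOLDER per-phone draw in watts; replace with the figures the Cisco
# Power Calculator gives you for your models. These are NOT Cisco's numbers.
WATTS_PER_PHONE = {"7945G": 10.0, "7965G": 12.0}


def max_phones(poe_budget_watts, mix, watts_per_phone=WATTS_PER_PHONE):
    """Largest phone count that fits in a POE budget for a given model mix.

    `mix` maps model -> fraction of phones of that model (fractions sum to 1),
    so the average draw per phone is the weighted sum of the per-model draws.
    """
    avg_watts = sum(fraction * watts_per_phone[model] for model, fraction in mix.items())
    return math.floor(poe_budget_watts / avg_watts)


# Hypothetical 1200 W of usable POE with a 50/50 mix of the two models.
print(max_phones(1200, {"7945G": 0.5, "7965G": 0.5}))  # 109
```

Rounding down matters: a fractional phone still needs a whole phone’s worth of power.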

How does the catalyst budget power and what is my redundancy model?

Don’t gloss over this part. There are two power redundancy modes, four power inputs, two power supplies and a lot of options for using all those resources. Cisco’s documentation tells you to run in power mode “redundant” and not “combined”, because in combined mode you can overload the switch and lose redundancy should a power circuit fail. In reality, I have found that not only can input power fail, but individual power supplies can also fail in a way that leaves them running at reduced output. As I said at the end of the last section, I found that I could deploy 120 Cisco 7945/65G phones on a single 4506 chassis as long as I always had at least two 110V input power connections. Now, I can connect those two power inputs to a single power supply and run all day long within my power budget, but I have no hardware redundancy. In that case it doesn’t matter whether the power mode is “redundant” or “combined”, because there is only one power supply. My other option is to spread those two 110V circuits across a pair of power supplies, but then if I lose a power supply, I lose half my power and can no longer run 120 phones. Plus, with two power supplies each with a single 110V input, I have to run “combined” mode to use both inputs together.

The 4506 manages power based on a budget taken from the lowest common denominator. For example, say you have three 110V inputs connected to two power supplies, so one power supply has only a single input. If your power mode is redundant, the switch will budget power based on a single 110V circuit even though three are connected. If you run that same hardware configuration with “power redundancy-mode combined”, you will draw from all three indiscriminately and you will be able to support more than 120 phones. Cisco implies this is a dangerous situation, and they are correct. The fact is, the configuration stops being redundant once the 121st phone is connected. When you reach the 121st phone on a 4506 switch with 110V inputs and 4200W power supplies, there is NO REDUNDANCY POSSIBLE in power. Even with four 110V inputs and combined mode, you will not be able to sustain a power supply failure. What to do? PLAN!
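The difference between the two modes, and why the load has to fit on the surviving supply, can be sketched with a deliberately simplified model. This Python sketch assumes each supply delivers a fixed, placeholder number of watts per live input; it is not Cisco’s actual power-management algorithm, and a real combined-mode budget may be lower than a plain sum.

```python
# Simplified model, NOT Cisco's actual power-management algorithm: each supply
# is assumed to deliver a fixed, PLACEHOLDER number of watts per live input.


def supply_capacities(inputs_per_supply, watts_per_input):
    return [n * watts_per_input for n in inputs_per_supply]


def power_budget(inputs_per_supply, mode, watts_per_input):
    """Budget the switch plans against, per the 'lowest common denominator' rule."""
    caps = supply_capacities(inputs_per_supply, watts_per_input)
    if mode == "redundant":
        return min(caps)   # plan as if only the weakest supply exists
    if mode == "combined":
        return sum(caps)   # a real combined budget may be lower than this sum
    raise ValueError(f"unknown mode: {mode}")


def survives_supply_failure(inputs_per_supply, load_watts, watts_per_input):
    """True if the load still fits after losing the largest single supply."""
    caps = supply_capacities(inputs_per_supply, watts_per_input)
    return load_watts <= sum(caps) - max(caps)


# Three 110V inputs across two supplies: supply 1 has two, supply 2 has one.
feeds = [2, 1]
print(power_budget(feeds, "redundant", 1000))       # 1000
print(power_budget(feeds, "combined", 1000))        # 3000
print(survives_supply_failure(feeds, 1500, 1000))   # False
```

The redundant budget equals one circuit’s worth even with three circuits connected, and any load above that single-supply figure is, by definition, a load that cannot ride through a supply failure.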

The most important thing you can do in a project as big as a VOIP deployment is to plan every detail up front. Document how many phones your current infrastructure can support in each area of your company. It’s that simple. If you are in a situation similar to mine and need to support more than 120 phones in a space served by a single 4506, you have to document the risks for management along with options to mitigate them. In my case, I documented that power could be upgraded from 110V to 208V dryer circuits, or that additional switches could be added to the floor to spread the POE ports across more than one chassis; a second chassis alone brings my total supported phones up to 240.
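Turning a validated per-chassis limit into a floor plan is a ceiling division. A small Python sketch using the author’s figure of 120 phones for this specific 4506 build (substitute your own measured number); the per-floor phone counts are hypothetical.

```python
PHONES_PER_CHASSIS = 120  # the author's measured limit for this exact 4506 build


def chassis_needed(phones, per_chassis=PHONES_PER_CHASSIS):
    """Identically built chassis needed to power `phones` phones (ceiling division)."""
    return -(-phones // per_chassis)


# Hypothetical phone counts per floor.
for floor, phones in {"2nd": 95, "3rd": 180, "4th": 240}.items():
    print(floor, chassis_needed(phones))
```

Integer ceiling division avoids any floating-point surprises right at the boundary, where 120 phones needs one chassis and 121 needs two.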

My Recommendations

I realize this is yet another rambling post, but I’ve gone through a good bit of frustration with POE and VOIP lately on my 4506s. My ultimate recommendation is twofold. First, document exactly what you have and keep that document current. When negotiating expensive power renovations, you must be fully informed. Second, make specific configuration decisions with the actual user experience in mind.

Let me explain. If you have a mandate that only 120 phones can exist on a 4506 because of 4200W power supplies and dual 110V inputs, you are implying that you can sustain TWO input failures or a SINGLE power supply failure, but nothing else. If you follow Cisco’s recommendation to run “power redundancy-mode redundant”, you cannot sustain ANY other failure condition. Twice now I have seen a power supply running in a degraded mode, still providing power to the switch, and going unnoticed. As soon as an engineer moves a power cable or performs some other maintenance, users are impacted and I have to explain. My recommendation is that you MONITOR your power situation and phone counts, run “power redundancy-mode combined”, and make sure that even if you have to do maintenance on a degraded platform, your users are not impacted by you. I am sure you are important, but if your users have to suffer every time you actually do your job, I guarantee folks are going to start questioning why you exist.
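The monitoring the author argues for boils down to two questions: is any supply delivering less than it should, and could each supply be pulled for maintenance without dropping phones? A minimal Python sketch, assuming hypothetical per-supply readings (how you collect them, whether via SNMP, CLI scraping or your NMS, is left open) and a simplified model in which the surviving supplies’ reported output adds up.

```python
from dataclasses import dataclass


@dataclass
class SupplyReading:
    name: str
    expected_watts: float  # what this supply should deliver with its live inputs
    reported_watts: float  # what the switch currently reports it can deliver


def power_alerts(readings, poe_draw_watts, degraded_ratio=0.9):
    """Flag degraded supplies and supplies that can't safely be pulled for maintenance.

    `degraded_ratio` is an arbitrary, hypothetical threshold: a supply reporting
    less than 90% of its expected output is treated as degraded.
    """
    alerts = []
    for r in readings:
        if r.reported_watts < degraded_ratio * r.expected_watts:
            alerts.append(f"{r.name} degraded: {r.reported_watts:.0f} W of {r.expected_watts:.0f} W")
    for r in readings:
        remaining = sum(o.reported_watts for o in readings if o is not r)
        if poe_draw_watts > remaining:
            alerts.append(f"pulling {r.name} would drop POE: need {poe_draw_watts:.0f} W, "
                          f"only {remaining:.0f} W left")
    return alerts


# Hypothetical readings: PS2 is quietly running at reduced output.
readings = [SupplyReading("PS1", 1500, 1500), SupplyReading("PS2", 1500, 900)]
for alert in power_alerts(readings, poe_draw_watts=1200):
    print(alert)
```

The second alert is the one that would have saved the engineer in the story: the healthy supply is the one that cannot be touched.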

Conclusion

VOIP is a big project.

Android On-screen Keyboard

There is a new on-screen keyboard coming to the Android platform that is pretty neat. I downloaded and installed it following the information at this blog.

I went to the main settings off the home screen, selected keyboard settings and chose the Swype keyboard. Then I went through the tutorial available in its settings menu. It’s a different way of typing, and it seems to work very well. Its vocabulary is adequate for the pedestrian conversations you have to have while away from your full-sized keyboard. I believe I shall leave the application installed indefinitely, and I look forward to becoming more and more Swype proficient.

I think they have a version for the iPhone as well. But who would want an iPhone anymore?

I Heart My Droid

Gonna try not to be wordy.

Pros = Fast browser, many apps, great UI, not Windows dependent, Not Windows, GPS, voice turn-by-turn, great video, great wifi

Cons = hard-to-use still camera, battery management by killing application processes. I’m reserving judgment on the battery, but it looks like a charge-every-night (and maybe sometimes during the day) kind of phone.

Compared to Windows Mobile <=6.1? It’s a joke to even try to compare them. I’ve owned a lot of Windows Mobile devices since 5.0, and they all suffer from the same problems. They are not in the same class as the Droid OR the iPhone and offer no real benefits over either. Mobile Word and Excel are a joke at best, and that’s pretty much the extent of what you get “over” the iPhone or Droid.

Compared to iPhone? No iTunes = GREAT! I’ve never owned an iPhone, but having played around with them in the past, I think the Droid is at least on par and possibly better, considering the open architecture. No AT&T, but the Motorola Droid is, I guess, Verizon only, so you could say the same thing about the Droid from the other side. When will carriers stop hog-tying us to them? I’d have bought Verizon anyway because I need their network to work at the house.

I got a discount on the phone and accessories through my company association. Cost-wise it was cheaper than a new iPhone with comparable storage. Plus I upgraded both our phones, so the wife has a Droid too. She’s medium techy and likes it, and that says a good bit.

Tuning the Traxxas TRX 3.3 Engine

First off, let me say you should follow the instructions, both written and on the DVD, that accompanied your model. The directions from Traxxas will both further your understanding of this machine and help you avoid the worst abuses you could otherwise unknowingly inflict on your engine. What I write is no substitute for their information, and none of it should be attempted until AFTER you have completed break-in following the Traxxas break-in procedure. I also recognize there is a gap between that information and a working understanding of how to tune your engine in the real world.

I was forced to learn a bit more about tuning because of the cold weather this winter and I wanted to take a chance on passing along what I have learned. First, a short lecture about what it takes to tune any engine. You’ll read about fuel mixtures and how to set those, idle speeds and settings, and other various things when it comes to tuning an engine. The fact is no one aspect exists in a vacuum but they all interact and affect each other. Don’t expect to get the best performance from an engine JUST by gapping the spark plugs or just adjusting timing or just changing fuel mixtures. You have to level set everything in rounds, check and readjust until you get everything set up right. Most mechanics will get everything in the ballpark and let it ride. In modern passenger vehicles, the computer is typically relied upon more to fine tune in real time, resulting in huge leaps of power and efficiency previously unobtainable. In the TRX 3.3, you have no computers, so you have to tune it old school.

There are five items you must be concerned with: (1) general running condition, (2) factory settings, (3) high speed needle, (4) low speed needle and (5) idle speed, set by the idle air gap.

General Running Condition
This refers to the condition of all the parts in and around the engine. The fuel tank needs to be free of cracks or leaks. Check the O-ring around the lid and adjust its seal using the Allen-head screw. Replace your glow plug without hesitation. Check all your fuel and pressure lines for splits or cracks; a roll-over can cut these lines pretty easily, especially when running on asphalt or concrete. Check your exhaust system and the rubber connector between your pipe and your header. Make sure all these connections are sealed up tight and cinched with zip ties. Clean your air filter and filter assembly, re-oil the filter and get it back on the engine.

Check the engine mounts for loose or missing screws. I’ll note here that the magnets in your EZ-Start motor will attract loose engine mount screws. This is one demonstration of the fact that your success or failure depends in part on how well you clean and inspect your model before and after running it. Check wire connections such as the block ground connection and the glow plug connector. Check the mesh with the spur gear and make sure your transmission is properly seated and all of its screws are present and tight. Correct any issues you find in this area.

I use a strip of notebook paper folded to double thickness to set my spur gear mesh. If the mesh is snug with the paper in place, it will mesh perfectly once the paper is removed. A thicker paper doesn’t have to be folded in half, though I wouldn’t use construction paper. What you want is a completely non-binding mesh where the teeth touch across their entire faces, transferring the most power possible from the engine’s output shaft to the transmission’s input shaft. Make sure the motor is squared perfectly so that the teeth mesh all the way across and not just on one side.

Also check your throttle and brake linkages. Follow the Traxxas directions to a “T” and you will not have issues; a fouled-up linkage adjustment will make your engine act in unpredictable ways, so do not ignore this part. Your internal engine components play a part as well, obviously, but I am assuming you have good compression and don’t need a rebuild. Once you are comfortable with the general condition of your rig, you can move on to the other steps.

I highly recommend some sort of temperature monitor, either an infrared gun or the Traxxas on-board temperature gauge. The on-board gauge records the max, min and current temperature, so you can be sure you didn’t overheat during a run and simply miss the peak by the time you got to your gun; these engines cool very quickly from their peak. Bear in mind that your real goal is to make sure you aren’t overheating. There is no set temperature you should run at, but in general you don’t want to see the max temperature rise above 270°F. I’ve seen mine hit 300-311°F when I had an issue, but when things are right I see it around 230-240°F. It doesn’t start running right until it’s above about 150-160°F, though. The gauge helps ensure you don’t fry your engine and get to start over with a new one.

Something you may want to check periodically is your clutch bell bearings and clutch shoes/springs. These wear out eventually and need to be replaced. I have heard of people replacing them every 2 gallons of fuel run through the engine. I have replaced my bearings once, and my engine has almost 1 gallon through it. I replaced mine with sealed bearings, so they should last longer this time around. Put your engine in a large zip-lock bag before you take off the E-clip holding on the bell housing. That way you won’t lose the clip when it shoots off. If you are careful it won’t shoot off anyway, but better safe than sorry.

Finally, be sure to trim all zip ties before running your engine. You don’t want them getting in the way of moving parts or getting snagged by a wheel or something in your operating environment.
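If you log readings from the gauge or gun, the author’s rules of thumb translate into a simple classifier. A sketch in Python, assuming the readings are in °F and using only the thresholds quoted above (the author’s observations for his engine, not Traxxas specifications).

```python
# Thresholds are the author's own observations for his TRX 3.3, in °F
# (assumed units), not Traxxas specifications.
WARMED_UP_F = 160   # "doesn't start running right until ... about 150-160"
MAX_SAFE_F = 270    # "you don't want to see the max temp rise above 270"


def classify_temp(temp_f):
    """Rough label for a single temperature reading, e.g. the run's recorded max."""
    if temp_f < WARMED_UP_F:
        return "warming up"
    if temp_f <= MAX_SAFE_F:
        return "ok"
    return "too hot"


for reading in (140, 235, 305):
    print(reading, classify_temp(reading))
```

Feeding it the gauge’s recorded max, rather than a reading taken after the run, avoids the cooled-off-before-you-measured problem described above.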

Factory Settings

This is an optional step, but if you have any doubt about whether your engine’s settings are anywhere close, it’s a great idea to reset to the default factory settings and move on from there. These are detailed in a couple of different places in your included documentation. Basically, move the carb slide to its resting place and measure the air gap between the slide and your air horn. That gap should be 0.4-1 mm. Next, check your low speed needle. The raised parts of the screw head should be even or flush with the part of the carb slide immediately surrounding the screw. Next, GENTLY turn your high speed needle clockwise until it seats fully closed. Do not tighten it; just get it to touch. Then back it out counter-clockwise four (4) full turns. If it helps you count turns, put a dot of white fingernail polish on one side of the screw head and use it as a point of reference. These settings bring your engine very close to where it was when it left the factory. Check your Traxxas information for the chart detailing how the adjustments change with elevation and ambient temperature. In cold air, you’ll want to run richer. In warmer air, you’ll tune leaner. At higher elevations, you’ll run leaner than at lower elevations. It’s all about balancing how much fuel goes into the carb with how much air is getting sucked in. Once you are all set, fire your engine up and get it warmed up. Make a few speedy passes to bring your engine temperature up to a good level; you can’t tune an engine that is not running or that is too cold. This is a fact: running your engine too lean will cause premature engine failure due to a lack of lubrication. This is also true: running your engine too rich might affect performance but will not damage the engine. People who contradict those two statements are going against what Traxxas has said.
Once your factory reset is done, make sure your clean air filter is strapped on with a zip tie.
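The Traxxas chart is the authority here, but the reasoning behind it (colder or lower-altitude air is denser, so the engine pulls in more air and needs more fuel) can be sketched with an ideal-gas estimate of relative air density. A rough Python illustration, assuming the standard-atmosphere pressure approximation, ignoring humidity, and using an arbitrary baseline of 15 °C at sea level and an arbitrary 2% deadband; it only suggests a direction, never a number of turns.

```python
def pressure_ratio(altitude_m):
    """Air pressure relative to sea level, standard-atmosphere approximation."""
    return (1 - 2.25577e-5 * altitude_m) ** 5.25588


def relative_air_density(temp_c, altitude_m, base_temp_c=15.0, base_altitude_m=0.0):
    """Air density relative to a baseline, from the ideal gas law (rho ~ P / T).

    Ignores humidity. The 15 °C sea-level baseline is an arbitrary assumption,
    not the conditions Traxxas tuned at.
    """
    pressure = pressure_ratio(altitude_m) / pressure_ratio(base_altitude_m)
    temperature = (base_temp_c + 273.15) / (temp_c + 273.15)
    return pressure * temperature


def needle_direction(temp_c, altitude_m, deadband=0.02):
    """Which way to move the tune: denser air -> richer, thinner air -> leaner."""
    rho = relative_air_density(temp_c, altitude_m)
    if rho > 1 + deadband:
        return "richer"
    if rho < 1 - deadband:
        return "leaner"
    return "about the same"


print(needle_direction(-5, 0))     # richer  (cold day at sea level)
print(needle_direction(35, 0))     # leaner  (hot day at sea level)
print(needle_direction(15, 1500))  # leaner  (mild day at altitude)
```

Both of the chart’s rules fall out of one quantity, which is why a cold day at altitude can partly cancel out.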

High Speed Needle
Your high speed needle controls how much fuel is mixed with the incoming air when the throttle is pulled past a certain point. In these cold temperatures I had to richen (open, counterclockwise) my HSN another full turn past the factory setting of 4 turns to get into the ballpark. Remember, too rich is OK; too lean is too bad. Dad always said an engine will run without gas, but it won’t run without oil. Obviously you need gas, but it’s no good without lubrication. Anyway, you’ll be tuning the HSN based on how your engine performs when you accelerate, so accelerate and observe what happens. Does your engine bog down? Open the HSN 1/4 turn. Does your engine scream like a scalded dog? Close it 1/16 turn. Recheck. If you lean the needle (close it) to a point and do not notice any further improvement in performance, back the needle out 1/8-1/4 turn and retest. This is where that needle needs to be.
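Because every HSN move is a fraction of a turn, it’s easy to lose track of where the needle sits. A small Python sketch that records the position in turns out from seated, using exact fractions and the step sizes above; the observation labels are hypothetical names, and the 1/8 back-off is the low end of the 1/8-1/4 range the post gives.

```python
from fractions import Fraction

# The post's high speed needle rules of thumb, in turns.
# Positive = open / counterclockwise (richer); negative = close / clockwise (leaner).
# The observation names are hypothetical labels for what you see on a pass.
HSN_STEPS = {
    "bogs": Fraction(1, 4),                   # bogs on acceleration: open 1/4 turn
    "screams": Fraction(-1, 16),              # screams: close 1/16 turn, recheck
    "no_gain_after_leaning": Fraction(1, 8),  # post says back out 1/8 to 1/4 turn
}


def adjust_hsn(turns_out, observation):
    """Return the new HSN position in turns out from fully seated."""
    new_position = turns_out + HSN_STEPS[observation]
    if new_position <= 0:
        raise ValueError("needle would be at or past seated; stop and reset to factory")
    return new_position


# Start from the author's cold-weather setting: factory 4 turns plus 1.
position = Fraction(5)
for seen in ["screams", "screams", "no_gain_after_leaning"]:
    position = adjust_hsn(position, seen)
    print(f"{seen}: {position} turns out")
```

Writing the position down after each pass also makes a later factory reset or a return to a known-good setting trivial.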

Low Speed Needle
Your low speed needle controls how much fuel is fed to your engine when the throttle is mostly closed or in the process of opening. This needle is generally tuned with a “pinch test”: with the engine warm and at idle, pinch the fuel line close to the carb and observe what happens when you cut off the fuel flow. If the engine dies immediately without changing speed, open the LSN 1/8 turn and recheck. If the engine runs more than 3 seconds, then speeds up and dies, close the needle 1/16 turn. You are aiming for 2-3 seconds, then a speed-up and die. In the cold air in Northwest Georgia, I had to open my LSN quite a bit to get a good pinch test. Don’t be scared if you are not sitting right at the factory settings; go by what your engine is doing. Here’s something I did learn: you can run a really rich HSN, but if your LSN is too lean, your engine will overheat. Do the pinch test and make sure of where you are. When your LSN is set right, your engine will take off really quickly without sputtering or hesitation. Keep in mind that a cold engine will not run right anyway, so make sure you warm it up before you test. After each needle adjustment, make a couple of passes to clear excess fuel from the engine, then assess the result of your latest adjustment.
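The pinch test is a small decision table, sketched here in Python. The 1-second cutoff for “dies immediately” is a hypothetical number, and results the post doesn’t describe return `None` so the sketch doesn’t pretend to cover them.

```python
from fractions import Fraction

IMMEDIATE_S = 1.0  # hypothetical cutoff for "dies immediately"


def lsn_adjustment(seconds_to_die, sped_up_first):
    """Pinch-test rule of thumb from the post.

    Returns turns to move the low speed needle (positive = open/richer,
    negative = close/leaner), 0 when on target, or None when the post's
    rules don't cover the result and you should use judgment.
    """
    if seconds_to_die < IMMEDIATE_S and not sped_up_first:
        return Fraction(1, 8)      # died immediately, no speed change: open 1/8
    if seconds_to_die > 3 and sped_up_first:
        return Fraction(-1, 16)    # ran over 3 s, then sped up and died: close 1/16
    if 2 <= seconds_to_die <= 3 and sped_up_first:
        return Fraction(0)         # target: 2-3 s, then a speed-up and die
    return None


for test in [(0.5, False), (4.0, True), (2.5, True)]:
    print(test, lsn_adjustment(*test))
```

The asymmetry is intentional and mirrors the post: richer steps are larger than leaner ones, because too rich is safe and too lean is not.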

Idle Speed
After your LSN and HSN are dialed in, you can back your idle speed down to the lowest reliable speed possible. This will keep your transmission from slapping the clutches and jerking your model while you are idling.

Now, go back through the HSN and LSN and recheck idle. The transition between LSN and HSN should be smooth as silk. In other words, you should be able to operate the throttle at any position and get a smooth response from the engine. It should punch hard, idle smooth and have a good attitude about running at lower speeds as well as higher speeds. I have found that significant adjustments to the HSN or LSN affect this transition. In the end, your engine should be extremely easy to start, quick to warm up and should never overheat. Your only problem should be learning how to properly control all that power you coaxed out of your engine.

I will reiterate once again that you should first reference the information from Traxxas. A lot of what I have said is straight from their documentation but I may have something wrong or off. When in doubt, reread their documentation, rewatch their DVD and follow their advice. After all, they designed it, so they should know best.

Network Security Thoughts

Sometimes I get the feeling I should walk in and back out when dealing with corporate security leaders. I’ve been in the unhappy position of having to show management where someone did something they shouldn’t have done, and I have watched those people get walked out the door. I’ve also been in the position of having to do something insecure because a manager who did not understand the risk told me I must, and then personally dealt with the repercussions while the manager got to stay. These experiences over the past couple of decades have brought me to a point in my career where I realize there is more ignorance about security than enlightenment.

Take my current situation as my excuse for writing this blog. Because of a security mandate, I am working on a project to restructure our deepest internal networks. This mandate came from someone with great intentions and I believe a strong fundamental desire and ability to make security better. Even though the source is solid, let me be clear that an edict came forth from the mountain saying thou shalt do X, which will cause you to have to do SOMETHING to make whatever it is you do as a business still happen. Let me also be clear that just prior to these big sweeping changes being communicated to my level, all but 8 people in my entire IT department were shown the door in the “spirit” of “synergy”. My disdain for this kind of corporate double talk is fuel for an entirely different rant. This is about security, so let’s focus on that.

I know every company, no matter how much money and effort it spends on security, has a fundamental flaw in its security armor. Namely, people get lazy and arbitrary deadlines rule emotions. You can write a gleaming example of a security policy, chock full of good intentions and even better best practices, but eventually one of these things will happen as surely as death and taxes:

  1. When the policy says NO, someone’s mama (higher up) will veto the policy…
  2. Someone will open up a huge hole in the company because they are either too lazy to do it correctly or they wither in apathy because of absolutely moronic vendors who have no idea what their products DO, or…
  3. A deadline will loom and a big security risk will be assumed in the interest of a temporary fix that will live forever as a permanent solution.

There are other problems that will come out of the woodwork, but rest assured these three things will happen daily to any company with an IT department. There is very little you can do to guard against them unless you have a dyed-in-the-wool dedication to security from the ground up, including, most importantly, the layers of crusty management. The fact is that an infrastructure must be built in support of the security policy AND the business requirements. That infrastructure must have hooks into productive solutions for whatever the business MUST have. Unfortunately, most infrastructures are built around the concept that security is an optional component that can be minimized because of a low incidence of known compromise. The managers you choose to lead your company must also be dedicated to both security and the business. This appears to be the single biggest issue with security from where I am sitting. Security ignorance in a worker bee can be corrected. Security ignorance in management will spread and be much more painful to correct.

Even if you have a huge security team, dedicated to pen testing, code reviews, firewall change approvals, architectural reviews and policy management and enforcement, those three pitfalls listed above will all happen and they will be approved by your security leadership, in error.

Would you rather your company’s IT staff implement a business function that brings your payroll information dangerously close to compromise, or would you rather that business function only be deployed once the proper security review has occurred and all concerns are eliminated?

I know how difficult and hypocritical this post may seem, but there is a sincerity in what I am writing, and it comes from an aversion to security laziness that grows stronger in me as time goes by. Anyone who has had to deal with a security compromise has had to admit to themselves that there was a problem they didn’t realize was there, a flawed assumption they made, or a risk they accepted willingly. Sometimes I wonder whether the people accepting these risks have ever had the pleasure of having their cyber underwear drawer rifled through by parties unknown. How many of them have seen their own systems taken over as attack systems or used to do illegal things? I doubt very many of them have, but I have seen it first hand. I also know that if those ignorant yet responsible leaders were confronted with this situation, their choices about what risk to willingly assume would change.

Here are a few things I believe anyone with IT infrastructure should consider:

1) A security risk should be viewed as a bad check. You should not write bad checks. You should only produce a product (write a check) when you have money in the bank (all security risks eliminated). (Note that I said eliminated and meant it.)
2) A deadline should not be agreed upon until the entire project has been reviewed by your security staff and all the time necessary for compliance testing and approvals has been added to the plan. Your security team should be your inspectors. They should inspect everything from the footers and foundations all the way to the top of the smokestacks. They should also be there during initial planning so you can streamline the process. No dates without proper security planning. None.
3) Security show stoppers should stop the show until a proper and unabridged solution is developed and approved. If a firewall modification is necessary and the requirements are unknown, a wide open hole is not an acceptable solution to the problem.
4) Architecture solves some problems that vendors create and refuse to fix. It is excruciating to see a vendor with a captive market create a product with no viable competition that violates or ignores security best practices. The rapid growth of IP network technology is being seen in every single technological vector in our existence. Most of these vertical-market technology providers are stuck 10 years behind the curve relative to the average (not uber, but average) Internet Protocol hackers and crackers. The only thing still valid from 10 years ago is the statement that the only secure computer is one disconnected from the network entirely. Everything else is crap, so catch up. If you must deploy an inherently insecure application within your environment, don’t connect it to anything. Don’t default to adding it to your network in any normal way. Instead, build your infrastructure in a way that can handle the problems, but don’t give up on beating your vendor(s) regularly. In my mind, you should post every single vulnerability you find to Full Disclosure; just make sure not to sign any contracts saying you CAN’T do that.
5) Inspect contracts with vendors and negotiate security language into them. Argue for support to be granted despite reasonable security controls they normally do not support. For example, you cannot connect a Windows XP environment to your network without virus and malware controls at every layer of the OSI model, but a huge number of vendors refuse support if you have those controls in place. This is a trend that must change. You shouldn’t deploy a UNIX environment without these controls either, btw.
6) The next time someone asks for a high-risk change, stand your ground, and make sure that ground is rooted in a good security policy. The fact is, if you give in to every hack-job tech project manager who comes along, you will eventually have to admit to yourself that you are just a network bitch. Don’t be anybody’s bitch. Instead, be quick to do the following:

  • Explain in clear and concise language why their request is impossible to complete given security policy and if necessary explain to them why the policy exists and how important it is. Do this calmly and without emotion. Smile and subtly nod your head yes and in record time they will think it is their idea not to do what they came to ask you to do.
  • Be prepared to document the request to your manager with the information they need to fully understand why they should back you up. Sometimes no written policy directly addresses the request, so be sure you are clear, concise and correct.
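The rule that a wide-open hole is not an acceptable answer to unknown firewall requirements is easy to enforce mechanically before a change is approved. A minimal Python sketch, assuming a hypothetical, simplified permit-rule format (source, destination, port) that is not any vendor’s syntax, and an arbitrary size threshold you would set from your own policy.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified permit rule (not any vendor's syntax)."""
    source: str       # CIDR such as "10.1.2.0/24", or "any"
    destination: str  # CIDR or "any"
    port: str         # "443", "1024-65535", or "any"


def too_broad(rule, max_addresses=256):
    """List the reasons a rule looks like a 'wide open hole'.

    `max_addresses` is an arbitrary example threshold; set it from your policy.
    """
    problems = []
    for field in ("source", "destination"):
        value = getattr(rule, field)
        if value == "any":
            problems.append(f"{field} is any")
        elif ipaddress.ip_network(value, strict=False).num_addresses > max_addresses:
            problems.append(f"{field} {value} spans more than {max_addresses} addresses")
    if rule.port == "any":
        problems.append("port is any")
    return problems


print(too_broad(Rule("10.1.2.0/24", "10.9.9.10/32", "443")))  # []
print(too_broad(Rule("any", "10.0.0.0/8", "any")))
```

A non-empty result is a show stopper in the sense of item 3: the request goes back for real requirements instead of getting a broader rule.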

I’m tired of writing and I feel better.

Resize Images in Linux

Most people probably already knew this, but this is my note to self:

`mogrify -resize 1024 *.jpg`

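It maintains the aspect ratio (a bare 1024 is treated as the target width) and applies the resize to every file you specify. Note that mogrify overwrites the original files in place, so work on copies if you want to keep the full-size images.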

That’s pretty slick.
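What `-resize 1024` does to each image’s dimensions can be sketched in a few lines of Python. The helper assumes the geometry is a bare width and scales the height to keep the aspect ratio; ImageMagick’s own rounding may differ from Python’s `round()` by a pixel. It also models ImageMagick’s `>` geometry flag (`mogrify -resize '1024>' *.jpg`, quoted so the shell doesn’t treat `>` as a redirect), which only shrinks images larger than the target. To keep the originals, mogrify’s `-path` option writes results to another directory instead, e.g. `mogrify -path resized -resize 1024 *.jpg` after creating `resized`.

```python
def resized_dimensions(width, height, target_width=1024, shrink_only=False):
    """Approximate output size of `mogrify -resize 1024` (or '1024>' if shrink_only).

    ImageMagick's own rounding may differ from Python's round() by a pixel.
    """
    if shrink_only and width <= target_width:
        return width, height  # '>' flag: images already small enough are untouched
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


print(resized_dimensions(4000, 3000))                  # (1024, 768)
print(resized_dimensions(800, 600))                    # (1024, 768): enlarged
print(resized_dimensions(800, 600, shrink_only=True))  # (800, 600)
```

The middle case is the one that surprises people: without `>`, smaller images get scaled up.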

Credit for making me aware of this: http://www.smokinglinux.com/tutorials/howto-batch-image-resize-on-linux

FireGPG

Use GPG easily in Mozilla Firefox

I must have been living under a rock since March 10th, 2007. That was the date of the first release of FireGPG. The name merges Firefox, the best browser in history, with GnuPG, the GNU Privacy Guard package. FireGPG is a Firefox extension that brings a user-friendly interface for GnuPG to Firefox.

GnuPG is a free implementation of the OpenPGP standard defined in RFC 4880. PGP stands for “Pretty Good Privacy”, which humbly states the obvious. You can sign and encrypt text using public and private key pairs and exchange this data with people whose public keys you have. Signing lets recipients verify that a message came from the intended sender, and encrypting ensures it can only be opened by the intended recipients. It’s good stuff.

FireGPG even has a tie-in to Gmail, which is a service I use. It integrates seamlessly with the compose message interface, providing buttons to clear-sign a message or to encrypt, sign and send it. It is really slick.
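FireGPG works on top of an installed GnuPG, and its two Gmail buttons correspond roughly to two `gpg` invocations. A Python sketch that builds both command lines with real gpg options (`--clearsign`, `--armor`, `--sign`, `--encrypt`, `--recipient`) and pipes a message through `subprocess`; the action names and recipient address are placeholders, and this is not FireGPG’s own code.

```python
import subprocess


def gpg_command(action, recipients=()):
    """Build a gpg command line for FireGPG-style actions (hypothetical names)."""
    if action == "clearsign":
        return ["gpg", "--clearsign"]
    if action == "encrypt_sign":
        if not recipients:
            raise ValueError("encrypting needs at least one recipient")
        cmd = ["gpg", "--armor", "--sign", "--encrypt"]
        for recipient in recipients:
            cmd += ["--recipient", recipient]
        return cmd
    raise ValueError(f"unknown action: {action}")


def run_gpg(action, message, recipients=()):
    """Pipe `message` through gpg and return its text output.

    Needs gpg on PATH and your secret key for signing; gpg may prompt
    for your passphrase.
    """
    result = subprocess.run(
        gpg_command(action, recipients),
        input=message, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Placeholder recipient; this only builds the command line, it doesn't run gpg.
print(gpg_command("encrypt_sign", ["alice@example.com"]))
```

Encrypt-and-sign needs the recipient’s public key in your keyring, while clear-signing only needs your own secret key, which matches the two buttons’ different requirements.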

Cisco’s WAN / MPLS Redundancy Silver Bullet

How many of you have ever heard of OER? How many times have you had redundant paths where an immediate failover would have saved you the 180-second BGP hold timer? When we’re talking about a transit link in the Internet, even with multiple Internet connections for a company, it’s typically acceptable to suffer through the rare 3-minute hold-timer expiry on a dead BGP peer link. But when you are running VOIP across your redundant MPLS WAN, you can bet the failures will only occur when the CEO is talking to the head sales VP, and trust me, a three-minute outage window is not going to make anyone happy.

We’ve been running along for years with a redundant MPLS WAN and have relied exclusively on BGP to handle failover. In a sense we are lucky to have redundant carriers and the ability to recover relatively quickly from a failure. Even so, we suffer for short but extremely painful periods of time when things go wrong. Worse, we all know this is a best-case scenario, and we really can’t do much except intervene manually when performance across a given cloud changes or degrades. Sure, you can monitor these things using Cisco IP SLA responder tests to calculate MOS and ICPIF scores, but acting on those changes BEFORE the CEO’s call is disrupted or becomes untenable is the glory we all seek, the kind that makes us invisible.

We would also like to believe the investments we make in infrastructure and telco are well placed. The case for redundant WAN carriers weakens when one “side” typically sits idle. The core argument for purchasing that second WAN link is redundancy and the elimination of a single point of failure. Any frugal network engineer or business owner will eventually circle back and ask how many times that redundancy actually carried the business through a failure. A cost-benefit analysis might reveal a lopsided investment, despite the utopia redundant connections promise. Wouldn’t it be better to tell them that both infrastructures are in use at all times, providing real benefits every day?

Enter OER. OER stands for Optimized Edge Routing, and it is being renamed PfR, or Performance Routing; Cisco’s site uses the two names interchangeably. I am still learning the extent of what OER can do, but essentially it can alter your local routing table. It makes its decisions based on performance metrics such as MOS, RTT, delay and packet loss, according to how you configure those metrics for each application you describe to it. It can also passively learn the network prefixes flowing through it. You can identify applications using NBAR, prefix lists or access lists. It’s like a fluid, intelligent policy-based routing engine.

We have been investigating OER for a while now because, like many companies, we are converging data, video and voice onto IP, which raises the demand on WAN links where bandwidth is still relatively expensive. In essence, I have been looking for a way to leverage our entire investment in technology. I didn’t find OER; another engineer did, and I am glad he did. You can come up with a lot of wild ideas, like running OSPF over GRE tunnels on top of your BGP infrastructure, but very few of them are workable in a production environment.

We were given the opportunity to explore OER with Cisco in their Raleigh, N.C. lab last month, and we jumped on it. If you’ve ever toured the Cisco campus on either the East or West coast, you know how hard it is to come away without a lot of faith in the Cisco brand. They have spent a great deal of time and money developing technology and supporting it in a workable way. Their Customer Proof of Concept (CPOC) lab was a great experience and allowed us to mock up our entire WAN and test OER under several different scenarios. It’s not just the massive amount of equipment available to test with; it’s the expert people you work with that make the trip worthwhile. Had we deployed OER ourselves, we might not have found a bug in the latest version of IOS that would have caused problems. When our engineer saw the problem, he talked to one of the primary OER developers and was able to make a solid recommendation before we ran into it. I realize Cisco likes to tie CPOC trips to major spends, and we’re looking at Cisco VOIP, but the experience was fantastic. I encourage the use of the CPOC. I’ll bet you won’t find a better engineer than Keith Brister, either.

After our two-day CPOC lab session with Cisco, and a scheduled maintenance window to bring our current primary router up to the recommended OER-enabled code (12.4(15)T7), it came time to turn on OER. I decided to make this change going into a weekend. Normally I don’t make changes on a Friday, for obvious reasons, but I didn’t want to create a problem for myself in the middle of the week; folks want the WAN to work when they are at work, after all. A weekend of long-running backups was also coming up, which was a unique opportunity to see lots of data hit the circuits. I have not been disappointed in the least.

We have 5 remote sites: 4 with a T1 from each provider, and the 5th with a 9 meg connection from each. Our Atlanta corporate site has a 12 meg circuit to each provider, both connected to a single Cisco 7204 VXR router running an NPE-400. We plan to deploy a second VXR router to handle one of the hand-offs, which will obviously step up our redundancy a good bit. Our remote sites were just upgraded to 2811s, so they were ready for the OER rollout as well.

I planned the deployment around the following criteria:

  1. I don’t want it to learn prefixes. I’ll define what I want it to know about.
  2. I want to define VOIP as an application but the rest will be blanket definitions of data networks.
  3. I want to test data networks by echo probes and make decisions solely on latency and health.
  4. I want to test MOS scores for VOIP and make decisions based on loss and delay.
  5. I do not want to do anything with load sharing. I believe we will get that inherently from the way OER works.
  6. I want to start out with OER in route observation mode.

Based on these few criteria, I followed these steps to bring OER on-line.

  1. I created a prefix list for each site, named after that site. I put all the prefix lists on all the routers just to keep things consistent.
  2. I turned on the IP SLA responder in each location with the “ip sla responder” command.
  3. I built a key chain for each site. This key is used by OER to authenticate the conversation between the Master Router (MR) and the Border Router (BR).
  4. I configured oer master with logging and described the border router for each location. In our case, the MR and BR are the same router. In every case, I used the loopback interface for OER.
  5. I configured oer border in each location, pointing to the master configuration and turning on logging there, too. This is the shortest part, as the border just does what the master says. The master is where all the guts of the configuration really go.
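The five steps above can be sketched as a small Python helper that prints the corresponding IOS configuration for one site. Treat it as an illustration under stated assumptions: the function name, the key chain name and the key string are hypothetical, and the command syntax reflects my understanding of the 12.4T OER CLI, so check every line against Cisco’s documentation for your release before pasting anything into a router. The example values come from the Atlanta output in this post (border 10.105.105.8, external links Fa2/0 and Se3/0, internal Po10, and LA’s 10.50.1.0/24 data network).

```python
# Hypothetical helper: renders OER setup steps 1-5 as IOS config text.
# Verify the syntax against the OER/PfR documentation for your IOS release.


def render_oer_config(prefix_lists, master_ip, border_ip, key_chain,
                      key_string, external_ifs, internal_ifs,
                      local_if="Loopback0"):
    lines = []
    # Step 1: one prefix list per site (the same lists go on every router).
    for name, prefixes in prefix_lists.items():
        for i, prefix in enumerate(prefixes, start=1):
            lines.append(f"ip prefix-list {name} seq {i * 5} permit {prefix}")
    # Step 2: answer IP SLA probes sent from the other sites.
    lines.append("ip sla responder")
    # Step 3: key chain that authenticates master <-> border traffic.
    lines += [f"key chain {key_chain}", " key 1", f"  key-string {key_string}"]
    # Step 4: master controller, describing the border router and its links.
    lines += ["oer master", " logging",
              f" border {border_ip} key-chain {key_chain}"]
    lines += [f"  interface {ifname} external" for ifname in external_ifs]
    lines += [f"  interface {ifname} internal" for ifname in internal_ifs]
    # Step 5: border router, pointing back at the master.
    lines += ["oer border", " logging", f" local {local_if}",
              f" master {master_ip} key-chain {key_chain}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # In Atlanta the master and border are the same router.
    print(render_oer_config(
        prefix_lists={"LA": ["10.50.1.0/24"]},
        master_ip="10.105.105.8",
        border_ip="10.105.105.8",
        key_chain="OER",             # hypothetical name
        key_string="CHANGE-ME",      # hypothetical placeholder
        external_ifs=["Se3/0", "Fa2/0"],
        internal_ifs=["Po10"],
    ))
```

Generating the text from one table of sites is one way to keep "all the prefix lists on all the routers" consistent; the oer-map and policy lines would still be added by hand on the master.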

Once all that was done, I checked OER in each location using the “show oer master border detail” command. This command will check the external and internal links and tell you if OER itself is functional:

Border           Status   UP/DOWN             AuthFail  Version
10.105.105.8     ACTIVE   UP       2d23h          0  2.1
 Fa2/0           EXTERNAL UP
 Po10.401        INTERNAL UP
 Po10.30         INTERNAL UP
 Po10            INTERNAL UP
 Se3/0           EXTERNAL UP             

 External            Capacity      Max BW   BW Used    Load Status          Exit Id
 Interface            (kbps)       (kbps)    (kbps)    (%)
 ---------           --------      ------   ------- ------- ------           ------
 Fa2/0           Tx     12000        9000       121       1 UP                    2
                 Rx                 12000      5996      49
 Se3/0           Tx     12000        9000       178       1 UP                    1
                 Rx                 12000      1975      16

This output shows OER is up, sees internal and external links and is aware of utilization of each link. The most interesting part of this is the first two lines. It’s hard to see here, but the first line is column headers for the second. It’s saying border router 10.105.105.8 is ACTIVE and UP for 2 days, 23 hours with 0 authentication failures and it’s running OER version 2.1. This is all good. Now we get to go ahead and start talking about applications and network prefixes, as well as what we’re going to do about each performance metric.
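If you want to track that utilization over time, the external interface table is regular enough to parse. The Python sketch below pulls the Tx/Rx figures out of the sample output above. The function names are hypothetical, and the regular expressions assume the column layout shown here, which may differ between IOS releases. In this sample, the Load column matches BW Used divided by the 12000 kbps capacity, truncated to a whole percent. That is an observation from these four rows, not a documented formula. Note also that the Tx Max BW of 9000 kbps is 75% of capacity.

```python
import re

# Sample copied from the "show oer master border detail" output above.
SAMPLE = """\
 Fa2/0           Tx     12000        9000       121       1 UP                    2
                 Rx                 12000      5996      49
 Se3/0           Tx     12000        9000       178       1 UP                    1
                 Rx                 12000      1975      16
"""

# Tx row: interface, capacity, max BW, BW used, load %, status, exit id
TX_ROW = re.compile(r"^\s*(\S+)\s+Tx\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)")
# Rx row: max BW, BW used, load % (no interface name, no capacity column)
RX_ROW = re.compile(r"^\s+Rx\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_exits(text):
    """Return {interface: {...}} for each external interface in the table."""
    exits, current = {}, None
    for line in text.splitlines():
        tx = TX_ROW.match(line)
        if tx:
            name, cap, max_bw, used, load, status, exit_id = tx.groups()
            current = name
            exits[name] = {
                "capacity": int(cap), "tx_max": int(max_bw),
                "tx_used": int(used), "tx_load": int(load),
                "status": status, "exit_id": int(exit_id),
            }
            continue
        rx = RX_ROW.match(line)
        if rx and current is not None:
            max_bw, used, load = (int(v) for v in rx.groups())
            exits[current].update(rx_max=max_bw, rx_used=used, rx_load=load)
    return exits


def load_percent(used_kbps, capacity_kbps):
    """Whole-percent utilization, truncated (matches this sample's Load column)."""
    return used_kbps * 100 // capacity_kbps


if __name__ == "__main__":
    for name, e in parse_exits(SAMPLE).items():
        print(name, "exit", e["exit_id"],
              "tx", load_percent(e["tx_used"], e["capacity"]),
              "rx", load_percent(e["rx_used"], e["capacity"]))
```

Feeding numbers like these into your SNMP or trending tool gives you a per-exit history to compare against the moves OER makes.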

This is also where it looks like you can become artistic in how you manage your OER deployment. My primary concern early on was that I didn’t want to deploy anything that would behave in unexpected ways and become hard to manage overall. The fact of the matter is that while you can get fairly elaborate in which tests you perform, or how many you run for each oer-map, the interface to OER is simple and easy to manage. Here is what I did, in brief:

  1. In Atlanta, I added oer-map entries for NY VOIP and LA VOIP, each matching its respective prefix list. I used two separate but identical oer-map entries because the routing decision I make for LA may need to differ from the one for NY. Here’s what my oer-map looked like:
    oer-map 10 10
     match traffic-class prefix-list NYVOIP
     set delay threshold 750
     set mode monitor fast
     set resolve mos priority 2 variance 10
     set resolve delay priority 3 variance 10
     set resolve loss priority 4 variance 10
     set loss relative 500
     set jitter threshold 15
     set mos threshold 3.76 percent 30
     set active-probe jitter 10.105.105.5 target-port 1025 codec g729a
     set probe frequency 2
    
  2. I then went ahead and added a data network oer-map for each remote site. There’s of course no reason to describe Atlanta to Atlanta, so I left it out. Here is a representative sample:
    oer-map 10 40
     match traffic-class prefix-list NY
     set delay threshold 750
     set mode monitor fast
     set active-probe echo 10.105.105.5
     set probe frequency 2
    
  3. Once I did this, while still in mode route observe under oer master, I applied the policy group to OER. I gave mine an unimaginative name (10), but you could call yours anything you like. Once you tie the policy group to OER, the tests begin. You can see the results by issuing a “show ip sla statistics” command.
    Round Trip Time (RTT) for       Index 788
            Latest RTT: 24 milliseconds
    Latest operation start time: 20:01:08.520 EST Sun Nov 2 2008
    Latest operation return code: OK
    Number of successes: 88
    Number of failures: 0
    Operation time to live: Forever
    
    Round Trip Time (RTT) for       Index 793
            Latest RTT: 32 milliseconds
    Latest operation start time: 20:01:08.560 EST Sun Nov 2 2008
    Latest operation return code: OK
    RTT Values:
            Number Of RTT: 50               RTT Min/Avg/Max: 27/32/38 milliseconds
    Latency one-way time:
            Number of Latency one-way Samples: 50
            Source to Destination Latency one way Min/Avg/Max: 18/21/26 milliseconds
            Destination to Source Latency one way Min/Avg/Max: 8/11/14 milliseconds
    Jitter Time:
            Number of SD Jitter Samples: 49
            Number of DS Jitter Samples: 49
            Source to Destination Jitter Min/Avg/Max: 0/2/8 milliseconds
            Destination to Source Jitter Min/Avg/Max: 0/3/5 milliseconds
    Packet Loss Values:
            Loss Source to Destination: 0           Loss Destination to Source: 0
            Out Of Sequence: 0      Tail Drop: 0
            Packet Late Arrival: 0  Packet Skipped: 0
    Voice Score Values:
            Calculated Planning Impairment Factor (ICPIF): 11
            MOS score: 4.06
    Number of successes: 88
    Number of failures: 0
    Operation time to live: Forever
    
  4. These are results for two of the tests. You will end up with results for each exit interface for each oer-map. In my case I have two exit links, so for each oer-map that tests jitter to a remote site, I get two jitter test results. In case you didn’t know, you can graph the jitter test results with your favorite SNMP monitoring system (Cacti, in my case). It’s nice to have these trends recorded when you start troubleshooting jitter issues in the future. You can also turn these tests on without OER if all you want to do is monitor. By the way, if you want to pinpoint which results belong to which remote site, it won’t be obvious unless you do something like run each test on a different port. I didn’t think of this until I was finished, so I still have to figure out which exit link and site each test is really for. Oh well.
  5. Take a look at “show oer master traffic-class” as well, which tells you exactly what OER is doing with each prefix you are working with.
    wan-7204#show oer master traffic-class
    OER Prefix Statistics:
     Pas - Passive, Act - Active, S - Short term, L - Long term, Dly - Delay (ms),
     P - Percentage below threshold, Jit - Jitter (ms),
     MOS - Mean Opinion Score
     Los - Packet Loss (packets-per-million), Un - Unreachable (flows-per-million),
     E - Egress, I - Ingress, Bw - Bandwidth (kbps), N - Not applicable
     U - unknown, * - uncontrolled, + - control more specific, @ - active probe all
     # - Prefix monitor mode is Special, & - Blackholed Prefix
     % - Force Next-Hop, ^ - Prefix is denied
    
    DstPrefix           Appl_ID Dscp Prot     SrcPort     DstPort SrcPrefix
               Flags             State     Time            CurrBR  CurrI/F Protocol
             PasSDly  PasLDly   PasSUn   PasLUn  PasSLos  PasLLos      EBw      IBw
             ActSDly  ActLDly   ActSUn   ActLUn  ActSJit  ActPMOS
    --------------------------------------------------------------------------------
    10.50.1.0/24              N defa    N           N           N N
                              OOPOLICY     @105      10.105.105.8    Se3/0      BGP
                   N        N        N        N        N        N        N        N
                2678     2525   600000   491525        N        N
    <..snip..>
  6. In this example, I have a prefix (10.50.1.0/24, my LA data network) that is out of policy (OOPOLICY). Further down the output, I also see that my LA VOIP network was recently moved to the other exit interface. Something must be going on with one of my WAN connections in LA. The best part is that OER manages all of this based on the criteria in the oer-map sections of your master configuration. A subsequent run of “show oer master traffic-class” showed the data network had moved over as well. The VOIP network likely moved sooner because its policy looks at MOS, delay, loss and jitter, while the data network blocks are judged on delay alone.
  7. Once you have OER set up, though, give it a long time to cycle through all of its tests so it can settle out before you put it in control. OER is deliberate in its attempts to test and manipulate things intelligently. I didn’t run into a single issue where routes got screwed up, but patience certainly helps. Also, until you put it into route control mode, you won’t see any OER routes, and every state in the traffic-class output will carry an asterisk (*), which the legend defines as uncontrolled.
  8. The most exciting part is putting OER in control. To do that, go into oer master configuration mode and type “mode route control”. I am sure you have an ITIL-compliant change control request already scheduled, right? Anyway, no one will notice, except that you will become a little more invisible, because fewer problems will be noticed by anyone, especially the CEO on that VOIP call.
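To reason about what the VOIP oer-map above asks OER to do, it can help to model it. The Python sketch below is a simplified model built on my own assumptions, not Cisco’s algorithm. An exit counts as in policy when its delay and jitter are at or under the thresholds and no more than 30% of its MOS samples fall below 3.76. Among candidate exits, the resolve metrics are applied in priority order, and any exit within 10% of the best value for a metric stays in the running for the next one. In the spirit of choosing the best exit rather than just any good one, it picks the least bad exit when nothing is in policy. The class and function names and the "exit B" numbers are hypothetical; "exit A" uses the figures from the IP SLA output above (32 ms RTT, about 3 ms jitter, MOS 4.06, no loss).

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the VOIP oer-map; not Cisco's algorithm.
DELAY_THRESHOLD_MS = 750        # set delay threshold 750
JITTER_THRESHOLD_MS = 15        # set jitter threshold 15
MOS_THRESHOLD = 3.76            # set mos threshold 3.76 percent 30
MOS_MAX_PERCENT_BELOW = 30
# (metric, priority, variance %) from the "set resolve" lines
RESOLVE = [("mos", 2, 10), ("delay", 3, 10), ("loss", 4, 10)]
HIGHER_IS_BETTER = {"mos"}


@dataclass
class ExitStats:
    name: str
    delay: float        # ms
    jitter: float       # ms
    mos_samples: list   # individual MOS results
    loss: float         # packets per million

    @property
    def mos(self):
        return sum(self.mos_samples) / len(self.mos_samples)


def in_policy(e):
    below = sum(1 for s in e.mos_samples if s < MOS_THRESHOLD)
    percent_below = 100 * below / len(e.mos_samples)
    return (e.delay <= DELAY_THRESHOLD_MS
            and e.jitter <= JITTER_THRESHOLD_MS
            and percent_below <= MOS_MAX_PERCENT_BELOW)


def close_enough(value, best, variance, higher_is_better):
    # Within `variance` percent of the best value counts as a tie.
    if higher_is_better:
        return value >= best * (1 - variance / 100)
    return value <= best * (1 + variance / 100)


def resolve(candidates):
    for metric, _priority, variance in sorted(RESOLVE, key=lambda r: r[1]):
        hib = metric in HIGHER_IS_BETTER
        values = [getattr(e, metric) for e in candidates]
        best = max(values) if hib else min(values)
        candidates = [e for e in candidates
                      if close_enough(getattr(e, metric), best, variance, hib)]
        if len(candidates) == 1:
            break
    return candidates[0]        # still tied after all metrics: take the first


def choose_exit(exits):
    good = [e for e in exits if in_policy(e)]
    return resolve(good or exits)   # nothing in policy: pick the least bad


if __name__ == "__main__":
    a = ExitStats("exit A", delay=32, jitter=3, mos_samples=[4.06], loss=0)
    b = ExitStats("exit B", delay=80, jitter=20, mos_samples=[3.5, 3.6, 4.0], loss=0)
    print(choose_exit([a, b]).name)
```

The variance is what keeps a model like this from flapping: two exits whose MOS differs by a few hundredths are treated as equal, so the decision falls through to delay and then loss.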

My own results

It’s not my intention to sing the praises of one equipment vendor over another, but there are many reasons why Cisco is the market leader. Cisco has built some of the most robust network gear on the planet. I know there is faster gear out there for the mid- to large-sized business market, but sometimes the real need is rock-solid reliability. This doesn’t mean other vendors are inferior, but it does mean there is a lot of intellectual horsepower built into those green boxes out there.

My own results with OER are extraordinary, in my opinion. Remember those long-running weekend backups? On Saturday I noticed that 4 of my T1-connected sites were backing up over one WAN cloud while NY, with 9 Mbit connectivity, was backing up over the other. OER effectively gave NY a dedicated pipe for its backups and consolidated the smaller sites onto the other cloud. I couldn’t have asked for a better result. I have also seen OER move other networks because of loss and delay, but I haven’t really dug into what was going on during those events. Tonight I noticed it moved my two West Coast offices over to one MPLS provider but left everything else alone. Was there a problem with the other provider? I’m not sure. I highly recommend having a good NetFlow analysis tool handy. Scrutinizer by Plixer International is a great tool; its map of our WAN is a quick and easy reference for seeing where OER is pushing bandwidth to and from.
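The backup split makes sense on paper. Assuming each remote site can fill its access circuit (an assumption; real backups rarely run at exactly line rate), and using the 1.544 Mbps T1 line rate plus the 9 meg NY and 12 meg Atlanta circuits described earlier, a quick Python check shows that putting every site on one cloud would oversubscribe that Atlanta hand-off, while the split OER chose fits on both.

```python
T1_MBPS = 1.544        # T1 line rate
NY_MBPS = 9.0          # NY's circuit to each provider
HUB_MBPS = 12.0        # Atlanta's circuit to each provider


def fits(offered_mbps, link_mbps=HUB_MBPS):
    """True if the offered load fits within one Atlanta hand-off."""
    return offered_mbps <= link_mbps


all_on_one_cloud = 4 * T1_MBPS + NY_MBPS   # every site on the same provider
t1_sites_cloud = 4 * T1_MBPS               # the split OER chose: T1 sites here
ny_cloud = NY_MBPS                         # ...and NY alone on the other

if __name__ == "__main__":
    print(f"one cloud: {all_on_one_cloud:.3f} Mbps, fits={fits(all_on_one_cloud)}")
    print(f"split: {t1_sites_cloud:.3f} + {ny_cloud:.3f} Mbps, "
          f"fits={fits(t1_sites_cloud) and fits(ny_cloud)}")
```

Roughly 15.2 Mbps offered against a 12 Mbps circuit versus about 6.2 and 9 Mbps on separate circuits is the difference between queueing and headroom.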

Other Notes

OER is included in IOS. We didn’t pay for an extra license to get it in our router code, and it’s a great thing to have. Here’s another thing to like about it: because it tests link performance and moves traffic based on the results, if a link goes down or a BGP peer otherwise dies, OER will see that and move things within a few seconds, depending on your probe frequency. Otherwise you would have to wait for the BGP hold timer (180 seconds by default) to expire and pull those routes from the table.
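A rough comparison of the two detection windows, as a Python sketch. The 180 seconds is the default BGP hold time, three times the 60-second keepalive. How many failed probes OER needs before it acts was not measured here, so `failed_probes` is a hypothetical parameter to tune against what you observe; the 2-second interval comes from the `set probe frequency 2` lines in the oer-maps above.

```python
BGP_KEEPALIVE_S = 60
BGP_HOLD_TIME_S = 3 * BGP_KEEPALIVE_S      # 180 s default hold time
PROBE_FREQUENCY_S = 2                      # set probe frequency 2


def probe_detection_s(failed_probes, frequency_s=PROBE_FREQUENCY_S):
    """Time for `failed_probes` consecutive probes to fail (hypothetical model)."""
    return failed_probes * frequency_s


if __name__ == "__main__":
    for n in (1, 3, 5):
        t = probe_detection_s(n)
        print(f"{n} failed probes: ~{t} s vs {BGP_HOLD_TIME_S} s "
              f"({BGP_HOLD_TIME_S // t}x faster)")
```

Even with a conservative guess for the number of failed probes, the probe-driven window is an order of magnitude shorter than waiting out the hold timer.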

Oh, and I told it to choose the best exit based on test results, not just any good link. That way, if both links are failing, it will choose the least bad link.

I’m glad you read this post. Isn’t OER cool?


Linux Backups

Google Code: Flyback

I started using computers after the days of punch cards, and the horror stories I have heard about those days make me glad. I may have come into the world of computers after what some would consider the dark ages, but I’ve still witnessed a fair amount of improvement. My first computer stored data on audio cassette tapes. I spent many hours filling tapes with sounds that would be interpreted as code, and at least half of those hours listening and hoping the process would complete successfully. Now machines have relatively reliable spinning hard disks that hold data on a scale I never imagined in those 8-bit days.

In all that time, the thing I have been most guilty of is not performing regular backups of my data. It’s too much hassle: checking the status of things, testing restorations, etc. I am also notorious for completely reinstalling my systems to test the latest releases of openSUSE, Fedora or Ubuntu. The overhead of doing backups has always been a deterrent for me, so I rarely even follow what others are doing to solve the problem of desktop backups.

Today I got curious. See, it’s really Ubuntu’s fault: 8.04 Hardy Heron is looking mighty good. I had been playing with one operating system or another on my laptop, and when things went awry, I decided to install Hardy Heron. It’s still in beta for another week or so, so getting everything up to date and installing all my necessary add-on packages took quite a while. While I was running through everything, I thought it would be neat if someone made something like Apple’s Time Machine for Linux. I did a search and did a lot of catching up on the backup scene.

You will notice the Google Code logo at the top of this post. What I found is a piece of software called “Flyback”. It appears to be a relatively simple interface built on tools that have been around forever, most importantly rsync. I decided to switch my brain off and see whether the thing works. Put simply, it does. After installing Flyback, I set up an NFS share on a Linux server, added the NFS mount to my fstab and mounted the share. After telling Flyback where to store backups, and then telling it to ignore the backup folder itself, I ran a backup. The first run produced a 3.1 GB directory structure. I then did some other playing around: downloading and installing, reading web pages, basic poking around. I ran another backup after all of that, and guess what? It produced a 74 MB incremental backup.

I did some testing with restoring a couple of files as well, which seems to be one of the coolest parts. The interface is completely point and click. I couldn’t be happier. I realize now that a lot of people have been using rsync to perform backups, and I know there are other tools that purport to be just as good as Flyback, which is fine. What I like is that Flyback organizes backups much the way Time Machine does. I can view my machine as it is right now, flip back one backup cycle and see it the way it was then, or go back another cycle and see it that way. It’s everything but the pretty 3D flying-through-space effect.
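Browsable per-cycle snapshots that stay small, like a 3.1 GB first run followed by a 74 MB incremental, are commonly built with hard links: each snapshot looks like a full copy, but files that haven’t changed point at the same data as the previous snapshot. rsync’s `--link-dest` option works this way, and the Python sketch below shows the same idea. It is not Flyback’s actual code, the function name and snapshot names are hypothetical, and it skips things a real tool must handle, such as symlinks, special files, directory permissions and error recovery.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(source: Path, backups: Path, name: str) -> Path:
    """Copy `source` to backups/<name>, hard-linking files that are unchanged
    since the newest existing snapshot (the idea behind rsync --link-dest)."""
    backups.mkdir(parents=True, exist_ok=True)
    # Snapshot names are assumed to sort chronologically (e.g. ISO dates).
    previous = sorted(p for p in backups.iterdir() if p.is_dir())
    prev = previous[-1] if previous else None
    dest = backups / name
    dest.mkdir()
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)            # unchanged: share the old file's data
        else:
            shutil.copy2(src_file, target)  # new or changed: store a real copy
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home, store = Path(tmp, "home"), Path(tmp, "backups")
        home.mkdir()
        (home / "notes.txt").write_text("v1")
        (home / "photo.jpg").write_bytes(b"\x00" * 1024)
        first = snapshot(home, store, "2008-04-10")
        (home / "notes.txt").write_text("v2")
        second = snapshot(home, store, "2008-04-11")
        # photo.jpg is shared between snapshots; notes.txt is not.
        print(os.stat(first / "photo.jpg").st_ino == os.stat(second / "photo.jpg").st_ino)
        print((first / "notes.txt").read_text(), (second / "notes.txt").read_text())
```

Because every snapshot directory is a complete tree of ordinary files, you can browse or restore from it with plain `cp`, no backup software required.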

In short, I feel confident I have found a new backup home. It’s one I think I can keep running with my limited time. It’s also one I think won’t eat up my server space unnecessarily, and it will let me get at my data with or without the machine the data came from. Since it’s just an rsync copy, I can do whatever I want with it; I don’t need the backup software to access or restore from my copies. The only thing the packaging didn’t seem to handle was the panel launcher: I created one by hand that starts Flyback with “gksu” so it runs as root instead of as me. I do that not because I don’t care about possible security side effects, but because I want to be able to back up and restore more than just my home directory.

Give Flyback a quick try. I believe it is a worthwhile solution to every day end user backup needs.


Contrast or Gamma set wrong? (More notes for me)

My MythTV box decided to make its display too dark for me to see anything tonight. I quickly discovered the problem was the gamma, contrast and brightness settings, which had somehow magically changed. `ssh -X mythbox` and `nvidia-settings -c :0.0` let me go in and adjust these settings remotely. Obviously this only works with NVIDIA hardware.

