Router Troubleshooting Primer

In this article we will take a look at the proper steps for troubleshooting routing problems.

Although this methodology and approach can be used in just about any troubleshooting scenario, we will focus our exercise on routing. A router is a device that determines the path from a source to a destination. A router is the default gateway for a LAN, the exit point from the LAN to the WAN. The router (or gateway) is what connects two or more network segments together, whether they be two LANs, two separate WANs and so on. A router will (if programmed correctly) know the topology of a network so that if adjacent routers go down (or the lines that attach to them, such as DSL or T1, do), the router will be able to find a new path to the destination. This reconverges the network so that data transmissions can continue. To know the topology, a router keeps a table of the routes it has been programmed with or has learned from neighboring routers, and it uses this table to determine the best route for a given data packet. In this article we will look at what could go wrong and cause instability in your environment, or just outages in general.
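To make the idea of choosing the best route from a table concrete, here is a minimal Python sketch of longest-prefix matching, the rule IP routers commonly use to pick the most specific route. The table entries and next-hop addresses are hypothetical and only for illustration; a real router also weighs metrics and other settings.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    ("0.0.0.0/0", "10.1.1.1"),     # default route
    ("10.1.0.0/16", "10.1.1.2"),
    ("10.1.4.0/24", "10.1.1.3"),
]

def best_route(destination, routes=ROUTES):
    """Return the (network, next_hop) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # The most specific route is the one with the longest prefix.
    return max(matches, key=lambda match: match[0].prefixlen)

print(best_route("10.1.4.10"))
print(best_route("192.0.2.1"))
```

An address inside 10.1.4.0/24 matches all three entries, and the /24 wins because it is the most specific. An address that matches nothing else falls through to the default route.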

Troubleshooting 101

When working in a networked environment, it’s common to have to troubleshoot problems often and quickly. Not only do problems come up often, but they are normally complex, require a lot of abstract thought, involve multiple parties and can be downright confusing. The following flow chart is a quick way to visualize all the steps involved in troubleshooting just about anything:

First you want to establish a baseline of normal operation. If you do not know how your router operates normally, how will you know if it has a problem?
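One possible way to put a baseline to work is to record round-trip times while the network is healthy and compare later measurements against them. The sketch below is only an illustration: the sample values, the function name and the three-standard-deviation tolerance are all assumptions, not a recommended standard.

```python
import statistics

def outside_baseline(baseline_ms, current_ms, tolerance=3.0):
    """Return True if current_ms is more than `tolerance` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline_ms)
    spread = statistics.stdev(baseline_ms)   # needs at least two samples
    if spread == 0:
        return current_ms != mean
    return abs(current_ms - mean) > tolerance * spread

# Hypothetical round-trip times (ms) recorded while the network was healthy.
baseline = [20.0, 22.0, 21.0, 19.0, 23.0]

print(outside_baseline(baseline, 21.5))    # a reading close to normal
print(outside_baseline(baseline, 180.0))   # a reading far from normal
```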

Next, you want to document what the symptoms are and attempt to define what the problem is. If you have a runny nose, a cough and aches, you may have the flu.
Make sure you gather all the facts based on the defined problem.

In this step you want to consider the possibilities for what the problem is and how to attack it. If it’s the flu, then we have to attack it with rest and medication.

Make the plan, and have a fallback plan in case the main plan fails.

Test all your plans and see if they work.

After the test, observe to see if the results show that the problem was resolved…

Here is where you have choices. If the problem has been solved, then great. If not, then you will want to go back to the step where you started to consider the possibilities. This will allow you to test until resolved.

Always document the solution to an issue.
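The loop described above (consider a possibility, act on it, observe, go back if the problem remains, and document as you go) can be sketched in a few lines of Python. The function and the example causes are hypothetical; the point is only to show the shape of the process.

```python
def troubleshoot(hypotheses, apply_fix, is_resolved):
    """Try each candidate cause in turn; return (cause, log) for the first fix that works."""
    log = []
    for cause in hypotheses:
        apply_fix(cause)                # carry out the plan for this possibility
        resolved = is_resolved()        # observe the result
        log.append((cause, resolved))   # document every attempt
        if resolved:
            return cause, log
    return None, log                    # nothing worked: gather more facts

# Hypothetical example: the second suspected cause turns out to be the real one.
state = {"fixed": False}

def apply_fix(cause):
    if cause == "stale ARP entry":
        state["fixed"] = True

cause, log = troubleshoot(
    ["bad cable", "stale ARP entry", "missing route"],
    apply_fix,
    lambda: state["fixed"],
)
print(cause)
print(log)
```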

Now that you have a general understanding of how to tackle an issue, let’s look at what could happen with a router. Let’s look at how to gather some facts.

Gather your Facts

When gathering facts, consider using some of these pointers. They could help you isolate and determine what the problem is:

  • Consider the OSI model when troubleshooting. In other words, first make sure it’s a routing problem, that is, an issue with a layer 3 protocol or process. A router operates at layer 3 of the OSI model. This can be confusing when it comes to ARP. ARP will cause you many problems on networks if you do not understand it. ARP operates at layer 2 and resolves IP addresses (layer 3) to MAC addresses (layer 2), but sometimes clearing the ARP cache can help you solve routing problems because the cache may be holding an incorrect MAC address for the IP address of an interface or port on the router. Clearing the ARP cache on the router (just as you would on XP or 2003) can help you, so consider it as an option. Now that you know what layer you are operating on (3) and you know that you may still need to clear the ARP cache to get the data moving again, let’s look at other ways to gather some facts.
  • Router-specific tools and protocols can help you gather information. CDP (the Cisco Discovery Protocol) is used to gather information about neighboring devices on the network it’s running on. You can also use commands on the router to check interface statistics, routing table information and so on.
  • You can use client and server based tools, whether they be from Microsoft, Novell, Linux or UNIX. Tools such as ifconfig, ipconfig, winipcfg and so on can be used to get IP information. Servers also maintain route tables.
  • Check other things that may give the impression that a routing problem is taking place when it may be something else, such as a wide-scale DNS issue, or another device causing problems like a switch, firewall or Access Point. For instance, if your helpdesk phones light up because a firewall went down, the problem may appear to be a routing problem when it really isn’t; it’s a device performing a different function.
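As an illustration of the ARP point in the list above, the following Python sketch compares cached IP-to-MAC mappings with freshly observed ones and reports entries that no longer agree. The addresses, the function name and the idea of having an independent "observed" mapping are assumptions made for the example; how you collect that data depends on your devices.

```python
def stale_arp_entries(cached, observed):
    """Return {ip: (cached_mac, observed_mac)} for IPs whose cached MAC differs from the observed MAC."""
    stale = {}
    for ip, cached_mac in cached.items():
        observed_mac = observed.get(ip)
        # Compare case-insensitively; skip IPs that have no fresh observation.
        if observed_mac is not None and observed_mac.lower() != cached_mac.lower():
            stale[ip] = (cached_mac, observed_mac)
    return stale

# Hypothetical data: the router interface at 10.1.4.1 was replaced, so its MAC changed.
cached = {"10.1.4.1": "00-11-22-33-44-55", "10.1.4.10": "00-AA-BB-CC-DD-EE"}
observed = {"10.1.4.1": "00-11-22-33-44-99", "10.1.4.10": "00-aa-bb-cc-dd-ee"}

print(stale_arp_entries(cached, observed))
```

Any entry this reports is a candidate for clearing from the cache so that it can be relearned.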

There are many other actions you can take, but this should get you started. You need to gather facts: what are the facts, and what do you think the problem may be? The more investigative work you do up front, the less time you will spend later, because if you don’t do this part correctly you will have to do it again.

Start to Troubleshoot

Now that you think you know what the problem is, you have to try to solve it. Let’s take a look at a sample topology.

If you have a network segment off of Router A and a network segment off of Router D, then you would want to see if you could reach from one to the other. The network may be slow, and that may be because Router C had a problem and the network had to reconverge and use slower links…

The 56K link could be the cause of the slowdown. Now, I know this example is very basic and most networks are not commonly set up with routers all laid out in a row (unless working within a distributed star topology), but this diagram should prove a point – that it’s very important to assess the routing in your network because there are many paths a packet can take… sometimes the unintended or wrong one.
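To show why the path matters, here is a small Python sketch that finds the slowest link along a path. The topology and speeds are assumptions made up for illustration (T1 links between routers A, B, C and D, plus a 56K backup link from B to D); they are not taken from the article's diagram.

```python
# Hypothetical link speeds in kbit/s.
LINKS = {
    ("A", "B"): 1544,   # T1
    ("B", "C"): 1544,   # T1
    ("C", "D"): 1544,   # T1
    ("B", "D"): 56,     # 56K backup link
}

def link_speed(a, b, links=LINKS):
    """Look up a link's speed regardless of the order the endpoints are given in."""
    if (a, b) in links:
        return links[(a, b)]
    return links[(b, a)]   # raises KeyError if the link does not exist

def bottleneck(path, links=LINKS):
    """Return (slowest_link, speed) along a path given as a list of router names."""
    hops = list(zip(path, path[1:]))
    speeds = [(hop, link_speed(hop[0], hop[1], links)) for hop in hops]
    return min(speeds, key=lambda item: item[1])

print(bottleneck(["A", "B", "C", "D"]))   # normal path through Router C
print(bottleneck(["A", "B", "D"]))        # path after reconverging around Router C
```

In this made-up topology the reconverged path still works, but its bottleneck drops from a T1 to the 56K link, which is the kind of slowdown described above.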

You should also learn to ping in both directions. Use remote access to manage a workstation in Router D’s network (10.1.4.0), such as a terminal server, and test from there back toward Router A.

Use Tracert and any other tool in your Windows, NetWare or Linux/UNIX arsenal to test with, in both directions! Make sure you see the path from Router A and from Router D.

To ping, open up the Command Prompt and ping from your local PC to someplace past the final router, which should be Router D. Ping a host such as 10.1.4.10. If you get replies, then you have connectivity, but that does not mean the path is correct.
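If you want to script this check, a minimal Python sketch might look like the following. It assumes the standard ping utility is on the path and uses its count option (-n on Windows, -c on Linux/UNIX). Treating exit status 0 as "reachable" is a simplification, since some ping implementations report success in cases you would not count as a good reply.

```python
import platform
import subprocess

def ping_command(host, count=4, system=None):
    """Build the ping command line for the given (or local) operating system."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]

def is_reachable(host, count=4):
    """Return True if ping exits with status 0 (a simplification, see the note above)."""
    result = subprocess.run(ping_command(host, count), capture_output=True, text=True)
    return result.returncode == 0

print(ping_command("10.1.4.10", system="Windows"))
print(ping_command("10.1.4.10", system="Linux"))
```

Calling is_reachable("10.1.4.10") from a PC on Router A's segment tests one direction only. To test the other direction, run the same check from a host on Router D's segment against an address on Router A's segment.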

Routing Table Problems

In Windows, the ROUTE PRINT command will show you the computer’s routing table. Whether routing is enabled or not, you will still see a routing table. You can see a routing table on Windows XP with one NIC; you don’t need RRAS running on Windows Server 2003 to view the route table, although if you do run it, you will have far more flexibility over what you can do.

A look at an XP desktop shows a simple route table with entries for the APIPA range, the 10.8.x.x segment the desktop is attached to, and the default route, which points to the default gateway address.

A Cisco router’s routing table looks similar, but has far more detail and complexity. There is also much more you can do with it, including a large number of ‘debugging’ commands that allow you to obtain very detailed and specific information on the internal processes of the router.

You may have problems with your routing table. Commonly, if you clear the routing table you will force the routers to relearn their routes and quite possibly also clear the problems (on Windows, you would use ROUTE ADD to add a route and ROUTE DELETE to remove one, along with switches for persistence and so on). This is why people often reboot routers to clear a problem, which, by the way, is a very bad thing to do. For one reason, if not for many others: you clear the logs, which are held in memory, and lose them forever. Logs on routers are vital and can help determine many problems, so powering off like this should be saved for extreme cases only.
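As a small illustration of the Windows commands mentioned above, this Python sketch builds ROUTE ADD and ROUTE DELETE command lines, using the -p switch for a persistent route. The destination, mask and gateway values are hypothetical, and the sketch only prints the commands rather than running them.

```python
def route_add(destination, mask, gateway, persistent=False):
    """Build a Windows `route ADD` command; -p keeps the route across reboots."""
    command = ["route"]
    if persistent:
        command.append("-p")
    command += ["ADD", destination, "MASK", mask, gateway]
    return command

def route_delete(destination):
    """Build a Windows `route DELETE` command for one destination."""
    return ["route", "DELETE", destination]

# Hypothetical route to the 10.1.4.0 segment through a gateway at 10.1.1.3.
print(" ".join(route_add("10.1.4.0", "255.255.255.0", "10.1.1.3", persistent=True)))
print(" ".join(route_delete("10.1.4.0")))
```

Printing the command first and reviewing it before running it is a cautious habit when changing a route table on a production machine.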

Routing table problems include (but are not limited to):

  • Inactive routes
  • Unneeded routes
  • Black hole routes
  • Flapping links (such as Frame Relay links going up and down) which cause the routes to flap
  • Invalid route tables
  • Invalid ARP cache entries causing incorrect IP-to-MAC address mappings
  • Problems with administrative distance or any other settings
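Flapping, one of the problems listed above, is easy to picture as a count of state changes. The Python sketch below counts how often a link changed between up and down in a monitoring window. The sample data and the threshold of three changes are assumptions chosen for illustration, not a vendor default.

```python
def count_flaps(events):
    """Count state changes in a chronological list of 'up'/'down' link states."""
    return sum(1 for previous, current in zip(events, events[1:]) if previous != current)

def is_flapping(events, max_changes=3):
    """Treat a link as flapping when it changes state more than max_changes times."""
    return count_flaps(events) > max_changes

# Hypothetical link states sampled over one monitoring window.
stable = ["up", "up", "up", "down", "up", "up"]
unstable = ["up", "down", "up", "down", "up", "down"]

print(count_flaps(stable), is_flapping(stable))
print(count_flaps(unstable), is_flapping(unstable))
```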

Summary

When working with routers that connect your remote segments, make sure you understand how to troubleshoot between your links, because your routers may be causing your problems. To work on a network you have to understand the Wide Area Network (WAN) that connects Local Area Networks (LANs) together. The Wide Area Network is normally connected via high-end routers that forward data based on how they are configured. The data is sourced from one location, sent to a default gateway and then sent from that router (based on its tables) to another router, which will then forward it to where it believes the destination to be. In sum, make sure you use a good troubleshooting methodology, baseline your systems so you have a starting range to work with, use all the tools at your disposal (such as ping and traceroute (tracert)) and use those tools in both directions to accurately determine where your problems lie.

Links and Reference Material

Cisco Introduction to Routing
http://docwiki.cisco.com/wiki/Internetworking_Technology_Handbook

Microsoft Common Routing Problems
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/ServerHelp/9f68c37b-02b6-4e1b-b898-c25389dba4f4.mspx