{"id":815,"date":"2012-07-25T22:58:22","date_gmt":"2012-07-25T22:58:22","guid":{"rendered":"http:\/\/microsoftgeek.com\/?p=815"},"modified":"2012-07-25T22:58:22","modified_gmt":"2012-07-25T22:58:22","slug":"the-importance-of-network-redundancy","status":"publish","type":"post","link":"https:\/\/microsoftgeek.com\/?p=815","title":{"rendered":"The Importance of Network Redundancy"},"content":{"rendered":"<p>An in depth view of current technology and strategies used to create  redundancy in your WAN and how to properly design, implement, monitor  and test in case of any disaster that may occur as well as covering  briefly other redundancy options.<\/p>\n<h2>Introduction<\/h2>\n<p>Now more than ever, today\u2019s businesses require  reliable network connectivity and access to corporate resources.  Connections to and from business units, vendors and SOHOs are all  equally important to keep the continuity when needed. Business runs all  day, every day and even in off hours. Most companies run operations  around the clock, seven days a week so it\u2019s important to realize that to  keep a solid business continuity strategy, redundancy technologies  should be considered and\/or implemented.<\/p>\n<p>So, we need to keep things up and available all the time. This is  sometimes referred to five nines (99.999) uptime. The small percentage  of downtime is accounted for unforeseen incidents, or \u2018scheduled  maintenance\u2019 and usually set to take place during times of least impact,  like in the middle of the night, or on holiday weekends if planned. If  this is not a part of your systems and network architecture it should be  considered if you want to keep a high level of availability. Because  things break and unforeseen events do take place, we need to evaluate  the need for creating an architecture that is \u2018highly available\u2019, or up  as much as possible, with failures foreseen ahead of time and the only  downtime, is to do planned maintenance.<\/p>\n<p>To keep the company\u2019s workforces, and their customers connected and  operating, we need to plan for it. With servers you can cluster and with  web properties and applications, you can load balance. Almost every  vendor today puts out a line of products to facilitate this need via  hardware, software and now via virtualization design.<\/p>\n<p>In this article we will take an in depth view of current technology  and strategies used to create redundancy in your WAN and how to properly  design, implement, monitor and test in case of any disaster that may  occur. We will also briefly cover other redundancy options for servers  and other network architecture, power and applications.<\/p>\n<h2>The Importance of Redundancy<\/h2>\n<p>Today\u2019s networks are high-tech  and most times high speed.\u00a0Common to most Wide Area Network (WAN)  designs is the need for a backup to take over in case of any type of  failure to your main link. A simple scenario would be if you had a  single T1 connection from your core site to each remote office or branch  office you connect with. What if that link went down? How would you  continue your operations if it did? In this section we will explore this  scenario and other scenarios to help you design and plan for a backup  solution that you can count on and one that is cost effective and will  not break the bank.<\/p>\n<p>Network redundancy is a simple concept to understand. If you have a  single point of failure and it fails you, then you have nothing to rely  on. If you put in a secondary (or tertiary) method of access, then when  the main connection goes down, you will have a way to connect to  resources and keep the business operational.<\/p>\n<p>The first step in creating network redundancy (particularly in the  WAN), is to set up a project plan that will allow you to scrutinize the  current architecture\/infrastructure, plan for a way to make it  redundant, plan for a way to deploy it and then set up a way to test it.  Nothing should be thought of as \u2018complete\u2019 until you have tested  everything for operational success. Your final step will be putting in  policy and processes that allow you to monitor it and be alerted when  things do fail so you can take action. Commonly a company\u2019s security  policy, disaster recovery plan, business continuity plan and\/or incident  response plan will leave room for this type of solution.<\/p>\n<p>Testing however is the key to your success. This is not a \u2018set it and  forget it\u2019 design. The main link failing means that the backup should  take over automatically if that is how you designed it. There can be one  of multiple issues that may not be self-repairable, or resolvable  without interaction so if not set automatically, you will need an  incident response plan to account for it. You should also have a  follow-up procedure regardless of automatic or manual. This means, when  implementing redundancy into your systems or network, you need to take  action immediately regardless, even if your operations continue to take  place to verify that everything did go as planned. If not, then you need  an after-action report where you can specify how things will be fixed,  or redesigned \u2013 then retested.<\/p>\n<p>Analysis is critical to building a good redundancy plan. Almost every  network created is unique in some way. This is why you must analyze and  take note of not only the common items that would require redundancy,  but also other solutions in place that you may not have considered such  as mainframe access as an example.<\/p>\n<p>First, a risk analysis assessment must take place. Next, the core  site (or core sites) must be taken into consideration if that is where  the bulk of your resources are located, or where the majority of your  business connections terminate. Routing and routing protocols need to be  considered. Solutions exist (such as when using Cisco Systems devices  and software) where specific protocols can be used to handle the  failover process for you if implemented correctly. Load balancers,  failover solutions and protocols are available to facilitate just about  any redundancy option you can imagine.<\/p>\n<p><strong>Note:<br \/>\n<\/strong>You should always consider the hardware. For servers  and network devices, redundant hot-swappable power supplies and drives  (as well as other components) are used to keep everything up and running  when a disaster occurs. Also, disk drives can be deployed in a way  where the data is spread across multiple drives such as when using RAID,  so that if a drive does fail, the data is not lost. Data should also be  saved (backed up) in multiple locations to provide restore redundancy  options. Having off-site tape storage is one such example, as well as to  have data replicated across multiple hosts using technologies such as  Windows DFS.<\/p>\n<p><strong>Tip:<br \/>\n<\/strong>The Local Area Network (LAN) must be examined for  single points of failure as well. If your LAN only uses one switch and  both routers are connected to it, if the switch fails, so does the LAN,  as well as access to the WAN.<\/p>\n<p>Once the design phase is complete, a cost analysis session must be  completed as well. Competing business rivals looking for your business  will hope that you budget, plan and design this solution incorrectly.  Creating redundancy is 90% of the design only leaves you with a possible  10% failure scenario. This means that you spent a lot of time, money  and resources putting a solution in place that still has a single point  of failure and if this single point creates a significant amount of  downtime, then you spent all that time and energy leaving yourself  vulnerable to failure regardless. A good management team will ensure  that nothing is left open so that the investment is sound and does what  it says it will \u2013 keep your operations running.<\/p>\n<p>Considering your current network design and architecture is critical  (as mentioned earlier). A simplified view of a company network may be a  core site location (perhaps the company headquarters) where all of the  servers, systems, applications and main infrastructure reside. In this  article we will call this \u2018core infrastructure\u2019.<\/p>\n<p>Figure 1 shows the layout of an extremely common design, where a  branch office needs to connect to a core site where centralized  resources are located such as financial applications, enterprise  resource planning (ERP) software, databases, file server data and so  on.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.windowsnetworking.com\/img\/upl\/image0011276266242001.jpg\" border=\"0\" alt=\"\" hspace=\"0\" align=\"bottom\" \/><br \/>\n<strong>Figure 1:<\/strong> A common WAN connection scenario with redundancy in place<\/p>\n<p>Here you can see how the remote branch office connects to the core  site. Here, there is a dedicated MPLS circuit\/link that provides  bandwidth at approximately 1.5 Mbps which is the connection speed of a  T1. The MPLS routers are connected to the network via a network switch.  Commonly, the router is the networks default gateway, where all packets  are sent that are not found locally. The router needs to make a routing  decision and since the main link is up, decides to send via the MPLS  connection. When the main link drops, commonly the alternate link is  used if set up correctly. This provides your remote site with a new path  to reach the resources needed to continue operations.<\/p>\n<p>Back about 10-15 years ago, the technology that connected most  corporate sites were Frame Relay and Integrated Services Digital Network  (ISDN). Although ISDN is still used in some fashion, Frame Relay has  gone by the wayside being replaced by Layer 3 enabled TCP\/IP-based high  speed networks such as MPLS which we will cover momentarily. The company  core and remote sites are normally connected via some form of WAN-based  technology, such as OC3\u2019s, T3\u2019s, T1\u2019s, DSL, MPLS and many others. The  most common forms today are Multiprotocol Label Switching, or MPLS  (private) and some form of Internet connected link (public, or shared)  connected via an encrypted tunnel called a Virtual Private Network  (VPN).<\/p>\n<p>VPN\u2019s are similar to those you may connect with via your laptop when  connecting to work from home. The same technology is used. Since you are  connecting over the unprotected public Internet, encryption must be  used to secure the transmission of data and communication. Routers and  firewalls if ordered with specific VPN functionality installed and  enabled can connect your remote sites to a core site in the same manner,  encrypting the data you send over the public Internet via what is  called a \u2018tunnel\u2019. This is often used because it\u2019s cheaper than a  managed, dedicated and private MPLS connection and normally lays dormant  until used. It\u2019s also secure. Since a SLA is purchased on the main  link, the provider needs to get this back up and running quickly so you  won\u2019t need to be dependent on the alternate link for long.<\/p>\n<p>MPLS by nature is a redundant network. It is commonly managed by an  ISP that provides you a connection to it, or a router that they will  manage as well. When data is sent from one site to another, it\u2019s passed  into a \u2018cloud\u2019 which is a private network that the ISP manages. Packets  enter the cloud and into the ISP\u2019s meshed network where redundancy is in  place. Although highly available internally, if the router that  connects your remote site fails, or the link fails\u2026 your site will be  down regardless unless you have a redundant router and link there to  pick up in the failed routers place.<\/p>\n<p>ISP\u2019s will sometimes offer Internet access via their MPLS network as  well. It is not recommended that you use this particular Internet access  to connect your VPN tunnels for backup, because it could be potentially  connecting to the same network that failed you in the first place.<\/p>\n<p>So, all that remains to answer is one last question \u2013 what if both  links drop? You should make sure that you have a way into your remote  sites via Out of Band (OOB) management, such as a modem connection to a  serial port of a connecting router. This gives the chance to test  internally to the site (from the local area network [LAN] in times of  complete cut off. The modem can be connected to a serial port on a  router, or on a server on the LAN. As long as a dial-up line is  configured and you can dial into it and connect, you can access your\u00a0 Be  sure to secure everything correctly, do not allow someone to war-dial  into your network via the modem, which is fairly easy to do if not  locked down.<\/p>\n<p><strong>Tip<\/strong>:<br \/>\nYou should always create an alternate link in to your  remote sites when both links are down. This can be done with an  extremely cost-effective (cheap) dial up solution via a POTS line. You  can connect a modem to the router, a server or laptop and have LAN  access to troubleshoot or provide access out if needed.<\/p>\n<p>There are companies that are more complex, for example if a company\u2019s  strategy is to merge and acquire other companies, there may be core  infrastructure located in multiple core networks with mission-critical  resources. These companies should consider a strategy to simplify the  layout and consolidate and relocate to a centralized (and protected)  location, but this may not always be the case. In these situations,  achieving a redundant network backup plan can be difficult. Sometimes  it\u2019s set up like that by design, for example, if you had a core\/central  headquarters site where some of your resources are located, and others  placed in a co-located data center. Using resources outside your network  and managed by others (or outsourced) is considered \u2018cloud computing\u2019  and SaaS, or \u2018software as a service\u2019. Also, you need to consider the  differences of keeping services in house or using outsourcing solutions.  If your architecture is outsourced partially or completely \u2013 you must  thoroughly examine the provider\u2019s policy, Service Level Agreement (SLA)  and polices for conducting work.<\/p>\n<p>You should also always consider the LAN connections of your routers  and the LAN\u2019s default gateway assignment. For better design, create a  bullet-proof (redundant) LAN by installing two switches and making them  the default gateway. This way, if the router fails completely (such as  looses power), the default gateway address is still intact, and the  switch can make the decision, not the router. See figure 2 for a  redundant LAN connection where routers are connected and the switches  make the routing decisions in the case where the router with the default  gateway assignment is at risk.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.windowsnetworking.com\/img\/upl\/image0021276266242017.jpg\" border=\"0\" alt=\"\" hspace=\"0\" align=\"bottom\" \/><br \/>\n<strong>Figure 2:<\/strong> Creating redundancy in your LAN using Multiple Switches<\/p>\n<p><strong>Tip:<br \/>\n<\/strong>It is recommended that you purchase hardware with  multiple hot-swappable redundant power supplies, and make sure that the  power source is on a generator, has a backup, and\/or is phase correctly  so you can survive a power lose entirely. Enterprise UPS systems, backup  generators and so on can be used to provide alternate and redundant  power solutions when needed.<\/p>\n<p><strong>Note<\/strong>:<br \/>\nOrdering lines from an ISP or vendor can take some  time so make sure you size them correctly and then order them ahead of  time which can save you time with the deployment of your redundancy  solution.<\/p>\n<p>Geographical considerations should also be considered. Consider the  Point of Presence (POP). If you use one provider and they have internal  problems on their network, or a disaster occurs (such as a hurricane)  that affects the area, then you will be at the mercy of the provider no  matter how many lines you have in place. Disaster recovery solutions  account for this, especially if you are using a co-located data center,  such as a cold\/warm or hot site.<\/p>\n<p>In sum, a simple rule of thumb when designing and planning for  redundancy is, do the job right\u2026 do not cut corners to save money  because if you start to spend capital on redundancy and still leave a  single point of failure in place (like a single switch), a small cost  could jeopardize the entire solution which did in fact cost you a lot to  implement. Lines, routers, human resources to implement it and daily  monitoring cost money. It would be silly to leave a simple single point  of failure in place when you spent time and money on the rest of the  project. Consider doing the project correctly by designing it, budgeting  for it and getting the right people in place (or trained) to deploy,  manage and monitor it correctly\u2026 and lastly, test it for accuracy.  Consider everything and leave no stone unturned.<\/p>\n<p>After you have designed and planned your solution, you need to  consider a few things during deployment. If in-house solutions are used,  your employees need to know how to handle an incident. If out-sourced,  then your vendor\/provider needs to have a plan in place to add, manage,  monitor and then recover from a disaster. Test plans should also be  considered in both scenarios.<\/p>\n<p>You should also consider a more complex design when considering how  to add and deploy your solution. Figure 3 shows a more common solution  for a large company.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.windowsnetworking.com\/img\/upl\/image0031276266242017.jpg\" border=\"0\" alt=\"\" hspace=\"0\" align=\"bottom\" \/><br \/>\n<strong>Figure 3:<\/strong> Connecting Multiple Remote Sites to a Core with a Redundant Solution<\/p>\n<p>Here you have multiple remote sites connecting to a core (or multiple  core) locations. Redundant links provide an alternate solution to main  link failure on your WAN and pro vide remote site access to core  resources.<\/p>\n<p>You should also consider failover technologies. Must like server  clustering, network equipment can also failover to other devices (such  as routers, switches, firewalls, etc.) if configured to do so. For  example, a Cisco router can be configured with Hot Standby Router  Protocol (HSRP). Servers can be clustered and load balanced for any  failover scenario you can think of. With Windows Hyper-V, VMware and  Citrix virtualization solutions, you can create a design where any  failure can be dealt with automatically keeping operations running in  the case of any disaster.<\/p>\n<p><strong>Tip:<br \/>\n<\/strong>It\u2019s important for VPN clients to be able to access  corporate resources as well, whether on the road or in a home-based  office. That being said, you should consider reviewing your VPN  concentrator redundancy as well. If you need two core based units for  this strategy, make sure that they also know how to fail over from one  to the other in time of need.<\/p>\n<p>Once you have considered your design after full analysis and  implemented it, you need to test it thoroughly and then document the  procedures. After that, you need to continue to test and update the  documentation especially as new technologies are added to your  architecture, or as your architecture grows such as adding new sites.<\/p>\n<p>Now that we have talked about the importance of redundancy, especially in the network\u2026 you need to test it.<\/p>\n<p><strong>Test Plan<\/strong> \u2013 put together a plan that covers the  details of a failover scenario so you can test for it. To do this,  detail all of the architecture to be tested and failover manually to see  how the solution works. Make sure you test applications, routing paths,  time\/speed\/bandwidth usage and accuracy. Test for fail-back as well and  how the main link when it becomes available can resume the role of the  primary link.<\/p>\n<p><strong>Network Monitoring<\/strong> \u2013 your network monitoring solution should  help you become aware of a main link failure. Using technologies such as  ICMP and SNMP, you can continuously monitor your uptime and be alerted  when there is a change in any device, link or solution.<\/p>\n<p><strong>Disaster Recovery Plan<\/strong> \u2013 a plan should in place, and if not, added immediately. This plan should outline the<\/p>\n<p>Business Continuity Planning and Incident Response Planning are part  of this DR plan. How the business will continue to operate, as well as  who will react to the problem and follow it through until normal  operations are restored.<\/p>\n<h2>Summary<\/h2>\n<p>In this article, we looked at adding redundancy into your network. To  keep the company\u2019s workforces, and their customers connected and  operating, we need to plan for it. With servers you can cluster and with  web properties and applications, you can load balance. Network gear can  be made highly available. Almost every vendor today puts out a line of  products to facilitate this need via hardware, software and now via  virtualization design.\u00a0 In this article we took take an in depth view of  current technology and strategies used to create redundancy in your WAN  and how to properly design, implement, monitor and test in case of any  disaster that may occur.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An in depth view of current technology and strategies used to create redundancy in your WAN and how to properly design, implement, monitor and test in case of any disaster that may occur as well as covering briefly other redundancy options. Introduction Now more than ever, today\u2019s businesses require reliable network connectivity and access to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,1],"tags":[],"class_list":["post-815","post","type-post","status-publish","format-standard","hentry","category-computer-tech-stuff","category-networking-stuff"],"_links":{"self":[{"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/posts\/815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=815"}],"version-history":[{"count":2,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/posts\/815\/revisions"}],"predecessor-version":[{"id":817,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=\/wp\/v2\/posts\/815\/revisions\/817"}],"wp:attachment":[{"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/microsoftgeek.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}