From: Marko Milivojevic (markom@markom.info)
Date: Mon Oct 27 2008 - 13:18:07 ARST
On Mon, Oct 27, 2008 at 14:55, Han Solo <emaillists@me.com> wrote:
> Interesting ... I think a combination of the two is in order. For any
> type of HA you should have both system redundancy and a network built
> with dual uplinks, etc. My big question was really how often folks
> running, say, 12.2(18)SXF8 and above with SSO have issues where their
> core 6500s, set up with tested, properly configured HA, suffer partial
> failures in which SSO did not fully fail over and is hung somewhere in
> the middle, causing all forms of HA to be lost. I had a situation with
> two core 6500s running 12.2(18)SXF13: dual SUP720-3BXLs, SSO, all
> fabric-enabled 67xx line cards, redundant-mode 6000 W power supplies,
> the works. We also do quarterly manual failover tests to take the
> "change control" factor out of the quarter, but one day we had an
> issue where the active SUP on CORE1 failed in such a manner that the
> standby SUP did not fully take over, and the two were literally
> flapping back and forth. I was curious if others have run into this,
> as we are now in the process of having to validate our design, which
> provides both system-level and network-level redundancy. My argument
> is that even though I have a fully redundant network (physical links,
> routing protocols, features, etc.), if the system fails to fail over
> properly then all of that is a moot point...
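For anyone following along, the supervisor redundancy described above
boils down to only a couple of lines on each 6500. A rough sketch (not
the poster's actual config; IGP, NSF and the rest omitted):

  redundancy
   mode sso
  !
  ! verify with:
  ! show redundancy states   (standby sup should be STANDBY HOT)
  ! show module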
That's the whole point. If you have redundancy, you need to make sure
that it actually works. If it doesn't, you have wasted both time and
money.
Redundant nodes work in 99% of cases, but the one case where something
only partially fails is always a problem (a flapping link being the
perfect example). There is no cheap and easy way to address that
problem other than to be aware of it. In my experience, it is always
better to have a device fail outright than to have it half-failing.
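The quarterly manual failover tests mentioned above are exactly the
right approach. As a minimal sketch (commands as I remember them on
SXF, double-check on your own boxes), such a test is little more than:

  show redundancy states        <- confirm the standby sup is STANDBY HOT
  redundancy force-switchover   <- force the active sup to hand over
  show redundancy states        <- confirm roles swapped and SSO re-formed

What that kind of test will not catch, unfortunately, is exactly the
partial-failure case described above, where the active sup is sick but
not quite dead.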
I would like to raise another issue, and that is complexity. While it's
quite sexy to say that you have a fully redundant network, with dual
sups, power supplies and so on in every node, it adds incredible
complexity to the network. This also needs careful consideration and
management; blindly adding "redundancy" can actually have a very
adverse effect on the availability of the network.
With all that in mind, if you look at the design guidelines for the
core and distribution layers, you will see that redundant sups are not
advertised heavily, even by Cisco. They are proponents of providing
network-level redundancy at those layers. With all the modern features
available to the network designer, such as MPLS-TE-based Fast Reroute
with link and node protection (MPLS features are just fine in an
enterprise core, not only in SP networks), fully redundant nodes are
unnecessary cost and complexity, better done without.
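To illustrate how little it takes: link protection with MPLS-TE Fast
Reroute is roughly the following on the point of local repair. This is
only a sketch; interface names, tunnel numbers and addresses are made
up, and the IGP TE extensions and explicit-path definitions are
omitted:

  mpls traffic-eng tunnels
  !
  interface Tunnel100
   description backup tunnel routed around Gi1/1
   ip unnumbered Loopback0
   tunnel mode mpls traffic-eng
   tunnel destination 10.0.0.2
   tunnel mpls traffic-eng path-option 1 explicit name AVOID-GI1-1
  !
  interface GigabitEthernet1/1
   mpls traffic-eng tunnels
   ip rsvp bandwidth
   mpls traffic-eng backup-path Tunnel100
  !
  ! and on the head-end, the primary tunnel simply requests protection:
  ! tunnel mpls traffic-eng fast-reroute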
The access layer is a different thing. Unless you can provide dual
connections to every workstation and then, in a scalable way, solve all
the misery with operating system drivers, broken NICs and all sorts of
PEBKAC issues, network redundancy is not an option there. This is the
layer where you want system redundancy, yet this is where it is least
available. The first real supervisor redundancy shows up in the 4510R
(or was it the 4507R?); smaller switches support none of it (3750
stacking adds capacity, but redundancy for the ports on a failed
switch is still absent).
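As a side note, you can at least see what a 3750 stack does give you; a
quick sketch of the relevant checks (the member number and priority are
made up for illustration):

  show switch                 <- lists stack members, roles and state
  switch 1 priority 15        <- (config) make member 1 the preferred master

The stack elects a new master if the current one dies, but the ports on
the dead member are simply gone until you replace the hardware.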
Just my 2c.
-- Marko
CCIE #18427 (SP)
My network blog: http://cisco.markom.info/
Blogs and organic groups at http://www.ccie.net