Ciscoman's notes (Notes of a Cisco guy with a diploma)

I'm a Cisco Champion Community member for 2017!

"Cisco Champions are passionate about Cisco and happy to share our knowledge, experience, and feedback."

Tuesday, 29 March 2011

ECMP voodoo magic


Hi! Today I would like to talk about ECMP in Cisco IOS. As you probably know, there are two load-balancing methods: per-packet and per-destination. http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094820.shtml
The IOS default is per-destination load balancing with CEF. You can also configure per-packet load balancing with CEF, or disable CEF and fast switching altogether, in which case packets are process-switched and balanced per packet.
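To make the difference concrete, here is a minimal Python sketch of the two methods, assuming two equal-cost next hops (the addresses are borrowed from Router1's CEF table in this post). The SHA-256 hash is a stand-in chosen for illustration only; IOS CEF uses its own internal hash. Per-packet mode is modeled as plain round-robin.

```python
import hashlib
from itertools import cycle

# Assumed equal-cost next hops (taken from Router1's CEF entry for 196.254.1.1/32).
NEXT_HOPS = ["172.29.94.83", "172.29.94.84"]


def per_destination(src: str, dst: str, next_hops: list[str]) -> str:
    """Pick a next hop from a hash of the source/destination pair.

    SHA-256 is only a stand-in for the real CEF hash: the point is that the
    same address pair always maps to the same path.
    """
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    return next_hops[digest[0] % len(next_hops)]


def per_packet(next_hops: list[str], count: int) -> list[str]:
    """Send each of `count` packets to the next path in turn (round-robin)."""
    paths = cycle(next_hops)
    return [next(paths) for _ in range(count)]


if __name__ == "__main__":
    src, dst = "172.22.255.100", "196.254.1.1"
    # Three packets between the same pair of addresses: one path, repeated.
    print([per_destination(src, dst, NEXT_HOPS) for _ in range(3)])
    # Three packets with per-packet balancing: the paths alternate.
    print(per_packet(NEXT_HOPS, 3))
```

Per-destination balancing keeps all packets between one pair of addresses on one path, so they stay in order; per-packet balancing spreads the load more evenly but can reorder packets.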
Let’s look at the following simple topology:

Figure 1. ECMP topology


There is nothing unusual in this topology except for two facts: ECMP is enabled on R1, and R5 prefers R3 for its default route. The routing protocol makes no difference here; you can use static routing for simplicity.
Let's look at Router2's CEF table for prefix 196.254.1.1/32:

Router2#sh ip cef 196.254.1.1/32 detail
196.254.1.1/32, epoch 0
  NetFlow: Origin AS 0, Peer AS 0, Mask Bits 32
  nexthop 9.8.7.217 GigabitEthernet0/1.996

Now let's run a traceroute from Router2 (1):

Router2#traceroute 196.254.1.1 numeric source l0

Type escape sequence to abort.
Tracing the route to 196.254.1.1

  1 9.8.7.217 0 msec 4 msec 0 msec
  2 172.29.94.83 4 msec 0 msec 4 msec
  3 196.254.1.1 4 msec 0 msec 0 msec

As expected, the load sharing is per-destination: all three probes went through the 172.29.94.83 gateway, even though equal-cost multipath load sharing is active on Router1. Let's look at Router1's CEF table for 196.254.1.1/32 to confirm this:

Router1#sh ip cef 196.254.1.1/32 detail
196.254.1.1/32, epoch 0, per-destination sharing
  NetFlow: Origin AS 0, Peer AS 0, Mask Bits 32
  nexthop 172.29.94.83 GigabitEthernet0/1.324
  nexthop 172.29.94.84 GigabitEthernet0/1.324

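The "per-destination sharing" line means Router1 chooses between the two next hops by hashing packet addresses. A common way to picture this is a small table of hash buckets filled with the next hops in turn. The Python sketch below assumes 16 buckets, a CRC32 hash and a numeric per-router seed; all three are illustrative choices, not the actual IOS internals.

```python
import zlib

NUM_BUCKETS = 16  # assumption: bucket count chosen for illustration


def build_loadinfo(next_hops: list[str], num_buckets: int = NUM_BUCKETS) -> list[str]:
    """Fill the buckets with the equal-cost next hops in turn (0 1 0 1 ...)."""
    return [next_hops[i % len(next_hops)] for i in range(num_buckets)]


def bucket_for(src: str, dst: str, seed: int, num_buckets: int) -> int:
    """Hypothetical hash: CRC32 of the address pair, mixed with a per-router seed."""
    return zlib.crc32(f"{seed}|{src}|{dst}".encode()) % num_buckets


def lookup(src: str, dst: str, loadinfo: list[str], seed: int = 0) -> str:
    """Return the next hop stored in the bucket this address pair hashes to."""
    return loadinfo[bucket_for(src, dst, seed, len(loadinfo))]


if __name__ == "__main__":
    loadinfo = build_loadinfo(["172.29.94.83", "172.29.94.84"])
    # The same source/destination pair always lands in the same bucket.
    print(lookup("172.22.255.100", "196.254.1.1", loadinfo))
    # Many different destinations spread over both next hops.
    hops = {lookup("172.22.255.100", f"196.254.1.{i}", loadinfo) for i in range(1, 255)}
    print(sorted(hops))
```

The seed matters because routers that all apply the same unseeded hash make the same choice for the same traffic, so a downstream router may only ever see one share of it (the CEF polarization effect mentioned in the comments); changing `seed` changes which address pairs map to which bucket.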
Now let's run the same traceroute from Router1 itself (2):

Router1#traceroute 196.254.1.1 numeric source l0
Type escape sequence to abort.
Tracing the route to 196.254.1.1

  1 172.29.94.84 0 msec
    172.29.94.83 4 msec
    172.29.94.84 4 msec
  2 196.254.1.1 0 msec 0 msec 4 msec
Router1#

Do you see the strange behavior? Nothing is really strange here, except that the probes are load-shared per packet, not per destination as we expected.
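One quick way to check results like (1) and (2) is to list which router answered each probe for every hop. The Python sketch below parses the IOS traceroute lines shown above and flags a hop as per-packet when probes from one source to one destination are answered by more than one router. It assumes the hash looks only at addresses, and its regex handles only plain "address N msec" lines (no timeouts, no DNS names).

```python
import re

# Matches "  1 172.29.94.84 0 msec" and continuation lines like "    172.29.94.83 4 msec".
LINE_RE = re.compile(r"^\s*(?:(\d+)\s+)?(\d{1,3}(?:\.\d{1,3}){3})((?:\s+\d+ msec)+)")

TRACE_1 = """\
  1 9.8.7.217 0 msec 4 msec 0 msec
  2 172.29.94.83 4 msec 0 msec 4 msec
  3 196.254.1.1 4 msec 0 msec 0 msec
"""

TRACE_2 = """\
  1 172.29.94.84 0 msec
    172.29.94.83 4 msec
    172.29.94.84 4 msec
  2 196.254.1.1 0 msec 0 msec 4 msec
"""


def responders_by_hop(output: str) -> dict[int, list[str]]:
    """Map each hop number to the responding address of every probe."""
    hops: dict[int, list[str]] = {}
    hop = None
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        if m.group(1):
            hop = int(m.group(1))
        if hop is None:
            continue  # continuation line before any hop number: ignore it
        probes = m.group(3).count("msec")  # one timing per answered probe
        hops.setdefault(hop, []).extend([m.group(2)] * probes)
    return hops


def looks_per_packet(responders: list[str]) -> bool:
    """Same source and destination, yet different routers answered."""
    return len(set(responders)) > 1


if __name__ == "__main__":
    for name, trace in (("(1)", TRACE_1), ("(2)", TRACE_2)):
        for hop, addrs in responders_by_hop(trace).items():
            mode = "per-packet" if looks_per_packet(addrs) else "single path"
            print(name, hop, addrs, mode)
```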
Let's check again with an extended ping from Router2, this time with the Record Route option (3):

Router2#ping
Protocol [ip]:
Target IP address: 196.254.1.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]: yes
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: Record
Number of hops [ 9 ]:
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 196.254.1.1, timeout is 2 seconds:
Packet sent with a source address of 172.22.255.100
Reply data will be validated
Packet has IP options:  Total option bytes= 39, padded length=40
 Record route: <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)

Reply to request 0 (4 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (9.8.7.218)
   (172.29.94.81)
   (172.27.175.1)
   (196.254.1.1)
   (196.254.1.1)
   (172.29.94.83)
   (9.8.7.217)
   (172.22.255.100) <*>
   (0.0.0.0)
 End of list

Reply to request 1 (4 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (9.8.7.218)
   (172.29.94.81)
   (172.27.175.2)
   (196.254.1.1)
   (196.254.1.1)
   (172.29.94.83)
   (9.8.7.217)
   (172.22.255.100) <*>
   (0.0.0.0)
 End of list

Reply to request 2 (4 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (9.8.7.218)
   (172.29.94.81)
   (172.27.175.1)
   (196.254.1.1)
   (196.254.1.1)
   (172.29.94.83)
   (9.8.7.217)
   (172.22.255.100) <*>
   (0.0.0.0)
 End of list

Reply to request 3 (4 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (9.8.7.218)
   (172.29.94.81)
   (172.27.175.2)
   (196.254.1.1)
   (196.254.1.1)
   (172.29.94.83)
   (9.8.7.217)
   (172.22.255.100) <*>
   (0.0.0.0)
 End of list

Reply to request 4 (4 ms).  Received packet has options
 Total option bytes= 40, padded length=40
 Record route:
   (9.8.7.218)
   (172.29.94.81)
   (172.27.175.1)
   (196.254.1.1)
   (196.254.1.1)
   (172.29.94.83)
   (9.8.7.217)
   (172.22.255.100) <*>
   (0.0.0.0)
 End of list

Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms

Now it is confusing again! This is per-packet load sharing once more: the echo requests alternate between 172.27.175.1 and 172.27.175.2 in round-robin fashion, while every reply comes back through 172.29.94.83. What is going on?
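The same conclusion can be read straight from the Record Route lists. The Python sketch below holds the five recorded routes from the ping output (with the unused 0.0.0.0 slots dropped), splits each one at the destination address into a forward and a return leg, and compares the legs across replies. The splitting rule (the forward leg ends at the first 196.254.1.1 entry, the return leg starts after the last one) is a simplification of my own.

```python
DEST = "196.254.1.1"

# Record Route entries from replies 0-4 of the extended ping, 0.0.0.0 slots removed.
ROUTES = [
    ["9.8.7.218", "172.29.94.81", "172.27.175.1", DEST, DEST, "172.29.94.83", "9.8.7.217", "172.22.255.100"],
    ["9.8.7.218", "172.29.94.81", "172.27.175.2", DEST, DEST, "172.29.94.83", "9.8.7.217", "172.22.255.100"],
    ["9.8.7.218", "172.29.94.81", "172.27.175.1", DEST, DEST, "172.29.94.83", "9.8.7.217", "172.22.255.100"],
    ["9.8.7.218", "172.29.94.81", "172.27.175.2", DEST, DEST, "172.29.94.83", "9.8.7.217", "172.22.255.100"],
    ["9.8.7.218", "172.29.94.81", "172.27.175.1", DEST, DEST, "172.29.94.83", "9.8.7.217", "172.22.255.100"],
]


def split_legs(route: list[str], dest: str) -> tuple[list[str], list[str]]:
    """Forward leg: entries before the first `dest`; return leg: entries after the last."""
    first = route.index(dest)
    last = len(route) - 1 - route[::-1].index(dest)
    return route[:first], route[last + 1:]


if __name__ == "__main__":
    forward = [split_legs(r, DEST)[0] for r in ROUTES]
    back = [split_legs(r, DEST)[1] for r in ROUTES]
    # The entry recorded after 172.29.94.81 changes from reply to reply...
    print([leg[2] for leg in forward])
    # ...while every reply takes the same way back.
    print({tuple(leg) for leg in back})
```

Judging by the addressing, 172.29.94.81 is Router1's outgoing interface, so the entry right after it shows which of the two downstream routers carried each echo request.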
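As I already said, nothing is actually wrong. The answer is simple: in IOS, packets that the router has to handle in its control plane are process-switched on the CPU, and process switching load-shares per packet. Once you know this, the behavior is easy to understand. In (1), Router1 only forwards the probes in the data plane, so CEF per-destination hashing sends all of them to the same next hop (172.29.94.83), where the TTL expires and an ICMP Time Exceeded message is returned. In (2), the probes are generated by Router1 itself, and in (3) the echo requests carry the Record Route IP option, which makes Router1 punt them to the CPU; in both cases Router1's control plane process-switches the packets. That's all. There is no Cisco voodoo magic.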
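The explanation can be summarized as a toy model of Router1. In the Python sketch below, a packet goes to process switching if the router originated it or if it carries IP options, and to CEF otherwise; CEF picks a next hop from a hash of the address pair, while process switching simply rotates through the next hops. The `Packet` and `Router` classes, the CRC32 hash and the "R1-Loopback0" source label are hypothetical stand-ins, not IOS internals.

```python
import zlib
from dataclasses import dataclass
from itertools import cycle

NEXT_HOPS = ["172.29.94.83", "172.29.94.84"]  # Router1's equal-cost next hops


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    locally_originated: bool = False  # generated by this router (e.g. its own traceroute)
    has_ip_options: bool = False      # e.g. the Record Route option


class Router:
    """Toy model of an IOS router with equal-cost next hops (hypothetical)."""

    def __init__(self, next_hops: list[str]):
        self.next_hops = next_hops
        self._rotation = cycle(next_hops)

    def switching_path(self, pkt: Packet) -> str:
        # Assumption of this model: CPU-bound traffic is process-switched,
        # plain transit traffic stays in CEF.
        if pkt.locally_originated or pkt.has_ip_options:
            return "process"
        return "cef"

    def next_hop(self, pkt: Packet) -> str:
        if self.switching_path(pkt) == "cef":
            # Per-destination: the same address pair always gets the same path.
            key = f"{pkt.src}->{pkt.dst}".encode()
            return self.next_hops[zlib.crc32(key) % len(self.next_hops)]
        # Process switching: per-packet, round-robin over the paths.
        return next(self._rotation)


def send(router: Router, pkt: Packet, count: int = 3) -> list[str]:
    """Forward `count` identical packets and record the next hop of each."""
    return [router.next_hop(pkt) for _ in range(count)]


if __name__ == "__main__":
    transit = Packet("172.22.255.100", "196.254.1.1")                       # (1)
    local = Packet("R1-Loopback0", "196.254.1.1", locally_originated=True)  # (2)
    options = Packet("172.22.255.100", "196.254.1.1", has_ip_options=True)  # (3)
    for label, pkt in (("(1)", transit), ("(2)", local), ("(3)", options)):
        print(label, Router(NEXT_HOPS).switching_path(pkt), send(Router(NEXT_HOPS), pkt))
```

Running it gives one repeated next hop for (1) and alternating next hops for (2) and (3), the same pattern as in the outputs above. The model deliberately ignores everything else that influences path selection in IOS.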

4 comments:

  1. Hi there!

    Strictly speaking, CEF performs per-*flow* load-sharing technique rather than per-destination. SA/DA pair is used to calculate the hash value pointing to the appropriate loadinfo's bucket. Sometimes this hash value is also seeded in order to avoid the CEF-polarization effect.

    On h/w-assisted platforms (e.g. EARL-based) you can also instruct IOS to take L4 info into account ("mls ip cef load-sharing full").

    HTH.

  2. Uri, can you point me to the documentation?

  3. Sure I can. Take a look-see at:
    http://www.networkers-online.com/blog/2009/04/cef-and-load-sharing/
    .. or get Russ White's book "Cisco Express Forwarding" from Cisco Press. It's great; you could hardly find a better source on this topic.

    HTH and good luck w/ your CCIE pursuit :)

  4. Thank you, Uri! Glad to see somebody interested in my blog posts here.

