Hi! Today I would like to talk about ECMP in Cisco IOS. As you probably know, there are two load-balancing methods: per-packet and per-destination. http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094820.shtml
The default method in IOS is per-destination load balancing with CEF. It is also possible to configure per-packet load balancing with CEF, or you can disable CEF and fast switching to get per-packet behavior.
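To make the difference between the two methods concrete, here is a minimal Python sketch. It is only a model: the function names are made up for illustration, and `zlib.crc32` stands in for the real CEF hash, which works differently. The next-hop and host addresses are the ones that appear in the outputs later in this post.

```python
import itertools
import zlib


def per_destination_next_hop(src, dst, next_hops):
    """Pick one next hop for a (source, destination) pair.

    Illustrative only: crc32 is a stand-in for the real CEF hash.
    """
    key = f"{src}->{dst}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


def per_packet_next_hops(next_hops):
    """Return an iterator that hands out next hops in round-robin order."""
    return itertools.cycle(next_hops)


hops = ["172.29.94.83", "172.29.94.84"]

# Per-destination: every packet of the same flow maps to the same next hop.
flow = [per_destination_next_hop("172.22.255.100", "196.254.1.1", hops)
        for _ in range(3)]

# Per-packet: consecutive packets alternate between the next hops.
rr = per_packet_next_hops(hops)
packets = [next(rr) for _ in range(4)]

print("per-destination:", flow)
print("per-packet:     ", packets)
```

In the per-destination case the choice depends only on the address pair, so a flow never changes its path. In the per-packet case the choice depends only on the order of the packets.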
Let’s look at the following simple topology:
Figure 1. ECMP topology
Nothing here is unusual except for two facts: ECMP is active on R1, and R5 prefers R3 as its default route. The routing protocol used makes no difference to this setup; you can use static routing for simplicity.
Let’s look at the Router2 CEF table for prefix 196.254.1.1/32:
Router2#sh ip cef 196.254.1.1/32 detail
196.254.1.1/32, epoch 0
NetFlow: Origin AS 0, Peer AS 0, Mask Bits 32
nexthop 9.8.7.217 GigabitEthernet0/1.996
Then let’s look at the traceroute (1):
Router2#traceroute 196.254.1.1 numeric source l0
Type escape sequence to abort.
Tracing the route to 196.254.1.1
1 9.8.7.217 0 msec 4 msec 0 msec
2 172.29.94.83 4 msec 0 msec 4 msec
3 196.254.1.1 4 msec 0 msec 0 msec
As expected, load sharing is per-destination: all three probes went through gateway 172.29.94.83, even though equal-cost multipath load sharing is active on Router1. Let’s look at the Router1 CEF table for prefix 196.254.1.1/32 to confirm this:
Router1#sh ip cef 196.254.1.1/32 detail
196.254.1.1/32, epoch 0, per-destination sharing
NetFlow: Origin AS 0, Peer AS 0, Mask Bits 32
nexthop 172.29.94.83 GigabitEthernet0/1.324
nexthop 172.29.94.84 GigabitEthernet0/1.324
Let’s look at the traceroute results now (2):
Router1#traceroute 196.254.1.1 numeric source l0
Type escape sequence to abort.
Tracing the route to 196.254.1.1
1 172.29.94.84 0 msec
172.29.94.83 4 msec
172.29.94.84 4 msec
2 196.254.1.1 0 msec 0 msec 4 msec
Router1#
Do you see this strange behavior? There is nothing strange really, except that it is per-packet, not per-destination as we were expecting.
Let’s check it again with the extended ping command on Router2 (3):
Router2#ping
Protocol [ip]:
Target IP address: 196.254.1.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]: yes
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: Record
Number of hops [ 9 ]:
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 196.254.1.1, timeout is 2 seconds:
Packet sent with a source address of 172.22.255.100
Reply data will be validated
Packet has IP options: Total option bytes= 39, padded length=40
Record route: <*>
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
Reply to request 0 (4 ms). Received packet has options
Total option bytes= 40, padded length=40
Record route:
(9.8.7.218)
(172.29.94.81)
(172.27.175.1)
(196.254.1.1)
(196.254.1.1)
(172.29.94.83)
(9.8.7.217)
(172.22.255.100) <*>
(0.0.0.0)
End of list
Reply to request 1 (4 ms). Received packet has options
Total option bytes= 40, padded length=40
Record route:
(9.8.7.218)
(172.29.94.81)
(172.27.175.2)
(196.254.1.1)
(196.254.1.1)
(172.29.94.83)
(9.8.7.217)
(172.22.255.100) <*>
(0.0.0.0)
End of list
Reply to request 2 (4 ms). Received packet has options
Total option bytes= 40, padded length=40
Record route:
(9.8.7.218)
(172.29.94.81)
(172.27.175.1)
(196.254.1.1)
(196.254.1.1)
(172.29.94.83)
(9.8.7.217)
(172.22.255.100) <*>
(0.0.0.0)
End of list
Reply to request 3 (4 ms). Received packet has options
Total option bytes= 40, padded length=40
Record route:
(9.8.7.218)
(172.29.94.81)
(172.27.175.2)
(196.254.1.1)
(196.254.1.1)
(172.29.94.83)
(9.8.7.217)
(172.22.255.100) <*>
(0.0.0.0)
End of list
Reply to request 4 (4 ms). Received packet has options
Total option bytes= 40, padded length=40
Record route:
(9.8.7.218)
(172.29.94.81)
(172.27.175.1)
(196.254.1.1)
(196.254.1.1)
(172.29.94.83)
(9.8.7.217)
(172.22.255.100) <*>
(0.0.0.0)
End of list
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms
Now it is very confusing again! It is per-packet load sharing once more: packets go through 172.27.175.1 and 172.27.175.2 in round-robin fashion! What’s wrong with it?
As I already mentioned, there is nothing strange here. The answer is simple: a packet that is generated by the router itself, destined to it, or otherwise needs CPU handling is processed by the control plane, and control plane packets in IOS are process switched on the CPU. Process switching shares load per packet. Once you know this fact, the behavior is easy to understand. In (1) the probes pass through R1 in the data plane, CEF picks one next hop for the flow, and the next router sends TTL exceeded. In (2) the probes are generated by R1 itself, and in (3) the packets carry the Record Route IP option, so in both cases they are handled by the R1 control plane. That’s all. There is no Cisco voodoo magic.
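The three cases can be summed up in a tiny decision model. This Python sketch is a simplification, not how IOS is implemented. The function names are hypothetical, and it assumes that in (3) the Record Route option is what sends the packets to the CPU.

```python
def switching_path(locally_generated=False, has_ip_options=False):
    """Return the path that handles a packet in this simplified model."""
    if locally_generated or has_ip_options:
        return "process"  # handled on the CPU by the control plane
    return "cef"          # handled in the data plane


def load_sharing(**packet):
    """Map the switching path to the load-sharing behavior seen above."""
    if switching_path(**packet) == "process":
        return "per-packet"
    return "per-destination"  # CEF default


cases = {
    "(1) traceroute from Router2, transit through R1": {},
    "(2) traceroute sourced on Router1": {"locally_generated": True},
    "(3) ping from Router2 with Record Route": {"has_ip_options": True},
}

for name, packet in cases.items():
    print(f"{name}: {load_sharing(**packet)}")
```

Only case (1) stays in the data plane, which matches the outputs: one gateway in traceroute (1), alternating gateways in (2) and (3).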