Arp Refresh changes #1548

Open
wants to merge 1 commit into
base: master

sumanbrcm
Contributor

What I did
This PR changes the ARP refresh mechanism. Details are provided in the sections below.
Why I did it
SONiC depends upon the Linux kernel to manage the ARP/ND tables. SONiC then listens to ARP/ND events from the kernel and synchronizes the hardware as required. However, this approach has several problems:

The kernel does not "see" the routed (in HW) through-traffic, and so cannot update its "hit bits" accordingly. Therefore the kernel may age out an entry that is still in use.
The kernel also does not "see" the HW MAC aging process, and so does not know that a MAC address associated with an ARP/ND entry has been aged out, and so does not refresh it. This can result in traffic black holes for a "quiet" neighbor (i.e. one that does not transmit much).
There is a further problem in MCLAG/ICCP setups whereby the response to an ARP/ND initiated by the kernel on one peer can go to the other peer. This eventually makes its way back across the ICCP control plane, but by then the kernel may have already aged out the entry.
The current ARP Refresh process is implemented as a bash script, and cannot run fast enough to be effective at scale, requiring the network operator to set much higher aging timers than would otherwise be used. It's also a very inefficient use of system resources.

So, the proposal here is to design and implement a much faster and more efficient instance of the ARP Refresh process.

How I verified it

  1. Verification was done by adding ARP entries dynamically and using tcpdump to check that ARP requests/replies are observed in accordance with the proposed design. The detailed test logs follow.
    a. For ARP (3 updates for 12.12.12.2 are shown in the logs below; other ARP entries and further logs are omitted here)
    admin@sonic:~$ show arp
    Address      MacAddress         Iface      Vlan

    10.59.128.1  00:00:0c:9f:f4:68  eth0       -
    12.12.12.2   00:10:94:00:00:05  Ethernet0  -
    12.12.12.3   00:10:94:00:00:06  Ethernet0  -
    12.12.12.4   00:10:94:00:00:07  Ethernet0  -
    12.12.12.5   00:10:94:00:00:08  Ethernet0  -
    Total number of entries 5

admin@sonic:~$ sudo tcpdump -ei Ethernet0
19:27:40.364309 3c:2c:99:2d:84:35 (oui Unknown) > 00:10:94:00:00:05 (oui Unknown), ethertype ARP (0x0806), length 42: Request who-has 12.12.12.2 tell 12.12.12.1, length 28
19:27:40.364666 00:10:94:00:00:05 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype ARP (0x0806), length 60: Reply 12.12.12.2 is-at 00:10:94:00:00:05 (oui Unknown), length 46

19:32:40.397044 3c:2c:99:2d:84:35 (oui Unknown) > 00:10:94:00:00:05 (oui Unknown), ethertype ARP (0x0806), length 42: Request who-has 12.12.12.2 tell 12.12.12.1, length 28
19:32:40.397380 00:10:94:00:00:05 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype ARP (0x0806), length 60: Reply 12.12.12.2 is-at 00:10:94:00:00:05 (oui Unknown), length 46

19:37:40.428211 3c:2c:99:2d:84:35 (oui Unknown) > 00:10:94:00:00:05 (oui Unknown), ethertype ARP (0x0806), length 42: Request who-has 12.12.12.2 tell 12.12.12.1, length 28
19:37:40.428622 00:10:94:00:00:05 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype ARP (0x0806), length 60: Reply 12.12.12.2 is-at 00:10:94:00:00:05 (oui Unknown), length 46

admin@sonic:~$ sudo tcpdump -ei Ethernet0

    b. For NDP (3 updates for 2100::2 are shown in the logs below; other NDP entries and further logs are omitted here)
admin@sonic:~$ show ndp | head
Address                    MacAddress         Iface      Vlan  Status

2100::2                    00:10:94:00:00:09  Ethernet0  -     REACHABLE
2100::3                    00:10:94:00:00:0a  Ethernet0  -     REACHABLE
2100::4                    00:10:94:00:00:0b  Ethernet0  -     REACHABLE
2100::5                    00:10:94:00:00:0c  Ethernet0  -     REACHABLE
fe80::1a5a:58ff:fe17:c2e0  18:5a:58:17:c2:e0  eth0       -     STALE
fe80::1a5a:58ff:fe18:f720  18:5a:58:18:f7:20  eth0       -     STALE
fe80::1a5a:58ff:fe19:620   18:5a:58:19:06:20  eth0       -     STALE
fe80::3e2c:99ff:fe2d:8735  3c:2c:99:2d:87:35  eth0       -     STALE

11:55:46.283420 3c:2c:99:2d:84:35 (oui Unknown) > 33:33:ff:00:00:02 (oui Unknown), ethertype IPv6 (0x86dd), length 86: fe80::3e2c:99ff:fe2d:8435 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2100::2, length 32
11:55:46.283763 00:10:94:00:00:09 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype IPv6 (0x86dd), length 86: 2100::2 > fe80::3e2c:99ff:fe2d:8435: ICMP6, neighbor advertisement, tgt is 2100::2, length 32

12:00:46.314416 3c:2c:99:2d:84:35 (oui Unknown) > 33:33:ff:00:00:02 (oui Unknown), ethertype IPv6 (0x86dd), length 86: fe80::3e2c:99ff:fe2d:8435 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2100::2, length 32
12:00:46.314820 00:10:94:00:00:09 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype IPv6 (0x86dd), length 86: 2100::2 > fe80::3e2c:99ff:fe2d:8435: ICMP6, neighbor advertisement, tgt is 2100::2, length 32

12:06:46.350847 3c:2c:99:2d:84:35 (oui Unknown) > 33:33:ff:00:00:02 (oui Unknown), ethertype IPv6 (0x86dd), length 86: fe80::3e2c:99ff:fe2d:8435 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2100::2, length 32
12:06:46.351333 00:10:94:00:00:09 (oui Unknown) > 3c:2c:99:2d:84:35 (oui Unknown), ethertype IPv6 (0x86dd), length 86: 2100::2 > fe80::3e2c:99ff:fe2d:8435: ICMP6, neighbor advertisement, tgt is 2100::2, length 32

  2. More complete test results will be added.
    Details if related

ARP Refresh Thread:

ARP refresh functionality is added to the neighsyncd process.

Neighsyncd is responsible for syncing the kernel ARP table to the hardware via the APP_DB and OrchAgents. Neighsyncd listens on netlink events (RTM_NEWNEIGH, RTM_DELNEIGH) and creates/deletes NEIGH_TABLE entries in APP_DB.

The existing functionality of neighsyncd is retained as is. In addition to managing NEIGH_TABLE entries in APP_DB, neighsyncd will also add the details of each neighbor to a queue that feeds the new ARP Refresh thread described below.

A new ARP refresh thread is created in neighsyncd to:
dequeue the neighbor events and populate a neighbor cache
periodically refresh ARP/ND entries by sending ARP request / NS packets
subscribe to redis-db to gather the data required to send the ARP refresh packets
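
The hand-off between the netlink handler and the refresh thread could be sketched as below. This is only an illustration in Python, not the neighsyncd code: the event tuple layout, `refresh_worker`, and the plain dict used as the cache are all hypothetical names chosen for the example.

```python
import queue
import threading

# Hypothetical event shape: (op, ip, ifname, mac), where op is "add" or "del".
def refresh_worker(events, cache, stop):
    """Drain neighbor events into the cache until asked to stop."""
    while not stop.is_set():
        try:
            op, ip, ifname, mac = events.get(timeout=0.1)
        except queue.Empty:
            continue
        if op == "add":
            cache[(ip, ifname)] = mac
        else:
            cache.pop((ip, ifname), None)
        events.task_done()

events = queue.Queue()
cache = {}
stop = threading.Event()
worker = threading.Thread(target=refresh_worker, args=(events, cache, stop), daemon=True)
worker.start()

# The netlink side would enqueue an event after updating APP_DB.
events.put(("add", "12.12.12.2", "Ethernet0", "00:10:94:00:00:05"))
events.put(("add", "12.12.12.3", "Ethernet0", "00:10:94:00:00:06"))
events.put(("del", "12.12.12.3", "Ethernet0", None))

events.join()   # wait until the worker has processed every event
stop.set()
worker.join()
```

A queue keeps the existing netlink path from blocking on the refresh logic; the worker alone owns the cache, so no extra locking is needed in this sketch.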

Following are the different modules in the ARP refresh thread.

Neighbor Cache Management
Add neighbor entries to cache when the entry is learned from the kernel
All Dynamically learned neighbor entries [ARP, ND (Global, LinkLocal)]
All Static neighbor entries (MAC can be dynamic)
Below entries will not be added to the neighbor cache
Neighbors learned from “eth0” interface
Neighbors learned from BGP/EVPN MAC/IP type-2 route
The switch's own IP address entries and permanent FF:FF:FF:FF:FF:FF (broadcast MAC) entries
Remove entries from cache when the entry is deleted from Kernel
v4/v6 Neighbors Cache [map] contents are: -
Key = IP Address + InterfaceName [Phy/PortChannel/Vlan/Sag]
Value
MAC Address
State (Reachable/Failed)
Timestamp (Entry creation/last refresh)
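
As a rough illustration of the cache layout and the exclusion rules listed above, here is a minimal Python sketch. The class and function names (`NeighborEntry`, `NeighborCache`, `should_cache`) and the `from_evpn` / `is_own_ip` flags are assumptions made for the example; the PR does not name them.

```python
import time
from dataclasses import dataclass, field

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

@dataclass
class NeighborEntry:
    mac: str
    state: str  # "Reachable" or "Failed"
    timestamp: float = field(default_factory=time.monotonic)  # creation / last refresh

def should_cache(ifname, mac, from_evpn=False, is_own_ip=False):
    """Apply the exclusion rules: eth0, EVPN type-2, own IP, broadcast MAC."""
    if ifname == "eth0":
        return False
    if from_evpn or is_own_ip:
        return False
    if mac.lower() == BROADCAST_MAC:
        return False
    return True

class NeighborCache:
    """Map keyed by (IP address, interface name)."""

    def __init__(self):
        self.entries = {}

    def add(self, ip, ifname, mac, state="Reachable", **flags):
        if not should_cache(ifname, mac, **flags):
            return False
        self.entries[(ip, ifname)] = NeighborEntry(mac, state)
        return True

    def remove(self, ip, ifname):
        self.entries.pop((ip, ifname), None)

neigh_cache = NeighborCache()
neigh_cache.add("12.12.12.2", "Ethernet0", "00:10:94:00:00:05")
neigh_cache.add("10.59.128.1", "eth0", "00:00:0c:9f:f4:68")  # skipped: management port
```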

Interface Cache Management
Required for framing the ARP packets we send
Interface Cache [Map]
Key = Interface name
Value = IP, MAC, Ifname to Index
Subscribe to redis-db tables
IP address
- CONFIG_DB: INTERFACE, VLAN_INTERFACE, SAG_INTERFACE
MAC
- CONFIG_DB: DEVICE_METADATA ==> System MAC
- CONFIG_DB: SAG_GLOBAL
Ifname to Index (required for socket send)

Packet Builder
Based on Neighbor Cache
Build ARP packet
Build NS packet
For a resolved ARP entry (destination MAC known), the ARP request is unicast
For an unresolved ARP entry (destination MAC unknown), the ARP request is broadcast
IPv6 NS uses multicast
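
To make the framing rules concrete, the sketch below builds an ARP request frame and derives the multicast destination for an NS packet. It is an illustration only, not the PR's packet builder: `build_arp_request` and `solicited_node` are hypothetical helpers, the target hardware address field is simply left as zeros, and the NS payload itself (ICMPv6 header and checksum) is not built here.

```python
import ipaddress
import struct

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))

def build_arp_request(src_mac, src_ip, target_ip, target_mac=None):
    """Ethernet + ARP request; unicast if target_mac is known, else broadcast."""
    dst_mac = target_mac or BROADCAST_MAC
    eth = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", 0x0806)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_to_bytes(src_mac) + ipaddress.IPv4Address(src_ip).packed
    arp += bytes(6) + ipaddress.IPv4Address(target_ip).packed  # target MAC zeroed
    return eth + arp

def solicited_node(target_ip6):
    """Return (IPv6 solicited-node group, Ethernet multicast MAC) for a target."""
    low24 = ipaddress.IPv6Address(target_ip6).packed[-3:]
    prefix = bytes.fromhex("ff02") + bytes(9) + bytes.fromhex("01ff")
    group = ipaddress.IPv6Address(prefix + low24)
    mac = "33:33:" + ":".join(f"{b:02x}" for b in group.packed[-4:])
    return str(group), mac

unicast = build_arp_request("3c:2c:99:2d:84:35", "12.12.12.1",
                            "12.12.12.2", "00:10:94:00:00:05")
broadcast = build_arp_request("3c:2c:99:2d:84:35", "12.12.12.1", "12.12.12.2")
ns_group, ns_mac = solicited_node("2100::2")
```

With the addresses from the test logs, the unicast frame is 42 bytes long (14-byte Ethernet header plus 28-byte ARP payload), and the NS destination for 2100::2 comes out as ff02::1:ff00:2 with MAC 33:33:ff:00:00:02, matching the tcpdump captures.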

Send Refresh
Send ARP/NS packets using raw socket
Separate sockets for ARP and ICMPv6 NS
Send Unicast packet
VLAN tagging & FDB lookup happens in kernel based on outgoing interface

Refresh Timer
Traverse the neighbor Cache entries periodically (every 30 secs)
Check whether the refresh timeout has elapsed for each neighbor
If elapsed then send ARP/NS packet
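
One pass of the periodic scan might look like the following Python sketch. The dict-based cache layout, `scan_once`, and the `send` / `next_timeout` callbacks are hypothetical stand-ins for whatever the thread actually uses; in the real design the pass would run every 30 seconds.

```python
SCAN_INTERVAL_SECS = 30  # scan period given in the description

def scan_once(cache, now, send, next_timeout):
    """Refresh every entry whose timeout has elapsed; return the keys refreshed."""
    refreshed = []
    for key, entry in cache.items():
        if now - entry["timestamp"] >= entry["refresh_timeout"]:
            send(key, entry)                           # ARP request or NS packet
            entry["timestamp"] = now                   # record the last refresh
            entry["refresh_timeout"] = next_timeout()  # recompute after sending
            refreshed.append(key)
    return refreshed

cache = {
    ("12.12.12.2", "Ethernet0"): {"mac": "00:10:94:00:00:05",
                                  "timestamp": 0.0, "refresh_timeout": 300.0},
    ("12.12.12.3", "Ethernet0"): {"mac": "00:10:94:00:00:06",
                                  "timestamp": 0.0, "refresh_timeout": 500.0},
}
sent = []
first = scan_once(cache, 300.0, lambda key, entry: sent.append(key), lambda: 400.0)
second = scan_once(cache, 510.0, lambda key, entry: sent.append(key), lambda: 400.0)
```

Because the scan only runs every 30 seconds, a refresh can go out up to one scan interval after its timeout elapses, which is small relative to timeouts measured in minutes.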

Refresh timeout Calculation:

To avoid sending all ARP/NS packets simultaneously, each neighbor entry is given a different refresh timeout value. This value is derived from the MAC, ARP, and ND aging times.

ARP Reference Timeout (ARP_RT) = Lesser of [MAC age, ARP age]
ND Reference Timeout (ND_RT) = Lesser of [MAC age, ND age]

Refresh Timeout = 30% to 70% of [ARP/ND Reference Timeout]

For example:

MAC Age is less than ARP age
ARP ageout = 60 mins
MAC ageout = 30 mins
Reference Timeout = 30 mins.
Refresh timeout will be between 30% and 70% of the reference timeout (9 to 21 mins).

ARP Age is less than MAC age
ARP ageout = 60 mins
MAC ageout = 90 mins
Reference Timeout = 60 mins
Refresh timeout will be between 30% and 70% of the reference timeout (18 to 42 mins).

The refresh timeout is set whenever the neighbor entry is added or updated in the cache. It is also recomputed after each ARP/NS refresh packet is sent.
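
The calculation can be sketched in a few lines of Python. The description only gives the 30% to 70% window; drawing the value uniformly at random within that window is an assumption of this sketch, and the function names are hypothetical.

```python
import random

def reference_timeout(mac_age, neigh_age):
    """Lesser of the MAC age and the ARP/ND age (any consistent unit)."""
    return min(mac_age, neigh_age)

def refresh_timeout(mac_age, neigh_age, rng=random):
    """Pick a timeout between 30% and 70% of the reference timeout.

    Assumption: a uniform random draw is used to spread refreshes out.
    """
    ref = reference_timeout(mac_age, neigh_age)
    return rng.uniform(0.3 * ref, 0.7 * ref)

# First example from the text: ARP ageout 60 mins, MAC ageout 30 mins.
rng = random.Random(1)
samples_min = [refresh_timeout(30, 60, rng) for _ in range(1000)]
```

Every sample falls in the 9 to 21 minute window from the first worked example, and different neighbors get different values, so their refreshes do not all fire in the same scan.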

Recommended Configurations:

ARP scale    MAC aging timer (min)    ARP aging timer (min)
2000         10 (default)             30 (default)
4000         10                       30 (default)
6000         20                       60
24K          40                       60
32K          60                       90

@prsunny
Collaborator

prsunny commented Feb 10, 2021

This was discussed offline with BCM, and we agreed on the approach of moving the refresh mechanism to the neighbor manager. Awaiting PR update.

@prsunny prsunny self-requested a review as a code owner September 2, 2022 23:17

3 participants