SAI Ambiguities
The original SAI specification lacks a number of important details about both the expected behavior of the dataplane and the expected behavior of the API itself. While trying to implement a formal dataplane model of SAI, we came up with a number of questions. This page contains the current interpretations, derived from discussions with the SAI authors (Microsoft).
- The assumption is that the dataplane automatically recognizes the standard value of 0x8100 as TPID. This is the one and only value -- all others (e.g. 0x9100, 0x88a8) are not recognized.
- MSFT to think about it
Yes, the packets are subject to the standard IEEE checks and must be dropped if one of them fails:
- Ethernet packets with MACSA[40]==1 (multicast source address) are not allowed
- Ethernet packets with MACSA==00:00:00:00:00:00 are not allowed
- Ethernet packets with MACDA==00:00:00:00:00:00 are not allowed
- Ethernet packets with MACSA == MACDA are not allowed
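The four checks above can be sketched as a single predicate. This is only an illustration, not part of SAI: the names `mac_to_int` and `passes_l2_sanity_checks` are hypothetical, and bit 40 is assumed to be the least significant bit of the first address octet (the IEEE group bit) when the address is read as a 48-bit big-endian number.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer (first octet most significant)."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return int.from_bytes(bytes(int(o, 16) for o in octets), "big")


def passes_l2_sanity_checks(mac_da: str, mac_sa: str) -> bool:
    """Return True if the frame survives the four checks listed above."""
    da = mac_to_int(mac_da)
    sa = mac_to_int(mac_sa)
    if (sa >> 40) & 1:   # MACSA[40] == 1: multicast source address
        return False
    if sa == 0:          # MACSA == 00:00:00:00:00:00
        return False
    if da == 0:          # MACDA == 00:00:00:00:00:00
        return False
    if sa == da:         # MACSA == MACDA
        return False
    return True
```

A frame for which the predicate returns `False` would be dropped before any further processing.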
For processing beyond L2, a SAI switch only accepts packets with EthernetII encapsulation. Ethernet packets with LLC/SNAP encapsulation (e.g. IP packets with ProtocolID=0x0800) are not recognized as such and are processed as regular L2 packets. This might be changed later.
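A possible parser-level reading of the TPID and encapsulation rules is sketched below. It is an assumption-laden illustration: `classify_frame` is a hypothetical helper, the 0x0600 boundary between a length field and an EtherType is the usual IEEE 802.3 convention rather than something this page states, and the function only classifies a frame without deciding what to do with it.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100          # the only TPID the dataplane recognizes
ETHERTYPE_MIN = 0x0600       # smaller values are 802.3 length fields (LLC/SNAP)
L3_ETHERTYPES = {0x0800: "ipv4", 0x86DD: "ipv6"}


def classify_frame(frame: bytes) -> Tuple[Optional[int], Optional[str]]:
    """Return (vlan_id, l3_protocol); l3_protocol is None for L2-only frames."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (type_or_len,) = struct.unpack_from("!H", frame, 12)
    vlan_id = None
    if type_or_len == TPID_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated VLAN tag")
        tci, type_or_len = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF
    if type_or_len < ETHERTYPE_MIN:
        # Length field, so LLC/SNAP: treated as a regular L2 packet.
        return vlan_id, None
    # Unknown EtherTypes (including 0x9100 and 0x88a8) stay L2-only.
    return vlan_id, L3_ETHERTYPES.get(type_or_len)
```

Under this reading, a frame carrying 0x88a8 where a tag would be expected is simply an untagged frame with an unrecognized EtherType.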
The SAI API must prevent this from happening by not allowing a port on which a port-based router interface is created to be a member of a VLAN or to participate in L2 switching. Similarly, a port-based router interface cannot be created on a port that is currently a member of a VLAN.
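The mutual exclusion described in the paragraph above could be enforced with two symmetric checks, as in the toy model below. Everything here is hypothetical (`SwitchModel`, `ConfigError`, and the method names are not SAI identifiers), and the page does not say which status code the real API should return, so the sketch raises a generic exception instead.

```python
class ConfigError(Exception):
    """Hypothetical error raised when a request would break consistency."""


class SwitchModel:
    def __init__(self):
        self.vlan_members = {}    # vlan_id -> set of member ports
        self.port_rifs = set()    # ports that carry a port-based router interface

    def add_vlan_member(self, vlan_id, port):
        # A port with a port-based router interface cannot join a VLAN.
        if port in self.port_rifs:
            raise ConfigError(f"port {port} has a port-based router interface")
        self.vlan_members.setdefault(vlan_id, set()).add(port)

    def create_port_rif(self, port):
        # A current VLAN member cannot get a port-based router interface.
        if any(port in members for members in self.vlan_members.values()):
            raise ConfigError(f"port {port} is a member of a VLAN")
        self.port_rifs.add(port)
```

Either order of operations is rejected, which keeps the configuration consistent regardless of which object the application creates first.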
If a port-based router interface is created on a port, that port can only accept untagged packets.
- Default VLAN can (and should) still be set on a port.
- MSFT to think about priority-tagged packets case (they might be still allowed)
- MSFT to think about which packets are acceptable and consult with other participants
- The router can only process IPv4 and IPv6 packets and must drop all others
- Any other ingress packet of interest to L3 stack, e.g. ARP, must be trapped before it reaches the router
- Creation of a VLAN-based router interface should not affect L2 switching functionality. All non-L3 packets (that are not trapped) should go through the normal L2 switching process.
If neither SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS nor SAI_SWITCH_ATTR_SRC_MAC_ADDRESS is set, does it mean that ALL packets ingressing on a given port/VLAN will go to L3 processing, or none?
- SAI Implementation must ensure that SAI_SWITCH_ATTR_SRC_MAC_ADDRESS is always set. Otherwise, routing cannot be enabled.
Is it OK to create many router interfaces (on the same port or VLAN) with different values of SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS?
- Yes. This is allowed.
Note: While the attribute name is SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS, the router interface will be chosen when the packet's Destination MAC address is equal to SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS (or SAI_SWITCH_ATTR_SRC_MAC_ADDRESS). This might be a little confusing.
- MSFT will think about separating the functionality into Ingress and Egress Routing Interfaces
What kind of MAC addresses are allowed for SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS and SAI_SWITCH_ATTR_SRC_MAC_ADDRESS?
- Currently only unicast addresses are allowed.
- The same value must be used for both ingress and egress checks.
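The ingress side of these answers can be illustrated with a small lookup. This is a sketch under assumptions, not SAI code: `RouterInterface`, `effective_mac`, `select_rif`, and the `"port:1"` / `"vlan:10"` attachment strings are hypothetical, and the rule that a per-interface MAC takes precedence over the switch-wide one is an assumption about how the two attributes combine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RouterInterface:
    rif_id: int
    attachment: str                 # hypothetical key, e.g. "port:1" or "vlan:10"
    src_mac: Optional[str] = None   # models SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS


def is_unicast(mac: str) -> bool:
    """Only unicast addresses are currently allowed for the SRC_MAC attributes."""
    return int(mac.split(":")[0], 16) & 1 == 0


def effective_mac(rif: RouterInterface, switch_mac: str) -> str:
    # Assumption: the interface attribute overrides SAI_SWITCH_ATTR_SRC_MAC_ADDRESS.
    return (rif.src_mac or switch_mac).lower()


def select_rif(rifs: List[RouterInterface], switch_mac: str,
               attachment: str, dmac: str) -> Optional[RouterInterface]:
    """Pick the router interface whose MAC equals the packet's destination MAC."""
    for rif in rifs:
        if rif.attachment == attachment and effective_mac(rif, switch_mac) == dmac.lower():
            return rif
    return None  # no match: the packet is not routed on this port/VLAN
```

Because the same value serves both directions, `effective_mac` would also supply the source MAC written into packets leaving through the selected interface.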
What information do Router Interfaces provide on the egress, when they are used in the neighbor objects?
- Port-based Router Interfaces provide the new Source MAC Address and the Egress Port. Packets egressing through these interfaces must not go through FDB lookup
- VLAN-based Router Interfaces provide new Source MAC address and VLAN ID for the egress packets
- Yes, it must exist at the time of Router Interface creation and cannot be removed without removing the router interface first. Attempts to remove a VLAN for which a router interface exists should return the SAI_STATUS_OBJECT_IN_USE error code. This rule generally follows the principle of SAI enforcing the configuration database consistency.
- Yes, it must exist at the time of Router Interface creation and cannot be removed without removing the router interface first. Attempts to remove a Virtual Router referred to by a router interface should return the SAI_STATUS_OBJECT_IN_USE error code. This rule generally follows the principle of SAI enforcing the configuration database consistency.
- Both virtual router and router interface must be administratively "Up" in order for routing to work
- All IP lookups (including host lookups with /32 and /128 prefixes) happen in the Route Table. Route Table maps the tuple {Virtual Router ID, Destination IP Address, Mask} to the Nexthop ID
- Nexthop ID resolves into {IP address, Router interface}
- {IP address, Router Interface} is then looked up in the neighbor table, which provides new Destination MAC Address
- For Port-based Router Interfaces, the packet is sent out of the port associated with the Router Interface
- For VLAN-based Router Interfaces, the Destination MAC Address is looked up in the FDB, using the VLAN associated with the Router Interface.
- No URPF checks are currently performed
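The lookup chain in the list above can be modeled end to end with a few dictionaries. This is a minimal sketch, assuming hypothetical table layouts and function names (`longest_prefix_match`, `route_packet`); returning `None` on a route or neighbor miss is a placeholder, and the behavior on an FDB miss is left open here because this page does not settle it.

```python
import ipaddress
from typing import Optional


def longest_prefix_match(routes: dict, vr_id: int, dst_ip: str) -> Optional[int]:
    """Route Table: {(virtual router ID, prefix)} -> Nexthop ID, longest prefix wins."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for (route_vr, prefix), nexthop_id in routes.items():
        net = ipaddress.ip_network(prefix)
        if route_vr == vr_id and net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, nexthop_id)
    return None if best is None else best[1]


def route_packet(tables: dict, vr_id: int, dst_ip: str) -> Optional[dict]:
    nexthop_id = longest_prefix_match(tables["routes"], vr_id, dst_ip)
    if nexthop_id is None:
        return None                                   # placeholder for a route miss
    nh_ip, rif_id = tables["nexthops"][nexthop_id]    # Nexthop -> {IP, Router Interface}
    dmac = tables["neighbors"].get((nh_ip, rif_id))   # Neighbor -> new Destination MAC
    if dmac is None:
        return None                                   # placeholder for a neighbor miss
    rif = tables["rifs"][rif_id]
    if rif["type"] == "port":
        # Port-based interface: egress port comes from the interface, no FDB lookup.
        return {"smac": rif["src_mac"], "dmac": dmac, "vlan": None, "port": rif["port"]}
    # VLAN-based interface: egress port comes from the FDB (None models a miss).
    port = tables["fdb"].get((rif["vlan"], dmac))
    return {"smac": rif["src_mac"], "dmac": dmac, "vlan": rif["vlan"], "port": port}
```

Host routes need no special handling in this model: a /32 or /128 entry is just the longest possible prefix in the same Route Table.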
The confusion comes from the description of the attribute SAI_NEIGHBOR_ATTR_NO_HOST_ROUTE:
Neighbor not to be programmed as a host route entry in ASIC and to be only used to setup next-hop purpose. Typical use-case is to set this true for neighbor with IPv6 link-local addresses.
Since the default value for SAI_NEIGHBOR_ATTR_NO_HOST_ROUTE is FALSE, the question is "What does it mean?"
- MSFT should define the behavior
- Looking up in both the Neighbor table and the Route table creates a number of other problems:
  a. In case of a Route Table hit, the Neighbor table has to be looked up again, which is very difficult to do on most hardware implementations.
  b. Entries in the Neighbor table are looked up using the Router Interface ID, whereas entries in the Route Table are looked up using the Virtual Router ID, which creates a major inconsistency.
- Adding an entry into the routing table and into the Nexthop table as a convenience looks more plausible. We just need to add some details, in terms of how the route and nexthop attributes are defined
If there is no FDB entry corresponding to the DMAC value from the neighbor lookup, what is the expected behavior, particularly when the pipeline can flood if the DMAC lookup fails? Should the packet be dropped or trapped?
The current proposal is fairly vague on details. For example:
- Can a router interface be created for a LAG?
- Which port properties are defined for a LAG?
- MSFT to clarify
No, a port cannot be added to a LAG more than once. If the program attempts to do that, the API must return SAI_STATUS_ITEM_ALREADY_EXISTS.
No, a port cannot be a member of more than one LAG. If the program attempts to do that, the API must return SAI_STATUS_OBJECT_IN_USE.
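The two LAG membership rules can be captured in a few lines. The sketch below is a toy model rather than the SAI API: `add_lag_member` and the dictionary layout are hypothetical, and the status codes are represented by their names as strings instead of their numeric values.

```python
SAI_STATUS_SUCCESS = "SAI_STATUS_SUCCESS"
SAI_STATUS_ITEM_ALREADY_EXISTS = "SAI_STATUS_ITEM_ALREADY_EXISTS"
SAI_STATUS_OBJECT_IN_USE = "SAI_STATUS_OBJECT_IN_USE"


def add_lag_member(lag_members: dict, lag_id: int, port: int) -> str:
    """lag_members maps a LAG ID to the set of its member ports."""
    # Rule 1: a port cannot be added to the same LAG twice.
    if port in lag_members.get(lag_id, set()):
        return SAI_STATUS_ITEM_ALREADY_EXISTS
    # Rule 2: a port cannot be a member of more than one LAG.
    for other_id, members in lag_members.items():
        if other_id != lag_id and port in members:
            return SAI_STATUS_OBJECT_IN_USE
    lag_members.setdefault(lag_id, set()).add(port)
    return SAI_STATUS_SUCCESS
```

A rejected request leaves `lag_members` unchanged, so the caller can retry after removing the port from its current LAG.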
Router interfaces can be created on a LAG and will be looked up based on the ingress LAG.
- MSFT to define behavior
- Proposal: No, this is not allowed
There are two main options:
- Corresponding VLAN hasn’t been created
- VLAN ID is simply out of range. In that case, a number of possibilities exist:
- Is the lowest allowable value 0 or 1?
- Is the highest allowable value 4094 or 4095?
- MSFT to think about it
The problem is that typically these states are defined by specifying device behavior for "normal" versus BPDU packets. This requires SAI to adopt a formal definition of BPDU, which it currently lacks.
- MSFT to think about it. 01:80:C2:xx:xx:xx ? Something else (01:00:0C:CC:CC:CD)?
- It makes sense to add a DISABLED state (both BPDUs and normal packets are dropped)
Priority 0 is lowest
Is sai_create_hostif() simply responsible for associating the existing network interface (as represented by SAI_HOSTIF_ATTR_NAME) with a given port or router interface, or is it also responsible for creating this interface in the first place?
The specification is not clear on what exactly happens when a packet is sent to a netdev associated with a router interface. For port-based netdevs, the expectation is that the packet will be sent out of the corresponding port. It is not obvious what will happen in the case of a RIF-based netdev (or even what kinds of packets will be accepted for Tx in the first place).