BGP PIC HLD #1493

Open: wants to merge 49 commits into base: master
c7057b4
BGP PIC HLD initial draft
eddieruan-alibaba Oct 9, 2023
fb564ed
Clean up
eddieruan-alibaba Oct 9, 2023
1aa8479
Add ToC and unit test
eddieruan-alibaba Oct 9, 2023
278ef51
Update based on some review comments
eddieruan-alibaba Oct 16, 2023
09e41ba
Add a section explain BGP PIC's idea
eddieruan-alibaba Oct 22, 2023
ca32e7c
Fix figure numbers
eddieruan-alibaba Oct 22, 2023
d285dd1
Add a writeup for how pic_nhg gets used in BGP convergence handling
eddieruan-alibaba Oct 23, 2023
4a5aa51
Fix a typo
eddieruan-alibaba Oct 24, 2023
4e147ea
Add some sample codes to show how pic_nhe gets created
eddieruan-alibaba Oct 30, 2023
49b5a66
Add a sample output
eddieruan-alibaba Oct 31, 2023
7ff20dc
Some misc updates for some clarifications
eddieruan-alibaba Nov 6, 2023
5c152fe
add kernel handling section
eddieruan-alibaba Nov 9, 2023
86fab21
Merge branch 'sonic-net:master' into eruan-pic
eddieruan-alibaba Nov 10, 2023
a4390ed
BGP PIC HLD initial draft
eddieruan-alibaba Oct 9, 2023
541afa6
Clean up
eddieruan-alibaba Oct 9, 2023
606ea34
Add ToC and unit test
eddieruan-alibaba Oct 9, 2023
21c92fb
Update based on some review comments
eddieruan-alibaba Oct 16, 2023
63c09ee
Add a section explain BGP PIC's idea
eddieruan-alibaba Oct 22, 2023
45118bd
Fix figure numbers
eddieruan-alibaba Oct 22, 2023
05042e1
Add a writeup for how pic_nhg gets used in BGP convergence handling
eddieruan-alibaba Oct 23, 2023
3d0b6f0
Fix a typo
eddieruan-alibaba Oct 24, 2023
bddbb4c
Add some sample codes to show how pic_nhe gets created
eddieruan-alibaba Oct 30, 2023
eaf5c6d
Add a sample output
eddieruan-alibaba Oct 31, 2023
879a752
Some misc updates for some clarifications
eddieruan-alibaba Nov 6, 2023
a9ee7a5
add kernel handling section
eddieruan-alibaba Nov 9, 2023
298cfdf
Merge branch 'eruan-pic' of https://github.com/eddieruan-alibaba/SONi…
eddieruan-alibaba Nov 10, 2023
0f3fef9
update for test results
zice312963205 Dec 6, 2023
ffc383e
Update PIC test result section
eddieruan-alibaba Dec 7, 2023
9162952
reformat
eddieruan-alibaba Dec 8, 2023
2dbff75
Add recursive test result
eddieruan-alibaba Dec 8, 2023
3669c62
Merge branch 'sonic-net:master' into eruan-pic
eddieruan-alibaba Jan 22, 2024
449e131
Move from bgp_pic to pic folder
eddieruan-alibaba Jan 22, 2024
59781b6
Merge branch 'sonic-net:master' into eruan-pic
eddieruan-alibaba Feb 4, 2024
449cbd0
Eruan pic (#8)
xingrenwai Mar 15, 2024
2c6d5ac
Add an explanation for the overlay next hop refresh process.
zice312963205 Mar 19, 2024
1b1d515
Updated based on Lingyu's comments
eddieruan-alibaba Mar 19, 2024
cfa3f6b
clean up
eddieruan-alibaba Mar 20, 2024
62c3add
update underlay routing table of the examples (#9)
xingrenwai Mar 23, 2024
9a9c527
adjust the indent
xingrenwai Mar 23, 2024
abc85bc
update path_remove_pic png
zice312963205 Mar 23, 2024
4a1d7ad
Add files via upload
xingrenwai Mar 24, 2024
0cf10b9
Delete doc/pic/images_recursive/topotest1.png
xingrenwai Mar 24, 2024
7d124ff
Add files via upload
xingrenwai Mar 24, 2024
00d36a0
Merge branch 'sonic-net:master' into eruan-pic
xingrenwai Mar 29, 2024
5456478
Merge branch 'sonic-net:master' into eruan-pic
eddieruan-alibaba Apr 1, 2024
002bac8
Pull in Philippe's typo fix https://github.com/pguibert6WIND/SONiC/tr…
eddieruan-alibaba Apr 1, 2024
17cbfa0
Fix new lines
eddieruan-alibaba Apr 1, 2024
f8f8a85
Add the test topo diagrams
xingrenwai Apr 2, 2024
ac5ef7c
Fix some errors in diagrams
xingrenwai Apr 2, 2024
200 changes: 200 additions & 0 deletions doc/bgp_pic/bgp_pic.md
@@ -0,0 +1,200 @@
<!-- omit in toc -->
# BGP PIC HLD
<!-- omit in toc -->
### Revision
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:----------------------:|-----------------------------------|
| 0.1 | Oct 8 2023 | Eddie Ruan / Lingyu Zhang | Initial Draft |

<!-- omit in toc -->
## Table of Contents
- [Goal and Scope](#goal-and-scope)

## Goal and Scope
BGP PIC, as detailed in the IETF draft available at https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/, improves BGP route convergence. This document outlines a way to arrange forwarding structures so that BGP routes converge faster. BGP PIC offers two primary enhancements:

1. It prevents BGP load balancing updates from being triggered by IGP load balancing updates. This issue, which is discussed in the SONiC Routing Working Group (https://lists.sonicfoundation.dev/g/sonic-wg-routing/files/SRv6%20use%20case%20-%20Routing%20WG.pptx), can be effectively resolved using BGP PIC.

<figure align=center>
<img src="images/srv6_igp2bgp.jpg" >
<figcaption>Figure 1. Alibaba issue: underlay route flaps affecting overlay SRv6 routes</figcaption>
</figure>
Note: BGP PIC is used here only for VPN overlay routes. Recursive routes in the global table will be covered by a separate HLD and implemented by the Accton team.

2. It provides fast convergence in hardware forwarding when a remote BGP PE becomes unreachable. Convergence of slow path forwarding is not a priority.

## High Level Design
One of the challenges in implementing PIC within FRR is the absence of PIC support in the Linux kernel. To minimize alterations in FRR while enabling PIC on platforms that do not require Linux kernel support for this feature, we are primarily focused on two key modifications:
1. In the 'zebra' component:
- Introduce a new Next Hop Group (PIC-NHG) specifically for the FORWARDING function. This NHG will serve as the shareable NHG in hardware.
- When a BGP next hop becomes unavailable, zebra will first update the new FORWARDING-ONLY NHG before BGP convergence takes place.
- If the IGP NHG changes but the reachability of every member of the BGP NHG stays the same, zebra does not need to update the BGP NHG.
@pguibert6WIND pguibert6WIND Oct 13, 2023

I think you should specify what you mean by "these changes do not affect the reachability ...".
I have 2 use cases in mind:

  • UCMP - weighted extended community,
  • a configured route-map that looks at the IGP metric will be ignored.

Contributor Author

Let me update the wording. I mean the case where BGP NH reachability does not change when the IGP NHG updates. zebra needs to check each BGP NH's reachability and skip the BGP NHG update if no member's reachability has changed.

Currently, the update is reported back to BGP and triggers a BGP update even when the BGP NH's reachability is unchanged.

thanks Eddie

- Zebra will send two new forwarding objects, the BGP PIC context and the NHG, to orchagent via FPM. The handling of NHG is outlined in https://github.com/sonic-net/SONiC/pull/1425.
- Zebra will continue to update kernel routes in the same manner as before, as the kernel does not support BGP PIC.
2. In the orchagent component:
- Orchagent will be responsible for managing the two new forwarding objects, the BGP PIC context and the NHG.
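
The reachability rule in the zebra bullets above can be sketched as follows. This is a minimal illustration, not FRR code: `struct bgp_nh_state` and `bgp_nhg_needs_update()` are hypothetical names, and the sketch only compares per-member reachability. It does not model the UCMP or IGP-metric route-map cases raised in review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-member state (not an FRR structure): was the BGP
 * next hop reachable before and after the IGP NHG update? */
struct bgp_nh_state {
	bool reachable_before;
	bool reachable_after;
};

/* Return true only if at least one member's reachability changed.
 * A pure IGP change leaves every member's reachability as it was,
 * so the BGP NHG update can be skipped. */
bool bgp_nhg_needs_update(const struct bgp_nh_state *members, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (members[i].reachable_before != members[i].reachable_after)
			return true;
	}
	return false;
}
```

With this check, an IGP load balancing change that keeps all BGP next hops reachable produces no BGP NHG update, while a member going down does.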

## Zebra's Data Structure Modifications
### Existing struct nexthop
The existing zebra nexthop structure encompasses both FORWARDING details and certain route CONTEXT information, such as VNI (Virtual Network Identifier) and srv6_nh, for various VPN (Virtual Private Network) functionalities. Given that route CONTEXT may vary among different VPN routes, it is not feasible for VPN routes to share the current VPN nexthop group generated by the zebra nexthop structure. When a remote BGP peer becomes inactive, zebra is required to update all linked VPN nexthop groups, potentially involving a significant number of VPN routes.

    struct nexthop {
        struct nexthop *next;
        struct nexthop *prev;
        /*
         * What vrf is this nexthop associated with?
         */
        vrf_id_t vrf_id;
        /* Interface index. */
        ifindex_t ifindex;

        enum nexthop_types_t type;

        uint16_t flags;

        /* Nexthop address */
        union {
            union g_addr gate;
            enum blackhole_type bh_type;
        };
        union g_addr src;

        ...

        /* Encapsulation information. */
        enum nh_encap_type nh_encap_type;
        union {
            vni_t vni;
        } nh_encap;
        /* EVPN router's MAC.
         * Don't support multiple RMAC from the same VTEP yet, so it's not
         * included in hash key.
         */
        struct ethaddr rmac;
        /* SR-TE color used for matching SR-TE policies */
        uint32_t srte_color;
        /* SRv6 information */
        struct nexthop_srv6 *nh_srv6;
    };

The forwarding objects in zebra are organized in the following manner. Each struct nexthop contains a forwarding-only part and a route context part. Because the route context parts are route attributes, they may differ from route to route. Therefore, a struct nexthop_group may not be shareable.
<figure align=center>
<img src="images/zebra_fwding_obj_no_share.jpg" >
<figcaption>Figure 2. Existing Zebra forwarding objects</figcaption>
</figure>
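
To make the sharing problem concrete, here is a small sketch. It assumes a heavily simplified stand-in for the nexthop (`struct nh_view`, with hypothetical helper names) that models only a gateway, an interface index, and a VNI. Two nexthops that forward identically still compare as different once the route context is included.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for zebra's struct nexthop (assumption: only
 * three fields are modelled; the real structure has many more). */
struct nh_view {
	uint8_t gate[16];  /* forwarding part: IPv6 gateway address */
	uint32_t ifindex;  /* forwarding part: outgoing interface */
	uint32_t vni;      /* route context part: may differ per route */
};

/* Compare only the forwarding part. */
bool nh_same_forwarding(const struct nh_view *a, const struct nh_view *b)
{
	return a->ifindex == b->ifindex &&
	       memcmp(a->gate, b->gate, sizeof(a->gate)) == 0;
}

/* Compare forwarding part and route context together. */
bool nh_same_full(const struct nh_view *a, const struct nh_view *b)
{
	return nh_same_forwarding(a, b) && a->vni == b->vni;
}
```

Two VPN routes through the same remote PE but with different VNIs pass `nh_same_forwarding()` and fail `nh_same_full()`, which is why a group keyed on the full nexthop cannot be shared between them.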

### Updated data structure with BGP PIC changes
Instead of dividing the current 'struct nexthop' into two separate structures, we have opted to utilize the same nexthop structure to store both the route context part (also known as PIC CONTEXT) and the forwarding part.

Within the 'struct nhg_hash_entry,' we introduce a new field, 'struct nhg_hash_entry *pic_nhe.' This 'pic_nhe' is created when the original NHG pertains to BGP PIC. 'pic_nhe' points to an NHG that exclusively contains the original nexthop's forwarding part. The original nexthop retains both the PIC CONTEXT part and the forwarding part.

This approach allows us to achieve the following objectives:
- Utilize existing code for managing nexthop dependencies.
- Maintain a consistent approach for dplane to handle updates to the Linux kernel.

The new forwarding chain will be organized as follows.
<figure align=center>
<img src="images/zebra_fwding_obj_sharing.jpg" >
<figcaption>Figure 3. Zebra forwarding objects after enabling BGP PIC</figcaption>
</figure>

### struct nhg_hash_entry
As described in the previous section, we will add a new field struct nhg_hash_entry *pic_nhe in struct nhg_hash_entry.
If PIC NHE is not used, pic_nhe would be set to NULL.
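
A trimmed-down sketch of the new field is shown below. `struct nhg_hash_entry_sketch` and `forwarding_nhg_id()` are hypothetical names used only for illustration; the real struct nhg_hash_entry has many more members, and only `pic_nhe` is the field proposed in this section.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: the id plus the proposed pic_nhe pointer. */
struct nhg_hash_entry_sketch {
	uint32_t id;
	/* Forwarding-only NHG; NULL when PIC NHE is not used. */
	struct nhg_hash_entry_sketch *pic_nhe;
};

/* Hypothetical helper: id of the NHG that carries the forwarding part.
 * Falls back to the entry's own id when pic_nhe is NULL. */
uint32_t forwarding_nhg_id(const struct nhg_hash_entry_sketch *nhe)
{
	return nhe->pic_nhe ? nhe->pic_nhe->id : nhe->id;
}
```

Keeping the pointer inside the existing entry means callers that do not know about PIC can ignore it and keep using the entry as before.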

### struct dplane_route_info
dplane_route_info is part of struct zebra_dplane_ctx.

We will add two new fields, zd_pic_nhg_id and zd_pic_ng. zd_pic_nhg_id holds pic_nhg's nexthop group id, and zd_pic_ng stores pic_nhg itself. These two new fields would be collected via dplane_ctx_route_init().

    /* Nexthops */
    uint32_t zd_nhg_id;
    struct nexthop_group zd_ng;
    /* PIC Nexthops */
    uint32_t zd_pic_nhg_id;
    struct nexthop_group zd_pic_ng;

These fields would be used in the following manner.

| Cases | Linux Kernel Update (slow path) | FPM (fast path) |
|:-----:|:------------------------------------:|:-----------------------------:|
| No BGP PIC enabled | zd_ng is used as NHG | zd_ng is used as NHG |
| BGP PIC enabled | zd_ng is used as NHG | zd_ng is used for PIC_CONTEXT, zd_pic_ng is used for NHG |
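
The table can be read as a small selection function. The sketch below is an assumption-laden illustration: `select_route_ids()`, `struct route_ids`, and `enum update_path` are hypothetical, a context id of 0 stands for "unused", and it assumes that without BGP PIC the FPM path uses zd_ng as the NHG just like the kernel path.

```c
#include <stdbool.h>
#include <stdint.h>

enum update_path { PATH_KERNEL, PATH_FPM };

/* Subset of the route info fields named in this section. */
struct route_info_sketch {
	uint32_t zd_nhg_id;
	uint32_t zd_pic_nhg_id;
};

/* Hypothetical result: the ids a consumer would program. */
struct route_ids {
	uint32_t nhg_id;     /* NHG used for forwarding */
	uint32_t context_id; /* PIC_CONTEXT id, 0 when unused */
};

struct route_ids select_route_ids(const struct route_info_sketch *ri,
				  enum update_path path, bool pic_enabled)
{
	/* Default: zd_ng is the NHG, no PIC context. */
	struct route_ids out = { .nhg_id = ri->zd_nhg_id, .context_id = 0 };

	/* Only the FPM path with BGP PIC enabled splits the two roles. */
	if (path == PATH_FPM && pic_enabled) {
		out.nhg_id = ri->zd_pic_nhg_id;
		out.context_id = ri->zd_nhg_id;
	}
	return out;
}
```

The kernel path never looks at the PIC fields, which matches the statement that kernel programming is unchanged.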

### struct dplane_neigh_info
This structure is initialized via dplane_ctx_nexthop_init(), which is used to trigger NHG events. We don't need to make changes in this structure.

## Zebra Modifications
### BGP_PIC enable flag
The BGP_PIC_enable flag would be set based on zebra's command line arguments. The flag would be set only on platforms whose Linux kernel supports NHG, i.e. where kernel_nexthops_supported() returns true.

### Create pic_nhe
From dplane_nexthop_add(), when a normal NHG is created, we will try to create the PIC NHG as well.
zebra_nhe_find() is used to create or find an NHE. In the create case, when the NHE is for BGP PIC and BGP_PIC is enabled, we use the same API to create a pic_nhe, i.e. a nexthop group that carries FORWARDING information only. The created pic_nhe would be stored in the newly added field struct nhg_hash_entry *pic_nhe.
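
The idea of "a nexthop with FORWARDING information only" can be sketched as a copy that clears the route context. This is an illustration under stated assumptions, not the zebra implementation: `struct nh_sketch` and `nh_copy_forwarding_only()` are hypothetical, and only a gateway, an interface index, and a VNI are modelled.

```c
#include <stdint.h>
#include <string.h>

/* Simplified nexthop (assumption: the real struct nexthop has many
 * more forwarding and context fields). */
struct nh_sketch {
	uint8_t gate[16];  /* forwarding part */
	uint32_t ifindex;  /* forwarding part */
	uint32_t vni;      /* route context part */
};

/* Build the forwarding-only copy used for a pic_nhe: forwarding fields
 * are kept, everything else is zeroed so that nexthops with the same
 * forwarding part become byte-for-byte identical. */
void nh_copy_forwarding_only(struct nh_sketch *dst,
			     const struct nh_sketch *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->gate, src->gate, sizeof(dst->gate));
	dst->ifindex = src->ifindex;
}
```

Because the stripped copies of two VPN nexthops are identical, a find-or-create lookup on them resolves to the same entry, which is what makes the forwarding NHG shareable.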

### Handles kernel forwarding objects
Zebra's handling of kernel forwarding objects does not change. Only zd_ng is used for NHG programming in the kernel.

### Handles FPM forwarding objects
#### Map Zebra objects to APP_DB via FPM
When BGP_PIC is enabled, nhe's NHG would map to PIC_LIST, pic_nhe's NHG would map to forwarding NHG.
Route object would use nhe's id as context id and use pic_nhe's id as NHG id.

<figure align=center>
<img src="images/zebra_map_to_fpm_objs.jpg" >
<figcaption>Figure 4. Zebra maps forwarding objects to APP DB objects when BGP PIC is enabled.</figcaption>
</figure>

#### SRv6 VPN SAI Objects
The following diagram shows the SAI objects related to SRv6. Detailed information can be found at
https://github.com/opencomputeproject/SAI/blob/master/doc/SAI-IPv6-Segment-Routing-VPN.md

<figure align=center>
<img src="images/srv6_sai_objs.png" >
<figcaption>Figure 5. SRv6 VPN SAI Objects</figcaption>
</figure>

#### Map APP_DB to SAI objects
<figure align=center>
<img src="images/app_db_to_sai.png" >
<figcaption>Figure 6. APP DB to SAI objects mapping</figcaption>
</figure>

## Orchagent Modifications
Orchagent handles two new forwarding objects from APP_DB, NEXTHOP_TABLE and PIC_CONTEXT_TABLE, and maps each of them to the corresponding SAI objects.

## Zebra handles NHG member down events
### Local link down events
When a local link down event does not change any BGP NH's reachability, the BGP NHG should not be updated.
TODO: Check with Kentaro whether they would handle this event as part of NHG handling.

### BGP NH down events
A BGP NH can become unreachable because of eBGP events or local link down events. We want zebra to backwalk all related BGP PIC NHGs and update them directly.
1. After a routing update occurs, update the processing result of this route in rib_process_result and call zebra_rib_evaluate_rn_nexthops.

2. In the zebra_rib_evaluate_rn_nexthops function, construct a pic_nhg_hash_entry based on rn->p and find the corresponding pic_nhg. Using the dependents list stored in pic_nhg, find all other nexthop groups that depend on the current nhg, and remove the affected nexthop member from each of them.

3. Trigger a refresh of pic_nhg to fpm.

4. To ensure that nhg refresh messages are sent first, add prectxqueue to fpm_nl_ctx as a higher-priority downstream queue for fnc. When an nhg update is triggered, attach the nhg's ctx to prectxqueue; when refreshing fpm, take ctx from prectxqueue first.
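
Step 4 amounts to a two-level priority queue. The sketch below is a simplified model, not the FPM code: contexts are plain integers, the FIFOs are fixed-size arrays, and `ctxqueue` is an assumed name for the regular queue. Only `prectxqueue` comes from the text above; every function and type name is hypothetical.

```c
#include <stddef.h>

#define QCAP 16

/* Tiny fixed-size FIFO of context ids (illustrative only). */
struct ctx_fifo {
	int items[QCAP];
	size_t head; /* next item to pop */
	size_t tail; /* next free slot */
};

struct fpm_queues_sketch {
	struct ctx_fifo prectxqueue; /* NHG updates, drained first */
	struct ctx_fifo ctxqueue;    /* all other contexts */
};

static int fifo_push(struct ctx_fifo *q, int ctx)
{
	if (q->tail == QCAP)
		return -1; /* full; a real queue would grow or wrap */
	q->items[q->tail++] = ctx;
	return 0;
}

static int fifo_pop(struct ctx_fifo *q, int *ctx)
{
	if (q->head == q->tail)
		return -1; /* empty */
	*ctx = q->items[q->head++];
	return 0;
}

/* NHG updates go to the high-priority queue, everything else to the
 * regular one. */
int fpm_enqueue(struct fpm_queues_sketch *f, int ctx, int is_nhg_update)
{
	return fifo_push(is_nhg_update ? &f->prectxqueue : &f->ctxqueue, ctx);
}

/* Always drain prectxqueue before touching the regular queue. */
int fpm_dequeue(struct fpm_queues_sketch *f, int *ctx)
{
	if (fifo_pop(&f->prectxqueue, ctx) == 0)
		return 0;
	return fifo_pop(&f->ctxqueue, ctx);
}
```

With this ordering, an NHG refresh queued behind pending route updates is still sent first, so hardware forwarding is repaired before the remaining updates are processed.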

The BGP NH down handling is illustrated in the following image:

<figure align=center>
<img src="images/BGP_NH_update.png" >
<figcaption>Figure 7. BGP NH down event handling</figcaption>
</figure>

When the route 2033::178, marked in blue, is deleted, zebra finds its corresponding nhg(68) based on 2033::178. It then iterates through the dependents list of nhg(68) and finds the dependent nhg(67). It removes the nexthop member (2033::178) from nhg(67) and, once that is done, triggers a refresh of nhg(67) to fpm.

Similarly, when the route 1000::178, marked in brown, is deleted, zebra finds its corresponding nhg(66). Based on the dependents list of nhg(66), it finds nhg(95) and removes the nexthop member (1000::178) from nhg(95). Once that is done, it triggers a refresh of nhg(95) to fpm.
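
The backwalk in these two examples can be sketched as below. The model is deliberately small and hypothetical: `struct nhg_sketch` and both functions are illustrative names, each member is identified by the id of its own single-nexthop NHG (for example 68 for 2033::178), and "refresh to fpm" is represented by collecting the ids that need a refresh.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MEMBERS 8
#define MAX_DEPENDENTS 8

struct nhg_sketch {
	uint32_t id;
	uint32_t members[MAX_MEMBERS];  /* ids of member NHGs */
	size_t member_count;
	struct nhg_sketch *dependents[MAX_DEPENDENTS]; /* NHGs using this one */
	size_t dependent_count;
};

/* Remove member_id from nhg; return 1 if it was present. */
static int nhg_remove_member(struct nhg_sketch *nhg, uint32_t member_id)
{
	for (size_t i = 0; i < nhg->member_count; i++) {
		if (nhg->members[i] != member_id)
			continue;
		/* Shift the remaining members down to keep their order. */
		for (size_t j = i + 1; j < nhg->member_count; j++)
			nhg->members[j - 1] = nhg->members[j];
		nhg->member_count--;
		return 1;
	}
	return 0;
}

/* Walk the dependents of the NHG that went down, drop it from each,
 * and record which NHG ids must be refreshed. Returns how many ids
 * were written to refresh_ids (at most cap). */
size_t nhg_backwalk_down(const struct nhg_sketch *down,
			 uint32_t *refresh_ids, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < down->dependent_count; i++) {
		struct nhg_sketch *dep = down->dependents[i];

		if (nhg_remove_member(dep, down->id) && n < cap)
			refresh_ids[n++] = dep->id;
	}
	return n;
}
```

For the first example, calling `nhg_backwalk_down()` on nhg(68) removes it from nhg(67) and reports 67 as the only group to refresh. The cost depends on the number of dependent groups, not on the number of VPN routes using them.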

## Unit Test

## References
- https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/
- https://github.com/opencomputeproject/SAI/blob/master/doc/SAI-IPv6-Segment-Routing-VPN.md
- https://github.com/sonic-net/SONiC/pull/1425


Binary file added doc/bgp_pic/images/BGP_NH_update.png
Binary file added doc/bgp_pic/images/app_db_to_sai.png
Binary file added doc/bgp_pic/images/srv6_igp2bgp.jpg
Binary file added doc/bgp_pic/images/srv6_sai_objs.png
Binary file added doc/bgp_pic/images/zebra_fwding_obj_no_share.jpg
Binary file added doc/bgp_pic/images/zebra_fwding_obj_sharing.jpg
Binary file added doc/bgp_pic/images/zebra_map_to_fpm_objs.jpg