-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial version of PINS P4Orch HLD #825
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
------- The P4Orch is a new orchagent which lives inside the OrchAgent process and picks up the entries to make the corresponding SAI API calls to program the switch hardware. ------
This may be old info ? P4OrchAgent will write the entries to ASIC DB. The demux does not happen here but after ASICDB in libsai?
All orchs make SAI API calls, but they don't directly program the hardware. Sairedis is the library that orchagent uses to make the SAI call, and it will write to ASIC DB. It is done in the sairedis library. Not sure what "demux" means. Does it mean to separate P4 requests from existing SONiC requests? If so, there is no demux after ASIC DB. We define new table in APPL DB, but the SAI API and ASIC DB are unchanged. |
|
||
## Restrictions/Limitations | ||
|
||
The P4Orch is designed to meet the SDN requirements. And hence there are some differences from the other SONiC orchagents: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also specify how the re-programming happens if lets say there is an swss crash/restart and APP_DB is cleared during init. Call out if there are any limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Updated the "Restrictions/Limitations" section that P4Orch does not support warmboot and orchagent restart in the initial phase.
For my understanding, orchagent restart is similar to warmboot. In both case, the APPL DB tables won't be cleared. And orchagent will re-program the APPL DB tables in "init view" mode. This is not specific to P4Orch. It is a common feature for all orchs. Our main challenge for supporting warmboot now is that P4Orch doesn't do retry. During warmboot init, all table requests will come in a single batch. Since P4Orch doesn't do retry, dependency on SONiC table (such as port & vrf) can not be satisfied. We do not support warmboot in the initial phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orchagent restart is not similar to warmboot. APP_DB shall be flushed during init. In regular cases, the configs are reprogrammed by managers. In case of p4rt, does this leave the system in such a state? What actions to reprogram the device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the late reply. (I didn't use my primary email to setup my github account. Need to do a better job in that.)
In our use case, we don't expect the system to recover after orchagent crashes. The system will go to "critical state". The controller will be notified and starts to drain and reboot the switch. Currently, we don't have special handling in recovering P4Orch after crashing.
I would like to understand more on how SONiC handles the crashing. This will help us improve this in the future. I have a few questions on orchagent restarts in the existing SONiC:
- You mentioned that the "configs are reprogrammed by managers". Are those the config managers that reads from CONFIG DB and writes to APPL DB? But for L3 forwarding, there is no CONFIG DB. Does anyone read the host routing and program them in APPL DB?
- When swss container restarts, will syncd container also restarts and wipes the ASIC? If not, how will syncd handle the "duplicate" L3 forwarding requests after orchanger restarts?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes and for BGP, it re-learns the route and program APP_DB. orchagent restart will also restart bgp
- Yes, syncd container restarts when orchagent restarts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That's very helpful.
From my understanding, the existing SONiC behavior does not meet our requirement. Especially in syncd restart, which will wipe the ASIC. It will cause packet drop even if we re-program the rules. In our usage, we do not restart syncd and orchagent. If any of them crash, the ASIC will still function (but no new rules can be programmed). The system will be in a "critical state". The controller will be notified and starts to drain and reboot the switch.
I updated the HLD a little bit to clarify that we don't support orchagent restart for P4RT table yet.
@prsunny @reshmaintel |
@prsunny This doc also have reference to other un-merged HLDs: |
8498931
to
8837dc2
Compare
@mint570 can you please sign the EasyCLA which is required to merge this PR? Thanks. |
Done. Just need to re-run the check. |
High Level Design document for PINS P4Orch.