Port link goes down when SWSS is restarted #307

Merged
merged 1 commit into from
Oct 27, 2022

mihirpat1
Contributor

Signed-off-by: Mihir Patel [email protected]

Description

Port link goes down when SWSS is restarted.

Motivation and Context

Issue
When SWSS restarts, the admin_status field in STATE_DB appears to change from 'up' to 'down' and then back to 'up'.
The sequence of events is as follows:

  1. The user restarts SWSS.
  2. Once SWSS has restarted, the 'host_tx_ready' field is added to STATE_DB with the value true. This generates an on_port_update_event, which triggers an update of all data structures. At this point 'admin_status' in STATE_DB is still 'down', so xcvrd disables Tx.
  3. After some time, 'admin_status' in STATE_DB is updated to 'up'. However, this does not generate an update, because on_port_update_event is generated for STATE_DB only when one of the fields listed in the filter changes (currently 'host_tx_ready' is the only such field). As a result, the port remains down until the user triggers shut/no shut.
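
The sequence above can be reproduced with a small toy model. This is only a sketch of the behavior described here, not the xcvrd code: `ToyObserver`, `on_db_change` and `tx_enabled` are hypothetical names, and the assumption is that an event fires only when a filtered field changes while the whole row tags along with it.

```python
FILTER = {"host_tx_ready"}  # the only filtered STATE_DB field, per the description


class ToyObserver:
    """Hypothetical stand-in for the xcvrd-side consumer; not the real class."""

    def __init__(self):
        self.last_filtered = {}  # last seen values of the filtered fields
        self.port_state = {}     # what the consumer believes about the port
        self.events = 0

    def on_db_change(self, row):
        filtered = {k: v for k, v in row.items() if k in FILTER}
        if filtered == self.last_filtered:
            return False         # no filtered field changed, so no event
        self.last_filtered = filtered
        self.port_state.update(row)  # pre-fix: unfiltered fields tag along
        self.events += 1
        return True

    def tx_enabled(self):
        return (self.port_state.get("host_tx_ready") == "true"
                and self.port_state.get("admin_status") == "up")


obs = ToyObserver()
# Step 2: host_tx_ready appears while admin_status is still 'down'.
fired_step2 = obs.on_db_change({"admin_status": "down", "host_tx_ready": "true"})
# Step 3: admin_status becomes 'up', but no filtered field changed.
fired_step3 = obs.on_db_change({"admin_status": "up", "host_tx_ready": "true"})
print(fired_step2, fired_step3, obs.tx_enabled())  # True False False
```

In this model the stale 'down' value is never corrected, which mirrors the port staying down until shut/no shut.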

Resolution
The filtering logic has been modified so that fvp keeps only the fields that are present in the filter or that belong to a predefined set. As a result, on_port_update_event is generated only when a field from the filter changes, and unwanted database fields are no longer carried along with the event.
If the filter is None, fvp remains unchanged.
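
A minimal sketch of this filtering, under stated assumptions: `filter_fvp`, `field_filter` and `ALWAYS_KEPT` are illustrative names, and the contents of the predefined set are assumed here rather than taken from the change itself.

```python
# Assumed for illustration: bookkeeping keys that always survive filtering.
ALWAYS_KEPT = frozenset({"index", "key", "asic_id", "op"})


def filter_fvp(field_filter, fvp):
    """Drop, in place, every field that is neither filtered for nor bookkeeping."""
    if field_filter is None:
        return fvp                    # no filter: leave fvp unchanged
    allowed = set(field_filter) | ALWAYS_KEPT
    for key in list(fvp):             # snapshot the keys; we delete while looping
        if key not in allowed:
            del fvp[key]
    return fvp


event = {
    "key": "Ethernet0",
    "op": "SET",
    "index": "1",
    "asic_id": 0,
    "host_tx_ready": "true",
    "admin_status": "down",           # stale value that should not tag along
}
filtered_event = filter_fvp(["host_tx_ready"], dict(event))
unfiltered_event = filter_fvp(None, dict(event))
```

With the filter set to 'host_tx_ready', `filtered_event` no longer carries 'admin_status'; with a filter of None, the dictionary comes back unchanged.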

How Has This Been Tested?

The following tests were performed:

  1. SWSS was restarted, and the link was confirmed to come up eventually.
  2. xcvrd was restarted, and the link was confirmed to remain up. SWSS was then restarted to confirm that the link comes up once SWSS is up and running.
  3. SWSS was killed and then started, and the link was confirmed to come up.

After restarting SWSS, link up is seen for all expected ports:
root@sonic:/home/admin# show int statu
Interface    Lanes                            Speed  MTU   FEC  Alias         Vlan   Oper  Admin  Type                                             Asym PFC
Ethernet0    1,2,3,4,5,6,7,8                  400G   9100  N/A  Ethernet1/1   trunk  up    up     QSFP-DD Double Density 8X Pluggable Transceiver  N/A
Ethernet8    9,10,11,12,13,14,15,16           400G   9100  N/A  Ethernet2/1   trunk  down  up     N/A                                              N/A
Ethernet16   17,18,19,20,21,22,23,24          400G   9100  N/A  Ethernet3/1   trunk  up    up     QSFP-DD Double Density 8X Pluggable Transceiver  N/A
Ethernet24   25,26,27,28,29,30,31,32          400G   9100  N/A  Ethernet4/1   trunk  down  up     N/A                                              N/A
Ethernet32   33,34,35,36,37,38,39,40          400G   9100  N/A  Ethernet5/1   trunk  down  up     N/A                                              N/A
Ethernet40   41,42,43,44,45,46,47,48          400G   9100  N/A  Ethernet6/1   trunk  down  up     N/A                                              N/A
Ethernet48   49,50,51,52,53,54,55,56          400G   9100  N/A  Ethernet7/1   trunk  down  up     N/A                                              N/A
Ethernet56   57,58,59,60,61,62,63,64          400G   9100  N/A  Ethernet8/1   trunk  down  up     N/A                                              N/A
Ethernet64   65,66,67,68,69,70,71,72          400G   9100  N/A  Ethernet9/1   trunk  down  up     N/A                                              N/A
Ethernet72   73,74,75,76,77,78,79,80          400G   9100  N/A  Ethernet10/1  trunk  down  up     N/A                                              N/A
Ethernet80   81,82,83,84,85,86,87,88          400G   9100  N/A  Ethernet11/1  trunk  down  up     N/A                                              N/A
Ethernet88   89,90,91,92,93,94,95,96          400G   9100  N/A  Ethernet12/1  trunk  down  up     N/A                                              N/A
Ethernet96   97,98,99,100,101,102,103,104     400G   9100  N/A  Ethernet13/1  trunk  down  up     N/A                                              N/A
Ethernet104  105,106,107,108,109,110,111,112  400G   9100  N/A  Ethernet14/1  trunk  down  up     N/A                                              N/A
Ethernet112  113,114,115,116,117,118,119,120  400G   9100  N/A  Ethernet15/1  trunk  down  up     N/A                                              N/A
Ethernet120  121,122,123,124,125,126,127,128  400G   9100  N/A  Ethernet16/1  trunk  down  up     N/A                                              N/A
Ethernet128  129,130,131,132,133,134,135,136  400G   9100  N/A  Ethernet17/1  trunk  down  up     N/A                                              N/A
Ethernet136  137,138,139,140,141,142,143,144  400G   9100  N/A  Ethernet18/1  trunk  down  up     N/A                                              N/A
Ethernet144  145,146,147,148,149,150,151,152  400G   9100  N/A  Ethernet19/1  trunk  up    up     QSFP-DD Double Density 8X Pluggable Transceiver  N/A
Ethernet152  153,154,155,156,157,158,159,160  400G   9100  N/A  Ethernet20/1  trunk  down  up     N/A                                              N/A
Ethernet160  161,162,163,164,165,166,167,168  400G   9100  N/A  Ethernet21/1  trunk  up    up     QSFP-DD Double Density 8X Pluggable Transceiver  N/A
Ethernet168  169,170,171,172,173,174,175,176  400G   9100  N/A  Ethernet22/1  trunk  down  up     N/A                                              N/A
Ethernet176  177,178,179,180,181,182,183,184  400G   9100  N/A  Ethernet23/1  trunk  down  up     N/A                                              N/A
Ethernet184  185,186,187,188,189,190,191,192  400G   9100  N/A  Ethernet24/1  trunk  down  up     N/A                                              N/A
Ethernet192  193,194,195,196,197,198,199,200  400G   9100  N/A  Ethernet25/1  trunk  down  up     N/A                                              N/A
Ethernet200  201,202,203,204,205,206,207,208  400G   9100  N/A  Ethernet26/1  trunk  down  up     N/A                                              N/A
Ethernet208  209,210,211,212,213,214,215,216  400G   9100  N/A  Ethernet27/1  trunk  down  up     N/A                                              N/A
Ethernet216  217,218,219,220,221,222,223,224  400G   9100  N/A  Ethernet28/1  trunk  down  up     N/A                                              N/A
Ethernet224  225,226,227,228,229,230,231,232  400G   9100  N/A  Ethernet29/1  trunk  down  up     QSFP28 or later                                  N/A
Ethernet232  233,234,235,236,237,238,239,240  400G   9100  N/A  Ethernet30/1  trunk  down  up     N/A                                              N/A
Ethernet240  241,242,243,244,245,246,247,248  400G   9100  N/A  Ethernet31/1  trunk  down  up     N/A                                              N/A
Ethernet248  249,250,251,252,253,254,255,256  400G   9100  N/A  Ethernet32/1  trunk  down  up     N/A                                              N/A
Ethernet256  258                              10G    9100  N/A  Ethernet33    trunk  down  up     N/A                                              N/A
Ethernet260  257                              10G    9100  N/A  Ethernet34    trunk  down  up     N/A                                              N/A
root@sonic:/home/admin#

Additional Information (Optional)

@mihirpat1 mihirpat1 requested a review from prgeor October 21, 2022 22:30
@mihirpat1 mihirpat1 marked this pull request as ready for review October 21, 2022 22:58
@@ -134,6 +134,12 @@ def subscribe_port_update_event(namespaces, logger):
port_tbl, list(d.values())[0], namespace))
    return sel, asic_context

def apply_filter_to_fvp(filter, fvp):
    if filter is not None:
        for key in fvp.copy().keys():
Collaborator

why copy()?

Contributor Author

Using a copy of the fvp while iterating will prevent running into issues caused by dynamically changing fvp since we will be deleting unwanted entries from fvp as we iterate.
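
To illustrate the point, here is a small standalone sketch (the field names and the `wanted` set are made up for the example). Deleting from a dictionary while iterating over its live key view raises a RuntimeError in CPython, while iterating over a copy does not.

```python
wanted = {"host_tx_ready"}
sample = {"host_tx_ready": "true", "admin_status": "down", "speed": "400000"}


def prune_live(d):
    # Iterates over the live key view while deleting: unsafe.
    for key in d.keys():
        if key not in wanted:
            del d[key]
    return d


def prune_copy(d):
    # Iterates over the keys of a copy, so deleting from d is safe.
    for key in d.copy().keys():
        if key not in wanted:
            del d[key]
    return d


try:
    prune_live(dict(sample))
    raised = False
except RuntimeError:  # "dictionary changed size during iteration"
    raised = True

pruned = prune_copy(dict(sample))
```

Iterating over `list(d)` would avoid the error in the same way while copying only the keys.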

@prgeor prgeor merged commit 82fc7a6 into sonic-net:master Oct 27, 2022
@prgeor
Collaborator

prgeor commented Oct 27, 2022

@yxieca could you help cherry-pick to 202205
