DUT switch failed to establish session with dynamic BGP neighbors #2412

Closed
wangxin opened this issue Jan 3, 2019 · 3 comments

wangxin commented Jan 3, 2019

Description

The bgp_speaker automation test case failed with the error "ERROR! 'dict object' has no attribute u'10.154.239.129'".

The script starts 3 BGP peers on the PTF container and announces routes to the DUT switch. However, the DUT switch
could not establish BGP sessions with the peers simulated by exabgp on PTF.

An image built from the 201811 branch does not have this issue.

Steps to reproduce the issue:

  1. The t0 topology is used.
  2. The DUT switch is running an image from the master branch.
  3. Run the bgp_speaker test case from the sonic-mgmt docker.

Describe the results you received:

In ansible log:

TASK [test : Verify accepted prefixes of the dynamic neighbors are correct] ****
task path: /var/johnar/sonic-mgmt/ansible/roles/test/tasks/bgp_speaker.yml:161
Thursday 03 January 2019  11:39:12 +0000 (0:00:03.158)       0:01:51.172 ******
fatal: [arc-switch1029-t0]: FAILED! => {"failed": true, "msg": "ERROR! 'dict object' has no attribute u'10.154.239.129'"}

BGP neighbors on the DUT switch, checked while the script was executing the above task:

$ show ip bgp summary
BGP router identifier 10.1.0.32, local AS number 65100
RIB entries 12807, using 1401 KiB of memory
Peers 8, using 36 KiB of memory
Peer groups 2, using 112 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.57       4 64600    3245      50        0    0    0 00:02:07     6400
10.0.0.59       4 64600    3244    3246        0    0    0 00:02:02     6400
10.0.0.61       4 64600    3245    3251        0    0    0 00:02:06     6400
10.0.0.63       4 64600    3245    3248        0    0    0 00:02:05     6400

Total number of neighbors 4

Describe the results you expected:

After the exabgp tool has announced routes to the DUT switch, the DUT switch should be able to establish sessions with the
exabgp peers. The dynamic neighbors should then appear in the output of "show ip bgp summary".

$ show ip bgp summary
BGP router identifier 10.1.0.32, local AS number 65100
RIB entries 12811, using 1401 KiB of memory
Peers 11, using 50 KiB of memory
Peer groups 2, using 112 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.57       4 64600    3319     122        0    0    0 00:05:45     6400
10.0.0.59       4 64600    3319    3322        0    0    0 00:05:45     6400
10.0.0.61       4 64600    3318    3322        0    0    0 00:05:43     6400
10.0.0.63       4 64600    3316    3322        0    0    0 00:05:42     6400
*10.154.239.129 4 65432       4    6411        0    0    0 00:00:45        1
*10.154.239.130 4 65432       4    6411        0    0    0 00:00:46        1
*192.168.0.2    4 65432       4    6411        0    0    0 00:00:48        1

Total number of neighbors 7
* - dynamic neighbor
3 IPv4 dynamic neighbor(s), limit 100
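
For anyone checking this by hand, the dynamic neighbors can be picked out of the summary text by the leading `*`. Below is a minimal Python sketch, assuming the 10-column layout shown above. `dynamic_neighbors` is a hypothetical helper written for illustration, not part of sonic-mgmt.

```python
def dynamic_neighbors(summary_text):
    """Return {peer_ip: state_or_prefix_count} for rows marked with '*'."""
    peers = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # A neighbor row has at least 10 columns; the legend line
        # "* - dynamic neighbor" has fewer and is skipped.
        if len(fields) >= 10 and fields[0].startswith("*"):
            peers[fields[0].lstrip("*")] = " ".join(fields[9:])
    return peers


# Rows copied from the expected output above.
sample = """\
10.0.0.63       4 64600    3316    3322        0    0    0 00:05:42     6400
*10.154.239.129 4 65432       4    6411        0    0    0 00:00:45        1
*10.154.239.130 4 65432       4    6411        0    0    0 00:00:46        1
*192.168.0.2    4 65432       4    6411        0    0    0 00:00:48        1
* - dynamic neighbor
"""

expected = {"10.154.239.129", "10.154.239.130", "192.168.0.2"}
found = dynamic_neighbors(sample)
missing = expected - set(found)
print("missing dynamic peers:", sorted(missing))
```

Run against the failing output earlier in this report, all three expected peers would be reported as missing, which matches the symptom.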

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```
SONiC Software Version: SONiC.HEAD.842-1e8d3ec
Distribution: Debian 9.6
Kernel: 4.9.0-8-amd64
Build commit: 1e8d3ec
Build date: Tue Jan  1 13:38:01 UTC 2019
Built by: johnar@jenkins-worker-4
```

**Attach debug file `sudo generate_dump`:**

```
$ sudo generate_dump 
Create switch, #0 INIT_SWITCH=false 
Jan 03 11:59:11 INFO    LOG: Initializing SX log with STDOUT as output file.
init_buffer_resource_limits[
num_ingress_pools:8
num_egress_pools:8
num_total_pools:21
num_port_queue_buff:16
num_port_pg_buff:8
unit_size:96
max_buffers_per_port:93
init_buffer_resource_limits]
The SAI dump is generated to /tmp/sai_sdk_dump_01_03_2019_11_59_AM
Jan 03 11:59:13 ERROR   SAI_SWITCH: mlnx_sai_switch.c[4690]- mlnx_shutdown_switch: Failed to delete default router - Entry Not Found.
Jan 03 11:59:13 ERROR   SAI_ACL: mlnx_sai_acl.c[10972]- mlnx_acl_rpc_call: Failed to send data througn the socket - No such file or directory
Jan 03 11:59:13 ERROR   LOG: ASSERT in cl_thread.c[99]- cl_thread_destroy
Jan 03 11:59:13 ERROR   LOG: ASSERT - Retrieved a list of 6 elements.
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 0: /usr/lib/libsxcomp.so.1(cl_thread_destroy+0xad) [0x7fdd2bb3884d].
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 1: /usr/lib/libsai.so.1(mlnx_acl_deinit+0x27d) [0x7fdd36c2f0ad].
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 2: /usr/lib/libsai.so.1(+0x17cf97) [0x7fdd36ca6f97].
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 3: saisdkdump() [0x4018a2].
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 4: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fdd357ecb45].
Jan 03 11:59:13 ERROR   LOG: ASSERT - Element 5: saisdkdump() [0x401b00].
```

bgpd.log.1.gz
bgpd.log.gz
syslog.1.gz
syslog.2.gz

@vincent201881

Is the exabgp process up on ptf-docker? Maybe it is not up.

vincent201881 commented Jan 17, 2019

I had the same problem a few days ago. I found that exabgp failed to come up on ptf-docker. In the end I changed the script file http_api.py like this:
app.run(port=int(sys.argv[1]))
app.run() needs an integer port, but sys.argv[1] is passed as a string.
Could you check whether your problem is the same?
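
To illustrate the type issue: command-line arguments always arrive as strings, so the port has to be converted before it is handed on. Here is a minimal sketch, assuming the port is the first argument as in the line above. `parse_port` is a hypothetical helper for illustration and is not code from http_api.py.

```python
def parse_port(argv):
    """Return the TCP port given as the first command-line argument, as an int."""
    if len(argv) < 2:
        raise ValueError("missing port argument")
    # Entries of sys.argv are always str; int() raises ValueError on non-numeric input.
    port = int(argv[1])
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


# Simulated argument list, in the same shape as sys.argv.
port = parse_port(["http_api.py", "5000"])
print(type(port).__name__, port)
```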

wangxin commented Jan 22, 2019

@vincent201881 Thanks for sharing your findings!

The http_api.py file on my testbed already has the correct content:
app.run(port=int(sys.argv[1]))

According to Stepan Blyschak, the root cause of this issue is #2427 (lo address not synced to the ASIC).
It has been fixed by PR sonic-net/sonic-swss#742.

I just tested the latest build of the master branch, and the test passed.
Tested build: SONiC.HEAD.860-cacff75

I am closing this issue.

wangxin closed this as completed Jan 22, 2019