update to master #1

ANISH-GOTTAPU · 2021-02-19T11:17:50Z

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

The --topology argument (for example --topology t0,t1) already restricts which test cases run. However, when we only want to list the cases that support a given topology, for example t0, without executing them, --collect-only always returns every test available in the module or folder. This PR resolves that problem.
What is the motivation for this PR?
Test cases cannot be collected based on topology.
How did you do it?
Enhanced the check_topology method. Added a pytest_collection_modifyitems hook.
Signed-off-by: Roman Savchuk <[email protected]>
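
A minimal sketch of how such a collection hook could filter by topology. It assumes tests carry a `topology` marker whose arguments are topology names and that unmarked tests and tests marked `any` are always kept; the helper name `supports_topology` is hypothetical and is not the check_topology method from the PR.

```python
def supports_topology(item, requested):
    """Return True if the collected item is marked for any requested topology."""
    marker = item.get_closest_marker("topology")
    if marker is None:
        return True  # assumption: tests without a topology marker are kept
    supported = set(marker.args)
    return "any" in supported or bool(supported & set(requested))


def pytest_collection_modifyitems(config, items):
    """Deselect items whose topology marker does not match --topology."""
    option = config.getoption("--topology", default=None)
    if not option:
        return
    requested = [name.strip() for name in option.split(",")]
    selected = [item for item in items if supports_topology(item, requested)]
    deselected = [item for item in items if not supports_topology(item, requested)]
    if deselected:
        # Report the dropped items so that --collect-only prints only matches.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Because the filtering happens at collection time, the same hook serves both normal runs and --collect-only.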

By default, Ansible searches for group variables in the directory where the inventory file is located. For vtestbed Pytest users who call Pytest with `tests/veos_vtb`, Ansible fails to find the group variables defined under `ansible/group_vars` and therefore fails to use `multi_passwd_ss`. So let's add those connection variables directly to `veos_vtb`. Signed-off-by: Longxiang Lyu <[email protected]>

Signed-off-by: Wei Bai [email protected]
Add PFC test cases implemented using Tgen API & Update Tgen sample script
How did you do it?
* Add pre-tests to get lossless and lossy priority information of each DUT in the testbed (see tests/test_pretest.py and tests/conftest.py)
* Add IXIA fixtures, e.g., ixia_api and ixia_t0_testbed (L2/L3 config of a T0 testbed).
* Add QoS fixtures and other helper functions
* Add general PFC test helper functions in tests/pfc/files/helper.py.
Currently, we reuse an IXIA session (ixia_api, module-level fixture) to run a series of tests. Different tests use the same testbed configuration (ixia_t0_testbed) but have different flow configurations and expected results.
How did you verify/test it?
I did test using SONiC switches and IXIA chassis. The Tgen API version is 0.0.64.
Supported testbed topology if it's a new test case?
T0 topology using IXIA chassis as the fanout
Documentation
https://github.com/Azure/sonic-mgmt/blob/master/docs/pfc/GLOBAL_PAUSE_README.md
https://github.com/Azure/sonic-mgmt/blob/master/docs/pfc/PFC_PAUSE_LOSSLESS_README.md
https://github.com/Azure/sonic-mgmt/blob/master/docs/pfc/PFC_PAUSE_LOSSY_README.md

In the dualtor testbed, the mux Y cable is simulated by OVS on the test server. An OVS bridge is created for each mux. The PTF interface and the test server VLAN interfaces are attached to the bridge. The VLAN interfaces are connected to DUT ports through fanout switches.
                +--------------+
                |              +----- upper_if
PTF (host_if) --+  OVS bridge  |
                |              +----- lower_if
                +--------------+
Open flow rules are configured for the OVS bridge to simulate upstream broadcasting and dropping of downstream traffic from the standby side.
To further simulate mux Y cable active/standby querying and setting, a process needs to be started on the test server. The process needs to expose APIs for querying and setting the active/standby status. On the DUT side, a plugin can be injected to intercept the calls to the mux Y cable. Instead of calling the actual mux driver functions, the APIs of the mux simulator are called. The process then checks and updates the open flow configuration of the OVS bridge accordingly.
This change added a program that exposes such HTTP APIs for the injected plugin on the DUT side to query and set the mux Y cable active/standby status. The program is a Python script based on Flask. It is copied to the test server and started as a systemd service during 'testbed-cli.sh add-topo' for dualtor type topologies. When 'testbed-cli.sh remove-topo' is run, the service is stopped if no testbed is using it.
Signed-off-by: Xin Wang <[email protected]>

Signed-off-by: Neetha John <[email protected]> Fixes the error seen during test case collection introduced by #2599. Initialize the variable as a list https://sonic-jenkins.westus2.cloudapp.azure.com/job/common/job/sonic-mgmt-testing/job/sonic-mgmt-pr/2462/consoleFull How did you verify/test it? Tried test case collection with the fix. No longer seeing the error

…mong ansible modules. (#2623)
What is the motivation for this PR?
Sometimes there are utilities that are required across multiple ansible modules in ansible/library. For example:
* creating a timestamped debug file and printing messages to it.
* minigraph_facts has logic to get port_name_to_alias_map that is based on 'hwsku'. If we want to use the same logic in some other ansible module, we would have to replicate the same code in both modules.
How did you do it?
Usage of ansible module_utils is defined at https://docs.ansible.com/ansible/latest/dev_guide/developing_module_utilities.html
To have custom module_utils, we need to:
* Add a module_utils directory under ansible.
* Add module_utils in ansible.cfg to point to this module_utils directory.
So, we did the following:
* Added a module_utils directory under ansible.
* Added module_utils in ansible.cfg to use this directory for ansible modules.
Added debug_utils as an example with the following methods/utilities:
* create_debug_file - create a timestamped debug file
* print_debug_msg - print a message to the debug file.
Added the debug commands above to conn_graph_facts as an example.
How did you verify/test it?
Made sure that conn_graph_facts works and the debug file is created.

This change improved the mux-simulator to support a new API for dropping traffic to specified output ports. For example:
POST /mux/<vm_set>/<port_index>/drop with data: {"out_ports": ["nic", "tor_b"]}
Then the mux simulator will change the flow action for the NIC port (PTF interface) and torB to "drop".
POST /mux/<vm_set>/<port_index>/output with data: {"out_ports": ["nic", "tor_a"]}
Then the mux simulator will change the flow action for the NIC port (PTF interface) and torA to "output".
This change also enhanced the mux-simulator with logging capability for easier troubleshooting.
Signed-off-by: Xin Wang <[email protected]>
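
The effect of the two requests above can be sketched as a plain state update. This is only an illustration: the function name `apply_action`, the `VALID_PORTS` tuple, and the dictionary layout are hypothetical, and the real simulator changes OVS flow rules rather than a dictionary.

```python
import json

# Assumption: these are the only port names the API accepts.
VALID_PORTS = ("nic", "tor_a", "tor_b")


def apply_action(flow_state, action, payload):
    """Return a copy of flow_state with `action` applied to payload["out_ports"]."""
    if action not in ("drop", "output"):
        raise ValueError("unsupported action: %s" % action)
    out_ports = payload.get("out_ports", [])
    unknown = [port for port in out_ports if port not in VALID_PORTS]
    if unknown:
        raise ValueError("unknown ports: %s" % unknown)
    updated = dict(flow_state)
    for port in out_ports:
        updated[port] = action
    return updated


# Body of a POST to .../drop, as in the first example above.
state = {"nic": "output", "tor_a": "output", "tor_b": "output"}
body = json.loads('{"out_ports": ["nic", "tor_b"]}')
state = apply_action(state, "drop", body)
```

After this call only `tor_a` is still set to "output"; a later `output` request restores the listed ports.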

…nt MACs (#2679) * Fixed an FDB cleanup race issue where the MAC flush may have occurred at the wrong time when a new MAC was learnt. This ended up causing packet forwarding issues and bogus test failures.

This change added utilities to get ansible variables by hostname or group name.
What is the motivation for this PR?
In test code and library code, there is a frequent need to get the value of specific ansible variables that belong to, or are visible to, a specific ansible host or group. This change added utility functions to simplify this effort. With these functions, we just need to supply the list of inventory files, the server or group name, and the variable name.
How did you do it?
The functions take advantage of ansible's InventoryManager and VariableManager to get the values of variables. Added functions:
* get_inventory_manager(inv_files)
* get_variable_manager(inv_files)
* get_host_vars(inv_files, hostname, variable=None)
* get_host_visible_vars(inv_files, hostname, variable=None)
* get_group_visible_vars(inv_files, group_name, variable=None)
* get_test_server_vars(inv_files, server, variable=None)
The inv_files argument can be a string or a list. Its value should be an inventory file path or a list of inventory file paths. In tests, it can be retrieved from the pytest builtin fixture request, for example: request.config.getoption("ansible_inventory")
How did you verify/test it?
Imported the utility file and test ran the functions.
Signed-off-by: Xin Wang <[email protected]>
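
A rough sketch of how a lookup like this can be built on InventoryManager and VariableManager. The function names `normalize_inv_files` and `lookup_host_visible_vars` are hypothetical and are not the functions added by the PR; the Ansible imports are deferred so that the first helper can be used without Ansible installed.

```python
def normalize_inv_files(inv_files):
    """Accept a single inventory path or a list of paths and return a list."""
    if isinstance(inv_files, str):
        return [inv_files]
    return list(inv_files)


def lookup_host_visible_vars(inv_files, hostname, variable=None):
    """Return all variables visible to a host, or one of them if `variable` is given."""
    # Deferred imports: only needed when the lookup is actually performed.
    from ansible.inventory.manager import InventoryManager
    from ansible.parsing.dataloader import DataLoader
    from ansible.vars.manager import VariableManager

    loader = DataLoader()
    inventory = InventoryManager(loader=loader, sources=normalize_inv_files(inv_files))
    variables = VariableManager(loader=loader, inventory=inventory)
    host = inventory.get_host(hostname)
    if host is None:
        raise KeyError("host not found in inventory: %s" % hostname)
    visible = variables.get_vars(host=host)
    return visible if variable is None else visible.get(variable)
```

In a test, the first argument would typically come from request.config.getoption("ansible_inventory"), as described above.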

… and MinTableMocker.mock_min_table (#2684) Fixes the argument count error in MinTableMocker.get_expect_cooling_level and MinTableMocker.mock_min_table.
What is the motivation for this PR?
air_flow_dir is no longer used and needs to be removed.
How did you do it?
Removed air_flow_dir from MinTableMocker.get_expect_cooling_level and MinTableMocker.mock_min_table.
How did you verify/test it?
Ran the regression test.
Any platform specific information?
Mellanox

…es. (#2676)
What is the motivation for this PR?
The CPU parameter naming varies between branches: the CPU is represented as either "Package" or "Physical". This fix uses a regular expression that matches both terms. Add MSN4600C platform parameters.
How did you do it?
Use a regular expression instead of a fixed "Package" or "Physical" term. Add MSN4600C platform parameters.
How did you verify/test it?
Ran the test using images based on all branches.
Any platform specific information?
Mellanox
Signed-off-by: Shlomi Bitton <[email protected]>
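
A small sketch of the idea, assuming the label appears at the start of a line in the form "Package id N:" or "Physical id N:". The sample text and the names `CPU_PACK_LABEL` and `cpu_pack_ids` are illustrative and are not taken from the PR.

```python
import re

# One pattern that accepts either spelling of the CPU pack label.
CPU_PACK_LABEL = re.compile(r"^(?:Package|Physical) id (\d+):", re.MULTILINE)


def cpu_pack_ids(sensor_output):
    """Return the CPU pack ids found in the sensor output, whichever term is used."""
    return [int(pack_id) for pack_id in CPU_PACK_LABEL.findall(sensor_output)]


# Hypothetical sample outputs from two different branches.
newer_output = "Package id 0:  +38.0 C\nCore 0:  +36.0 C\n"
older_output = "Physical id 0:  +38.0 C\nCore 0:  +36.0 C\n"
```

Both samples yield the same list of ids, so the test logic no longer depends on the branch.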

Make thermal related tests work on both 201911 and master
What is the motivation for this PR?
thermalctld now works differently on 201911 and master. For example, show platform fan displays 6 columns on 201911 but 8 columns on master. This PR makes the tests work on both master and 201911.
How did you do it?
* Add a new kernel version property in SonicHost. For 201911 the version should be 4.9.0; for master the version should be 4.19.0 or higher.
* Make some changes in test cases according to the kernel version.
How did you verify/test it?
Ran the thermal related regression test cases.
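
When branching on the kernel version, the comparison needs to be numeric, because "4.9.0" sorts after "4.19.0" as a string. A minimal sketch, assuming the release string starts with three dot-separated numbers; `parse_kernel_version` and `has_master_thermalctld` are hypothetical names, not the SonicHost property from the PR.

```python
import re


def parse_kernel_version(release):
    """Turn a release string such as '4.19.0-9-2-amd64' into (4, 19, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %s" % release)
    return tuple(int(part) for part in match.groups())


def has_master_thermalctld(release):
    """Assumption: kernel 4.19.0 or higher implies the master behaviour."""
    return parse_kernel_version(release) >= (4, 19, 0)
```

Comparing tuples of integers avoids the lexicographic trap that a plain string comparison would fall into.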

…2675) Description of PR Summary: Fixes # (issue) This PR is to introduce new interfaces for packet drop in y_cable simulator. The interfaces interact with mux simulator server introduced in PR #2673 and control the flow on each interface. Type of change Bug fix Testbed and Framework(new/improvement) Test case(new/improvement) Approach What is the motivation for this PR? This PR is to introduce new interfaces for packet drop in y_cable simulator. How did you do it? Send HTTP requests to interact with mux simulator server introduced in PR #2673 and control the flow on each interface. How did you verify/test it? Verified on a dualtor-56 topo. Verify all interfaces work as expected Verify packets are dropped as expected Verify no packet is dropped after recover.

Summary: In reboot/upgrades, the PTF's SSH key stored inside DUT gets deleted. Use a retry mechanism (alt_password) when default admin password does not work. Type of change: Bug fix For tests running on PTF container, alt password is not supported. Use case - To enable alt_password for the tests running in PTF where there is a reboot/upgrade leading to deletion of SSH key and known hosts file inside DUT.

As `nbrshow` only shows entries that have `lladdr`, we need to enforce the same behavior and filter out entries without a MAC address, to guarantee that the remaining entries are shown in the output of `show ndp`. Signed-off-by: Longxiang Lyu <[email protected]>

Signed-off-by: Danny Allen <[email protected]>

An exception will be thrown by platform_config_update in remote.py when sub interfaces (like eth1.20) exist on ptf. This commit addressed the issue by filtering out sub interfaces whose name contains a '.' Signed-off-by: bingwang <[email protected]>
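
The filter described above amounts to a one-line check. A sketch with a hypothetical helper name (`without_sub_interfaces`), not the code in remote.py:

```python
def without_sub_interfaces(interface_names):
    """Drop sub interfaces such as 'eth1.20', whose names contain a '.'."""
    return [name for name in interface_names if "." not in name]
```

For example, `without_sub_interfaces(["eth0", "eth1", "eth1.20"])` keeps only `eth0` and `eth1`.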

…entry (#2655) Description of PR
Summary: Fixes # Removed the hardcoded value "Ethernet0" and replaced it with automatic selection that depends on the topology.
Type of change Bug fix Testbed and Framework(new/improvement) Test case(new/improvement)
Approach
What is the motivation for this PR?
To make it possible to run the CRM test on the t0-64 topo.
How did you do it?
How did you verify/test it?
Ran test_crm_fdb_entry on t0-64.

…t case (#2702) Description of PR
Summary: Added changes for the reboot cause check of the cold reboot test case.
Approach
What is the motivation for this PR?
PR #6130 of sonic-buildimage changed the output of the CLI command 'show reboot-cause', and test_cold_reboot failed.
The old version of the CLI command output:
    $ show reboot-cause
    User issued 'reboot' command [User: , Time: Wed 23 Dec 2020 12:57:33 PM UTC]
The new version of the CLI command output:
    $ show reboot-cause
    Non-Hardware (reboot, time: 2020-12-23 10:56:50 UTC)
How did you do it?
Added both types of 'show reboot-cause' output to the check in the cold reboot test case.
How did you verify/test it?
    py.test --testbed=testbed-t1 --inventory=../ansible/lab --testbed_file=../ansible/testbed.csv --host-pattern=testbed-t1 --module-path=../ansible/library platform_tests/test_reboot.py::test_cold_reboot
    platform_tests/test_reboot.py::test_cold_reboot PASSED
Signed-off-by: Oleksandr Kozodoi <[email protected]>
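
One way to accept both output formats is to try a pattern for each. This is a sketch only; the pattern names and `is_cold_reboot_cause` are hypothetical, and the patterns are derived solely from the two sample outputs quoted above.

```python
import re

# Old format: "User issued 'reboot' command [User: , Time: ...]"
OLD_REBOOT_CAUSE = re.compile(r"User issued 'reboot' command")
# New format: "Non-Hardware (reboot, time: ...)"
NEW_REBOOT_CAUSE = re.compile(r"Non-Hardware \(reboot,")


def is_cold_reboot_cause(output):
    """Return True if the 'show reboot-cause' output matches either format."""
    return any(p.search(output) for p in (OLD_REBOOT_CAUSE, NEW_REBOOT_CAUSE))
```

Keeping the patterns in a tuple makes it easy to add another format if the CLI output changes again.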

Description of PR LGTM fixes and new test cases Summary: Fixes # LGTM issues Approach What is the motivation for this PR? Contribute More Test Cases How did you do it? List of new test modules added which can be used to verify Broadcom PRs qos/acl/test_acls_l3_forwarding.py qos/test_cos.py qos/test_wred.py routing/BGP/test_bgp_rr_traffic.py routing/NAT/test_nat.py routing/NAT/test_nat_reboot_long_run.py routing/NAT/test_nat_tcp.py routing/VRF/test_vrf.py routing/VRF/test_vrf_scale.py routing/test_arp_static_route_long_run.py routing/test_l3_performance.py routing/test_ndp.py sanity/test_sanity_l3.py switching/test_vlan.py system/test_sflow.py system/test_snapshot.py system/threshold/test_threshold.py How did you verify/test it? Execute using build with modifications to SONiC by Broadcom in PTF and Ixia testbeds Any platform specific information? None Supported testbed topology if it's a new test case? PTF,IXIA Documentation Co-authored-by: Rama Sasthri, Kristipati <[email protected]>

What is the motivation for this PR?
Currently, all ports on the DUT are required to be in use and connected to a fanout. However, there is a need to be able to run tests where not all ports are in use, specifically when dealing with:
* New Sonic devices
* A Sonic device with a front panel port used for in-band management
* A chassis with lots of ports and multiple asics, where not every port on every asic needs to be covered
* A chassis as a DUT, where the number of ports can be in the hundreds
* Expensive, high speed ports like 400G (hard to go from 400G down to 1/10G)
So we need to add support for setups where not all DUT ports are connected to a fanout and are thus not part of the testing.
How did you do it?
conn_graph_facts:
conn_graph_facts.py returns device_vlan_map_list. This used to be a dictionary with the key being the hostname and the value being a list of vlan ids for the fanout ports. The value is now a dictionary instead, with the key being the port_index and the value being the vlan id. This port_index is what gets put in the topology file.
We get the port by looking at the host_port_vlans defined in the conn_graph for that device. host_port_vlans has the Ethernet port (like Ethernet10) as key, and as value a dictionary whose 'vlanlist' key holds the list of fanout vlans. We check against the 'vlanlist' of every port to find the port on the DUT that connects to this fanout vlan, and then split on 'Ethernet' to get the port index.
For example, let's say that on a DUT with hostname "dut1", port Ethernet10 is connected to the fanout with vlan 120 and Ethernet11 is connected to the fanout with vlan 121. Then we would have:
    "host_port_vlans": {
        "Ethernet10": { "mode": "Access", "vlanids": "120", "vlanlist": [ 120 ] },
        "Ethernet11": { "mode": "Access", "vlanids": "121", "vlanlist": [ 121 ] }
    }
    "VlanList": [ 120, 121 ]
For vlan 120 in VlanList, we would iterate through host_port_vlans to find the port that has vlan 120, in our case "Ethernet10". The port_index would then be "10". Similarly, for vlan 121, the port_index would be "11". So the returned device_vlan_map_list would be:
    "dut1" : { "10" : 120, "11" : 121 }
vlan_port/kvm_port/mellanox_simx_port:
Updated to return (dut_fp_ports) a dictionary with the key being the port index (same as in the topo file) and the value being the port, instead of just the list of ports.
bind/unbind vm_topology:
* vlan_index is now a string in the dictionary of dut_fp_ports
* updated regexp for checking valid vlan for multi-dut to be of the format '.@'
remove_dut_port.yml (bug fix):
set cmd to "remove" instead of "create" in the vlan_port module call.
How did you verify/test it?
Tested against a pizza box DUT with all DUT ports connected to a fanout, and also against another DUT where only 4 of the 52 ports are connected to a different fanout.
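
The lookup described for conn_graph_facts can be sketched as follows. The function name `build_port_vlan_map` is hypothetical; the data is the dut1 example from the description.

```python
def build_port_vlan_map(host_port_vlans, vlan_list):
    """Map each port index to the fanout vlan that the port is connected to."""
    port_vlan_map = {}
    for vlan in vlan_list:
        for port_name, port_info in host_port_vlans.items():
            if vlan in port_info["vlanlist"]:
                # "Ethernet10".split("Ethernet") gives ["", "10"]
                port_index = port_name.split("Ethernet")[1]
                port_vlan_map[port_index] = vlan
                break
    return port_vlan_map


host_port_vlans = {
    "Ethernet10": {"mode": "Access", "vlanids": "120", "vlanlist": [120]},
    "Ethernet11": {"mode": "Access", "vlanids": "121", "vlanlist": [121]},
}
dut1_map = build_port_vlan_map(host_port_vlans, [120, 121])
```

A DUT port that is not connected to any fanout vlan never shows up in the map, which is how unconnected ports stay out of the testing.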

Description of PR
Summary: test_vrf_attr fails due to a missing ptftests folder on the PTF and an incorrect path to the jinja template vrf_attr_src_mac.j2:
    E RunAnsibleModuleFail: run module shell failed, Ansible Results => ptf: error: invalid value for --test-dir: directory ptftests does not exist
    E RunAnsibleModuleFail: run module shell failed, Ansible Results => Could not find or access 'vrf_attr_src_mac.j2' Searched in: sonic-mgmt/tests/templates/vrf_attr_src_mac.j2 on the Ansible Controller.
Approach
What is the motivation for this PR?
Fix test_vrf_attr and make it work.
How did you do it?
Corrected the path to the jinja template and added the copy_ptftests_directory fixture.
How did you verify/test it?
Ran the test_vrf_attr TC on topo t0:
    vrf/test_vrf_attr.py::TestVrfAttrSrcMac::test_vrf_src_mac_cfg PASSED
    vrf/test_vrf_attr.py::TestVrfAttrSrcMac::test_vrf2_neigh_with_default_router_mac PASSED
    vrf/test_vrf_attr.py::TestVrfAttrIpAction::test_vrf1_fwd_pkts_without_ip_opt PASSED
    vrf/test_vrf_attr.py::TestVrfAttrIpAction::test_vrf2_fwd_pkts_with_ip_opt PASSED
Signed-off-by: Andrii-Yosafat Lozovyi <[email protected]>

What is the motivation for this PR?
Enhance the sanity checks for multi asic platforms.
How did you do it?
The following changes are done:
- Add a new ansible module, show_ip_interfaces, to get the show ip interfaces facts
- Add multi asic support to the module show_interfaces
- Enhance the sanity checks check_interfaces, check_services and check_dbmemory for multi asic
- Change the mechanism by which the down_ports are determined
- Use the modules show_interfaces and show_ip_interfaces instead of interface_facts. This change is done because interface_facts used sysfs to get the interface details, which cannot be supported cleanly for multi asic.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <[email protected]>

Summary: Fixes # (issue) test_crm_fdb_entry was consistently failing on TD3. There are 2 reasons for this failure:
* FDB entries may expire at any time
* Some time is needed for the FDB to be cleared, because fdbclear is an asynchronous interface.
This PR fixes these two issues.
Type of change Bug fix Testbed and Framework(new/improvement) Test case(new/improvement)
Approach
What is the motivation for this PR?
This PR is to fix test_crm_fdb_entry.
How did you do it?
* Disable FDB aging
* Add a loop check after fdbclear

Approach
What is the motivation for this PR?
Some DUTs trigger a kernel panic while installing a new image because the hard drive is too slow and takes too long to sync contents to disk.
How did you do it?
Some devices have a slow hard drive, and installing an image could trigger a kernel crash because a task appears to hang. Increase the hung task timeout value to 600 seconds to work around this issue.
Signed-off-by: Ying Xie [email protected]
How did you verify/test it?
Tested the configuration on a DUT that triggers the kernel panic consistently.

Description of PR Summary: Fixes # (issue) Test case test_lag was failing occasionally on some platforms because route_check reported mismatch, which is caused by shutdown of PortChannel interfaces. Type of change Bug fix Testbed and Framework(new/improvement) Test case(new/improvement) Approach What is the motivation for this PR? This PR is to address test errors in test_lag caused by route_check. How did you do it? Disable route_check for test_lag.

* [testbed] Testbed v2 hld Signed-off-by: Longxiang Lyu <[email protected]> * Updates for comments Add flow charts for `init_db` and `provision_db`. Signed-off-by: Longxiang Lyu <[email protected]>

Summary: Address regression caused by PR #2715.
Approach
What is the motivation for this PR?
PR #2715 introduced an issue where the workaround function doesn't have the module parameter to call exec_command with.
How did you do it?
Pass in the module parameter so that the workaround function can execute the command properly.
How did you verify/test it?
Ran the upgrade_sonic task against a DUT.

Existing minigraph generation does not include loopback IPs for ToRs in the PNG section. This information is needed for dual ToR topologies Signed-off-by: Lawrence Lee <[email protected]>

sonic-mgmt image name only allows all lower case. In case the user name is mixed case, we need to change user name to all lower case. Signed-off-by: Guohan Lu <[email protected]> Co-authored-by: Lawrence Lee <[email protected]>

Approach What is the motivation for this PR? With manual image changes, the y_cable_simulator_client on dual ToR testbeds is overwritten. Currently, a new minigraph needs to be deployed to inject the client again, which has a lot of overhead and is not desirable. How did you do it? Create a test case in test_pretest.py which creates and injects the simulator client to ToRs in a dual ToR topology. How did you verify/test it? ./run_tests.sh -n <testbed_name> -i <inventory files> -u -c test_pretest.py::test_inject_y_cable_simulator_client Verify that the file gets placed into /usr/lib/python3/dist-packages inside the pmon container on both ToRs. Deploy minigraph, ensure that it completes successfully. Any platform specific information?

Approach What is the motivation for this PR? To allow TestWatchdogApi to pass fully on Arista platforms How did you do it? Updated watchdog.yml to include suitable timeout configuration for Arista watchdog How did you verify/test it? Ran the test with the updated file to confirm that the test passes Any platform specific information? Applies to arista platforms only Signed-off-by: Andy Wong [email protected]

…2954) Description of PR Modified route/test_route_perf.py to use 'enum_rand_one_per_hwsku_frontend_hostname' instead of 'rand_one_dut_hostname' Approach What is the motivation for this PR? The testcase was already modified to use 'enum_rand_one_per_hwsku_frontend_hostname' but was missing in some functions, This PR takes care of that. How did you do it Replaced 'rand_one_dut_hostname' with 'enum_rand_one_per_hwsku_frontend_hostname' Co-authored-by: falodiya <[email protected]>

Description of PR
Summary: Add a fixture to disable thermal policy
What is the motivation for this PR?
In some test cases, the thermal policy would override the mock value and cause sporadic test case failures, so we need a way to disable the thermal policy during those tests. The idea is to make thermalctld load an invalid thermal policy file. This PR provides a handy fixture to achieve this.
How did you do it?
* Add a fixture to disable thermal policy
* Use this fixture in test_show_platform_fanstatus_mocked and test_device_checker
How did you verify/test it?
Manually ran test_platform_info and test_snmp_phy_entity.

* [dualtor]: Add utilities for dual ToR mocking * Apply config DB tables to mock dual ToR setup * Apply kernel configurations (neighbor entries and route) * Apply orchagent config to mock dual ToR setup Signed-off-by: Lawrence Lee <[email protected]>

…ure value (#2989) Include 0 in range of valid low threshold, temperature values

Description of PR Summary: This testplan covers convergence measurement by creating real world failure events using a single DUT scenarios. Type of change Test plan Document

Data plane I/O utilities and fixtures for dual-TOR tests. The utilities are provided as fixtures which yield the data-plane-io functionality to be used within tests. The I/O test runs mostly on the sonic-mgmt container utilizing the PTF-adapter utility, except for the sniffer part, which runs on the PTF container due to permission issues and the socket ports that the scapy sniffer demands. Two types of I/O support are provided, one per direction: T1 to server, and server to T1. Ad hoc testing was done to verify the functionality of the data plane traffic verification.

Summary: Skip the nat tests when the image doesn't have the nat feature enabled.
Approach
What is the motivation for this PR?
The nat tests fail when executed against an image that doesn't have the feature enabled.
How did you do it?
Check the feature table and skip the test when the feature is not enabled.
How did you verify/test it?
Ran the nat test against an image that doesn't have the nat feature enabled.

Approach What is the motivation for this PR? Add console/PDU link to the device connection graph. How did you do it? Improve creategraph.py to allow specifying console connection and pdu connections. Improve creategraph.py to allow specifying inventory name and generate input file names. Update conn_graph_facts.py to return device PDU/console information. Allow console/PDU information missing so that community members can take time to catch up. Supporting DUT connects to multiple PDUs. How did you verify/test it? Tested with graph creation without specifying the console and pdu information. Tested with instrumentation code (on graph generated before this change, and graph generated with this change but no console/pdu information, and graph with incremental console/pdu information):

Fake storm option was added to reduce the flakiness of test runs seen on some platforms due to actual pfc storm not large enough to trigger pfcwd. This is causing some failures on Mellanox platforms after warm reboot. Signed-off-by: Neetha John <[email protected]> How did you do it? Created a module scoped 'fake_storm' fixture and set its status to False for Mellanox platforms How did you verify/test it? Ran both the tests (test_pfcwd_function.py and test_pfcwd_warm_reboot.py) on Mellanox and non Mellanox platforms and they passed Verified in the logs that pfc storm was always generated by the fanout for Mellanox platforms and for non Mellanox platforms only the 1st port had the storm generated by the fanout. Rest of the ports were using the fake storm

Description of PR Summary: This PR implements a new test case test_snmp_v2mib according to the following test plan. Test should verify that SNMPv2-MIB objects are functioning properly. Testplan: Retrieve facts for a device using SNMP Get expected values for a device from system Compare that facts received by SNMP are equal to values received from system Approach What is the motivation for this PR? This PR is to add a new test case for snmp. How did you verify/test it Run test on t0 and t1 topo. snmp/test_snmp_v2mib.py::test_snmp_v2mib PASSED Any platform specific information? SONiC.master.117-dirty-20210207.073945 Distribution: Debian 10.8 Kernel: 4.19.0-9-2-amd64 Build commit: 3cc55154 Build date: Sun Feb 7 07:55:17 UTC 2021 Platform: x86_64-arista_7170_64c HwSKU: Arista-7170-64C Supported testbed topology if it's a new test case? Supports any topo. Signed-off-by: Andrii-Yosafat Lozovyi <[email protected]>

Approach
What is the motivation for this PR?
fix sonic-net/sonic-buildimage#6717
add script to download artifacts from azure pipeline
How did you verify/test it?
    usage: getbuild.py [-h] [--buildid buildid] [--branch branch] [--platform platform] [--content content]
    Download artifacts from sonic azure devops.
    optional arguments:
      -h, --help           show this help message and exit
      --buildid buildid    build id
      --branch branch      branch name
      --platform platform  platform to download
      --content content    download content type [all|image(default)]
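
For reference, a parser producing a command line like the usage text above could be declared as below. This is a sketch reconstructed from the help output only, not the actual getbuild.py; the `choices` restriction and the `image` default are inferred from "[all|image(default)]".

```python
import argparse


def build_parser():
    """Build an argument parser matching the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="getbuild.py",
        description="Download artifacts from sonic azure devops.",
    )
    parser.add_argument("--buildid", metavar="buildid", help="build id")
    parser.add_argument("--branch", metavar="branch", help="branch name")
    parser.add_argument("--platform", metavar="platform", help="platform to download")
    parser.add_argument(
        "--content",
        metavar="content",
        choices=["all", "image"],  # assumption: only these two values are valid
        default="image",
        help="download content type [all|image(default)]",
    )
    return parser


args = build_parser().parse_args(["--branch", "master"])
```

Here `args.content` falls back to "image" because --content was not given.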

Description of PR
After the transition from port_config.ini to platform.json and hwsku.json, we can't fully deprecate and remove port_config.ini, because port_mgmt/ansible uses port_config.ini to generate the minigraph. So currently there is one configuration (platform.json, hwsku.json) used by sonic to configure ports and another (port_config.ini) used by sonic_mgmt to generate the minigraph. Changes in platform.json and hwsku.json therefore do not affect the minigraph, and we end up with a configuration mismatch. This PR adds functionality to sonic_mgmt to parse platform.json and hwsku.json and use this configuration for minigraph generation.
Summary:
Approach
What is the motivation for this PR?
* Fully deprecate and remove port_config.ini
* Avoid mismatched configurations between port_config.ini and platform.json, hwsku.json
How did you do it?
Add functionality for parsing platform.json and hwsku.json and use this configuration to generate the minigraph.
How did you verify/test it?
Compared the generated minigraph/config_db configurations with platform.json and hwsku.json.

Description of PR
Summary: Fix the test bug which hangs the test execution if warmboot has an issue (either in the shutdown or the boot-up path).
Approach
Replace duthost.get_up_time() with duthost.get_now_time(). This change is needed because the duthost.get_up_time() call always returns the same value (the time since the DUT was last UP). As a result, the time_passed value always remains a constant int. With get_now_time, time_passed gets updated every iteration, and the loop exits when the timeout occurs.
How did you verify/test it?
Tested on a DUT where the issue was seen:
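
The difference can be illustrated with a generic timeout loop. This is a sketch, not the test's code: `wait_with_timeout`, `is_done` and `get_now` are hypothetical names, and `get_now` stands in for a call that returns the current time as a datetime.

```python
from datetime import datetime, timedelta


def wait_with_timeout(is_done, get_now, timeout_seconds, poll=lambda: None):
    """Wait until is_done() is true; give up once timeout_seconds have elapsed."""
    start = get_now()
    while not is_done():
        # get_now() must advance between iterations. If it returned a fixed
        # timestamp, time_passed would stay constant and the loop would never
        # reach the timeout.
        time_passed = (get_now() - start).total_seconds()
        if time_passed > timeout_seconds:
            return False
        poll()
    return True
```

In a real test `poll` would typically sleep for a few seconds between checks; it is left as a no-op here so that the sketch stays free of delays.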

…t session (#2958) Approach What is the motivation for this PR? sanity check: The BGP and interface sanity checks for multi-dut were changed to iterate through all the nodes in a multi-dut setup. However, for a T2 chassis, the multi-dut testbed contains a supervisor card as well, which doesn't have any BGP/PORT configuration. Thus, these checks fail for supervisor card. cache cleanup: If we change the inventory file and re-run pytest, the old cached data for the inventory is used instead of picking up the changes. How did you do it? sanity_check: Iterate through frontend_nodes of duthosts instead of all the nodes. cache cleanup: Added fixture cleanup_cache_for_session that is called at the beginning of a session to remove the cached facts for all the DUTs in the testbed. This is not an automatic fixture, and is needed in the following scenarios: Running tests where some 'facts' about the DUT that get cached are changed. Running tests/regression without running test_pretest which has a test to clean up cache (PR#2978) Test case development phase to work out testbed information changes.

Signed-off-by: Wei Bai [email protected] What is the motivation for this PR? Test if enabling PFC watchdog will impact runtime traffic How did you do it? Start data traffic first then enable PFC watchdog at the SONiC DUT How did you verify/test it? I did test using Mellanox SN2700 and IXIA chassis

What is the motivation for this PR? For server recovery script, we need the new fields inv_name to decide which lab inventory to use in recover tasks(deploy-mg), and we need the new fields auto_recover to decide whether we should recover this testbed.

What is the motivation for this PR? Implement automated tests to cover "System Initialization" section of the Distributed VoQ Architecture test plan (https://github.com/tcusto/sonic-mgmt/blob/master/docs/testplan/Distributed-VoQ-Arch-test-plan.md). How did you do it? There are new tests in test_voq_init.py for verifying the VoQ switch, system and local ports, router interfaces on local and system ports, and neighbors. The switch and port tests run on each linecard in a VoQ chassis system and verify the T2 configuration has made it correctly to the ASIC DB. The router interface test also verifies the Chassis App DB on the supervisor card. The neighbor tests verify that local neighbor creation is propagated from the local linecard to the Chassis App DB on the supervisor and into ASIC and APP DBs on remote linecards. Inband ports are also tested in the port and router interface test, and there is a separate test case for inband neighbors establishment. There is a voq_helpers.py containing verification and utility functions that will be shared with future VoQ test scripts. It contains routines for checking local and remote neighbors databases, checking kernel routes for remote neighbors, checking ARP tables, and other shared validations. Lastly, in common/helpers/redis.py are classes for accessing the ASIC, APP, and Chassis APP DBs via the CLI. There are methods for getting keys from various tables and calling get or hget on keys. All of the redis db interactions that the tests perform are centralized here. How did you verify/test it? Ran the new tests against a chassis with T2 topology configured. Co-authored-by: Tom Custodio <[email protected]>

What is the motivation for this PR? Creating test plan and cases for validating the design of the distributed VoQ architecture HLD How did you do it? Based on the code changes in the associated PRs Co-authored-by: Tom Custodio <[email protected]>

…#2996) Some minor fixes of RDMA IXIA test cases Signed-off-by: Wei Bai [email protected] How did you do it? Check if flows complete in max_attempts rounds/seconds. Fix a variable name How did you verify/test it? I did test using Mellanox SN2700 and IXIA chassis
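The completion check described above (polling until flows finish within max_attempts rounds) might look roughly like the sketch below. The function name, the `is_complete` callback, and the one-second interval are assumptions for illustration; this is not the code from the PR and it does not call the Tgen API.

```python
import time


def wait_for_flows(is_complete, max_attempts=10, interval_s=1.0, sleep=time.sleep):
    """Poll is_complete() up to max_attempts times.

    Returns True as soon as the callback reports completion and False if
    every attempt is used up. `is_complete` is a hypothetical callback that
    would query the traffic generator for flow state.
    """
    for attempt in range(max_attempts):
        if is_complete():
            return True
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```

A test would then assert on the return value, so a flow that never completes fails the test instead of hanging it.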

Description of PR Summary: sonic-mgmt does not provide enough data for the device facts fetched by the platform API. In the Arista testbed we use the veos file, and the example lab file needs more fields in order to pass the platform API tests. Approach What is the motivation for this PR? Some platform API tests fail because they cannot fetch the values for device data from duthost_vars. https://github.com/Azure/sonic-mgmt/blob/master/tests/platform_tests/api/test_chassis.py#L61 How did you do it? Added additional fields for Arista 7260 in lab file.

…2863) This PR contains the following changes
- Refactor pfc pause test to use existing helpers for storm generation
- Enumerate testcase based on priorities for easier debugging
- Collect all errors and assert at the end to allow test to run on all background priorities
- Added a debug option in ptftests to allow captured packets to be dumped into a file for post test analysis
- Cleanup trailing white spaces
Signed-off-by: Neetha John <[email protected]> How did you verify/test it? Ran the test with the changes on Th and it passed

This PR implements a utility for checking balance in IPinIP tunnel. A script ip_in_ip_tunnel_test.py will be running on ptf and generate traffic to standby ToR. The packets should be forwarded to active ToR through IPinIP tunnel, and the script on ptf will capture and verify if the traffic distribution is balanced.
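One way to express the balance check is to compare each port's packet count with the mean across ports. The sketch below is only an illustration under assumptions: the function name, the port names, and the 25% tolerance are made up here and are not taken from ip_in_ip_tunnel_test.py.

```python
def is_balanced(packet_counts, max_deviation=0.25):
    """Return True if every port's count is within max_deviation of the mean.

    `packet_counts` maps a port name to the number of captured packets.
    The 0.25 tolerance is an assumed value for illustration only.
    """
    total = sum(packet_counts.values())
    if not packet_counts or total == 0:
        # No ports or no traffic captured: nothing can be called balanced.
        return False
    expected = total / float(len(packet_counts))
    return all(
        abs(count - expected) / expected <= max_deviation
        for count in packet_counts.values()
    )
```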

If a topology change is not performed correctly, some Open vSwitch bridges with the "mbr-" prefix may not be completely removed. The mux simulator server is not able to handle those leftover bridges properly and could return an error when asked for the mux status of all ports. This change enhances the mux simulator to exclude bridges that do not have 3 ports attached, which avoids the issue. Some other enhancements: * Add log rotation support. * Add troubleshooting guide in documentation. Signed-off-by: Xin Wang <[email protected]>
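The filtering rule can be sketched as follows. The function name, the input mapping, and the port names in the usage example are assumptions for illustration; the real mux simulator obtains bridge and port information from Open vSwitch and is not reproduced here.

```python
MBR_PREFIX = "mbr-"
EXPECTED_PORT_COUNT = 3  # per the description: valid mux bridges have 3 ports attached


def usable_mux_bridges(bridge_ports):
    """Return the names of mux bridges that have exactly 3 ports attached.

    `bridge_ports` maps a bridge name to the list of ports attached to it.
    Bridges without the "mbr-" prefix, and "mbr-" bridges with any other
    port count, are treated as leftovers and skipped.
    """
    return sorted(
        name
        for name, ports in bridge_ports.items()
        if name.startswith(MBR_PREFIX) and len(ports) == EXPECTED_PORT_COUNT
    )
```

For example, given `{"mbr-vms17-8-2": ["p0", "p1", "p2"], "mbr-vms17-8-3": ["p0"]}` (hypothetical port names), only `mbr-vms17-8-2` is returned.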

Summary: Fixes # (issue) In topo dualtor_56, 100G interfaces are each split into two 50G interfaces according to port_config.ini. As a result, the physical port id obtained from SFP differs from the host interface index, which is used to name the mux bridge. For example, the physical port id for Ethernet4 is 2 according to port_config.ini, but the interface is connected to mux bridge mbr-vms17-8-2. Therefore, we need to calculate the host interface index from the physical port id. This PR addresses the issue by calculating the host interface index according to the port config file. Most of the logic is the same as read_porttab_mappings in sfputilhelper.py

Create a new supervisor service to run on the PTF which sends GARP messages for each configured interface. This service takes two optional CLI arguments, --conf and --interval. --conf specifies the location of the configuration file (defaults to /tmp/garp_conf.json). --interval specifies the interval to wait between re-sending GARP messages (defaults to None, which causes the messages to be sent only once). Create a fixture in ptfhost_utils.py to automatically configure and run this service
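The two options and the send-once behaviour could be wired up roughly as in the sketch below. Only the flag names and their defaults come from the description above; the function names, the integer type of --interval, the `send_all` callback, and the `rounds` limit are assumptions for illustration, and actually building and sending GARP packets is left out.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse the two optional arguments described for the GARP service."""
    parser = argparse.ArgumentParser(description="Send GARP messages for configured interfaces")
    parser.add_argument("--conf", default="/tmp/garp_conf.json",
                        help="location of the configuration file")
    parser.add_argument("--interval", type=int, default=None,
                        help="seconds between re-sends; omit to send only once")
    return parser.parse_args(argv)


def run(send_all, interval, sleep=time.sleep, rounds=None):
    """Call send_all() once, or repeatedly every `interval` seconds.

    `send_all` is a hypothetical callback that sends one GARP per configured
    interface. `rounds` only exists so the loop can be bounded in tests.
    Returns the number of times send_all() was called.
    """
    sent = 0
    while True:
        send_all()
        sent += 1
        if interval is None or (rounds is not None and sent >= rounds):
            return sent
        sleep(interval)
```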

In PR #2741, the fib plugin for announcing routes was deprecated. Announcing routes is now done together with add-topo. The test_dir_bcast.py script, which depends on the fib plugin, was missed in PR #2741. Since the routes have already been announced during add-topo, there is no need to do that again for test_dir_bcast.py. To fix the issue, we can simply remove the dependency on the fib plugin from test_dir_bcast.py. Signed-off-by: Xin Wang <[email protected]>

roman_savchuk and others added 30 commits December 15, 2020 20:36

Fixed FDB cleanup race issue where the mac flush may flush newly lear…

7b36652

…nt MACs (#2679) * Fixed FDB cleanup race issue where the MAC flush may have occurred at the wrong time, when a new MAC was learnt. This ended up causing packet forwarding issues and bogus test failures.

[copp] Stabilize non-policer copp tests on slower server skus (#2692)

86338be

Signed-off-by: Danny Allen <[email protected]>

[testbed] Testbed v2 hld (#2680)

311ed3c

* [testbed] Testbed v2 hld Signed-off-by: Longxiang Lyu <[email protected]> * Updates for comments Add flow charts for `init_db` and `provision_db`. Signed-off-by: Longxiang Lyu <[email protected]>

[minigraph]: Add ToR loopback addresses for dual ToR topos (#2718)

7dfe2b4

Existing minigraph generation does not include loopback IPs for ToRs in the PNG section. This information is needed for dual ToR topologies Signed-off-by: Lawrence Lee <[email protected]>

[setup-container]: force user name to lower case (#2722)

426f748

sonic-mgmt image name only allows all lower case. In case the user name is mixed case, we need to change user name to all lower case. Signed-off-by: Guohan Lu <[email protected]> Co-authored-by: Lawrence Lee <[email protected]>

theasianpianist and others added 29 commits February 16, 2021 15:33

[Platform API][Thermal]: Include 0 as a valid low threshold, temperat…

542f53d

…ure value (#2989) Include 0 in range of valid low threshold, temperature values

BGP convergence test plan for benchmark performance (#2926)

5ed5cab

Description of PR Summary: This test plan covers convergence measurement by creating real-world failure events using single-DUT scenarios. Type of change Test plan Document

[kvmtest]: Add dual ToR ARP tests to KVM test suite (#2895)

eae4d47

ANISH-GOTTAPU merged commit f98634a into ANISH-GOTTAPU:master Feb 19, 2021
