Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCLAG enhacements ICCPd initial code commit #4819

Merged
merged 12 commits into from
Aug 4, 2021

Conversation

Praveen-Brcm
Copy link
Contributor

@Praveen-Brcm Praveen-Brcm commented Jun 19, 2020

@rathnasabapathyv
Copy link
Collaborator

Can you please resolve the conflicts

{
if (vlan->vlan_removed == 1)
{
ICCPD_LOG_DEBUG(__FUNCTION__, "Remove %s from VLAN %d", lif->name, vlan->vid);

LIST_REMOVE(vlan, port_next);
#if 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these code changes needed?
All the code in this function has been marked under "#if 0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prvattem: Thanks for the comment. The unused function iccp_parse_if_vlan_info_from_netlink is being removed.

}

return;
}

//this function is no more required - vlan membership updates comes through
//state db updates from mclagsyncd

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes in this function needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prvattem: Thanks for the comment. The unused function iccp_parse_if_vlan_info_from_netlink is being removed.

if (sys->warmboot_start != WARM_REBOOT)
mlacp_clean_fdb();
// do not required to flush FDB
//if (sys->warmboot_start != WARM_REBOOT)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unwanted functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prvattem: Yes, This flush is not required when ICCPd session is coming up. The unwanted flush is being commented.

@@ -104,13 +114,117 @@ static int getHwAddr(char *buff, char *mac)
return 0;
}

#if 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this functionality going to get enabled later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. No unused code is deleted

@@ -375,47 +490,67 @@ static int peer_po_is_alive(struct CSM *csm, int po_ifindex)
return pif_active;
}

#if 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this functionality going to get enabled later?
References to this function are also commented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Yes with latest enhancements this code is not used. But the plan is to keep the change in case if required to enable back.

{
/*TBD: vxlan tunnel port isolation will be supportted later*/
return;
#if 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove unwanted code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. The code kept to enable in the future.

@prvattem
Copy link

Please resolve the conflicts, if any.

@Praveen-Brcm
Copy link
Contributor Author

Please resolve the conflicts, if any.

Resolved the merge conflicts.

@Praveen-Brcm
Copy link
Contributor Author

Please resolve the conflicts, if any.

Merged conflicts are resolved

@Praveen-Brcm
Copy link
Contributor Author

@rathnasabapathyv , @prvattem
Merge conflicts are resolved and the review comments are addressed can you please provide approvals if not more comments.
Thanks, - Praveen.

@@ -1361,7 +1788,44 @@ int iccp_system_init_netlink_socket()
ICCPD_LOG_ERR(__FUNCTION__, "Failed to set buffer size of netlink event sock.");
goto err_return;
}
#if 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove dead code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.!

@@ -767,11 +992,11 @@ void mlacp_enqueue_msg(struct CSM* csm, struct Msg* msg)
if (msg == NULL )
return;

#if 0
#if 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove dead code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.!

*
****************************************/
static void mlacp_fsm_send_if_up_ack(
struct CSM *csm,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any possibility that csm may be NULL passed into this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gechiang , Thanks. ACK is sent as part of processing remote IF up event, and CSM can not be NULL at this point.

@@ -188,7 +252,7 @@ int scheduler_csm_read_callback(struct CSM* csm)

return 1;

recv_err:
recv_err:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

goto reject_client;
}
}

/* Accept*/
goto accept_client;

reject_client:
reject_client:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

if (new_fd >= 0)
close(new_fd);
return MCLAG_ERROR;

accept_client:
accept_client:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

@@ -445,7 +522,7 @@ int mlacp_sync_with_kernel_callback()
}
}

out:
out:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

FD_SET(connFd, &(sys->readfd));
sys->readfd_count++;
ICCPD_LOG_INFO(__FUNCTION__, "Connect to server %s sucess .", csm->peer_ip);
goto conn_ok;
}

conn_fail:
conn_fail:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

if (connFd >= 0)
{
csm->sock_fd = -1;
close(connFd);
}
conn_ok:
conn_ok:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

@@ -613,11 +701,11 @@ int scheduler_prepare_session(struct CSM* csm)
session_conn_thread_unlock(&csm->conn_mutex);
}

time_update:
time_update:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

time(&csm->connTimePrev);
return 0;

no_time_update:
no_time_update:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

mlacp_peer_mlag_intf_delete_handler(csm, pif->name);

/* Delete remote interface info from STATE_DB */
if (csm)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checkis not needed. line 92 already ensure cms is not NILL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.!

arp_entry->ifname, show_ip_str(arp_entry->ipv4_addr), mac_str);
return MCLAG_ERROR;
if (err != ICCP_NLE_SEQ_MISMATCH) {
ICCPD_LOG_NOTICE(__FUNCTION__, "ARP add failure for %s %s %s, status %d",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error should be logged as ERROR and not NOTICE right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.!

arp_entry->ifname, show_ip_str(arp_entry->ipv4_addr), mac_str);
return MCLAG_ERROR;
if (err != ICCP_NLE_SEQ_MISMATCH) {
ICCPD_LOG_NOTICE(__FUNCTION__, "ARP delete failure for %s %s %s, status %d",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error condition should be logged with ERROR and not NOTICE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.!

@@ -932,12 +1273,11 @@ int main(int argc, char **argv)

ret = EXIT_SUCCESS;

mclagdctl_disconnect:
mclagdctl_disconnect:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix alignment please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Done.

@pettershao-ragilenetworks
Copy link
Contributor

pettershao-ragilenetworks commented Jul 17, 2021

@Praveen-Brcm can you help check why mclagsyncd socket sometimes receive close but actually iccpd just send some message to socket: "Connection lost" , seems mclagsyncd send fine, but receive exist some problem. is my local build miss some thing?[another device seems fine]
message.txt

ul 17 07:32:10.853588 sonic ERR swss#orchagent: :- addVlan: 4103: Failed to create VLAN Vlan1 vid:1
Jul 17 07:32:10.988675 sonic NOTICE iccpd#mclagsyncd: :- main: 81: MCLAGSYNCD processing mclag_cfg_tbl notifications
Jul 17 07:32:10.988803 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 529: Key(mclag domain_id):1;  op:SET
Jul 17 07:32:10.988888 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 545:  MCLAGSYNCD CFG Table Updates : Field source_ip, Value: 172.28.145.38  EntryExits:0
Jul 17 07:32:10.988888 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 545:  MCLAGSYNCD CFG Table Updates : Field peer_link, Value: PortChannel2  EntryExits:0
Jul 17 07:32:10.988915 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 545:  MCLAGSYNCD CFG Table Updates : Field peer_ip, Value: 172.28.145.36  EntryExits:0
Jul 17 07:32:10.988940 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 671: mclagsycnd: domain cfg processing; mandatory args present; Domain [1] send to iccpd
Jul 17 07:32:10.988965 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 678:  MCLAGSYNCD CFG Table Updates : domain_id:1 op_type:0 attrBmap:0x7 attrDelBmap:0x0 cfg_info.local_ip 172.28.145.38, peer_ip: 172.28.145.36 peer_link:PortChannel2 system_mac:58:69:6c:fb:21:38 session_timeout:0 keepalive_time:0
Jul 17 07:32:10.989030 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 749:  MCLAGSYNCD CFG Table Updates: domain_id:1 infor_len:84 infor_start:0x55f12a5cc750
Jul 17 07:32:10.989030 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 761: mclagsycnd send msg to iccpd, msg_len =84, msg_type =2    count : 0 ver = 1
Jul 17 07:32:10.989069 sonic NOTICE iccpd#mclagsyncd: :- processMclagDomainCfg: 772:  Add dependent selectables to select
Jul 17 07:32:10.989691 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 780:  MCLAGSYNCD create state fdb table
Jul 17 07:32:10.990154 sonic NOTICE iccpd#iccpd: message repeated 81 times: [ [ICCP_FSM.NOTICE] csm null with peer ip [172.28.145.36]]
Jul 17 07:32:10.990154 sonic NOTICE iccpd#iccpd: [iccp_mclagsyncd_mclag_domain_cfg_handler.NOTICE] recv cfg msg ; domain_id:1 op_type:1 attr_bmap:0x7 local_ip:172.28.145.38 peer_ip:172.28.145.36 peer_ifname:PortChannel2 system_mac:58:69:6c:fb:21:38 session_timeout:0 keepalive_time:0
Jul 17 07:32:10.990154 sonic NOTICE iccpd#iccpd: [ICCP_FDB.NOTICE] mlacp_mac_msg_queue_reinit clear mac_msg_list pointers in existing MAC entries
Jul 17 07:32:10.990154 sonic NOTICE iccpd#iccpd: [ICCP_FSM.NOTICE] scheduler session disconnect handler
Jul 17 07:32:10.990231 sonic NOTICE iccpd#iccpd: [ICCP_FDB.NOTICE] mlacp_mac_msg_queue_reinit clear mac_msg_list pointers in existing MAC entries
Jul 17 07:32:10.992278 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 783:  MCLAGSYNCD create state vlan member table
Jul 17 07:32:10.992744 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 786:  MCLAGSYNCD create cfg mclag intf table
Jul 17 07:32:10.992744 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 794:  MCLAGSYNCD Add state_fdb_tbl to selectable
Jul 17 07:32:10.992744 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 801:  MCLAGSYNCD Add p_state_vlan_mbr_subscriber_table  to selectable
Jul 17 07:32:10.992744 sonic NOTICE iccpd#mclagsyncd: :- addDomainCfgDependentSelectables: 808: MCLagSYNCD Adding mclag_intf_cfg_tbl to selectable
Jul 17 07:32:10.993196 sonic INFO iccpd#supervisord: iccpd Connection lost, reconnecting...
Jul 17 07:32:10.996576 sonic NOTICE iccpd#mclagsyncd: :- mclagsyncdFetchSystemMacFromConfigdb: 78: mclagysncd: system_mac:58:69:6c:fb:21:38
Jul 17 07:32:10.996709 sonic INFO iccpd#supervisord: iccpd Waiting for connection...
Jul 17 07:32:11.637348 sonic NOTICE iccpd#iccpd: [ICCP_FSM.NOTICE] csm config error with peer ip [172.28.145.36]

root@sonic:/home/admin# docker exec -it iccpd bash
root@sonic:/# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 07:30 pts/0    00:00:00 /usr/bin/python /usr/bin/supervisord
root        17     1  0 07:30 pts/0    00:00:00 /usr/sbin/rsyslogd -n -iNONE
root        29     1  0 07:30 pts/0    00:00:00 bash /usr/bin/iccpd.sh
root        31    29  0 07:30 pts/0    00:00:00 mclagsyncd
root        32    29  0 07:30 pts/0    00:00:00 iccpd
root        52     0  0 07:49 pts/1    00:00:00 bash
root        57    52  0 07:49 pts/1    00:00:00 ps -ef
root@sonic:/# kill -9 31
root@sonic:/# kill -9 32
root@sonic:/# mclagsyncd &
[1] 58
root@sonic:/# Waiting for connection...

root@sonic:/# iccpd &
[2] 60
root@sonic:/# Connected!
Connection lost, reconnecting...
Waiting for connection...

root@sonic:/#

@pettershao-ragilenetworks
Copy link
Contributor

pettershao-ragilenetworks commented Jul 21, 2021

server receive q's size is not empty, seems read() issue:

root@sonic:/home/admin# netstat -ap | grep 2626
tcp        0      0 127.0.0.6:2626          0.0.0.0:*               LISTEN      4692/mclagsyncd
tcp        0      0 localhost:52388         127.0.0.6:2626          ESTABLISHED 4693/iccpd
tcp      863      0 127.0.0.6:2626          localhost:52388         ESTABLISHED 4692/mclagsyncd

@Praveen-Brcm
Copy link
Contributor Author

server receive q's size is not empty, seems read() issue:

root@sonic:/home/admin# netstat -ap | grep 2626
tcp        0      0 127.0.0.6:2626          0.0.0.0:*               LISTEN      4692/mclagsyncd
tcp        0      0 localhost:52388         127.0.0.6:2626          ESTABLISHED 4693/iccpd
tcp      863      0 127.0.0.6:2626          localhost:52388         ESTABLISHED 4692/mclagsyncd

@pettershao-ragilenetworks , The fix for this has been addressed part of PR# sonic-net/sonic-swss#1832 .? Thanks, - Prvaeen

@Praveen-Brcm
Copy link
Contributor Author

@gechiang : Can this be merged now as the PR 1338 is merged, and PR 1331 is blocked on VS tests to re-run. Please confirm and merge.? Thanks - Praveen.

@pettershao-ragilenetworks
Copy link
Contributor

server receive q's size is not empty, seems read() issue:

root@sonic:/home/admin# netstat -ap | grep 2626
tcp        0      0 127.0.0.6:2626          0.0.0.0:*               LISTEN      4692/mclagsyncd
tcp        0      0 localhost:52388         127.0.0.6:2626          ESTABLISHED 4693/iccpd
tcp      863      0 127.0.0.6:2626          localhost:52388         ESTABLISHED 4692/mclagsyncd

@pettershao-ragilenetworks , The fix for this has been addressed part of PR# Azure/sonic-swss#1832 .? Thanks, - Prvaeen

yes.

@gechiang
Copy link
Collaborator

@gechiang : Can this be merged now as the PR 1338 is merged, and PR 1331 is blocked on VS tests to re-run. Please confirm and merge.? Thanks - Praveen.

@Praveen-Brcm, @rlhui I think we discussed this in the meeting. This PR also needs (sonic-net/sonic-swss#1331) or else the existing MCLAG will be broken as soon as this PR is merged... So it was decided that we will also wait for the SWSS(1331) to complete and at that time merge both in parallel to reduce the feature broken time... Is this not what we agreed?

@Praveen-Brcm
Copy link
Contributor Author

@gechiang : Can this be merged now as the PR 1338 is merged, and PR 1331 is blocked on VS tests to re-run. Please confirm and merge.? Thanks - Praveen.

@Praveen-Brcm, @rlhui I think we discussed this in the meeting. This PR also needs (Azure/sonic-swss#1331) or else the existing MCLAG will be broken as soon as this PR is merged... So it was decided that we will also wait for the SWSS(1331) to complete and at that time merge both in parallel to reduce the feature broken time... Is this not what we agreed?

@gechiang : Yes that was the agreement. Earlier we had the dependency with PR#1338 and we had to let that merge for the VS tests to pass for PR#1331. While we already merge 4 of the related PR's my request was to consider the current PR to merge as well, while we hope to have the PR #1331 merge soon once the VS tests pass.

@gechiang
Copy link
Collaborator

@gechiang : Can this be merged now as the PR 1338 is merged, and PR 1331 is blocked on VS tests to re-run. Please confirm and merge.? Thanks - Praveen.

@Praveen-Brcm, @rlhui I think we discussed this in the meeting. This PR also needs (Azure/sonic-swss#1331) or else the existing MCLAG will be broken as soon as this PR is merged... So it was decided that we will also wait for the SWSS(1331) to complete and at that time merge both in parallel to reduce the feature broken time... Is this not what we agreed?

@gechiang : Yes that was the agreement. Earlier we had the dependency with PR#1338 and we had to let that merge for the VS tests to pass for PR#1331. While we already merge 4 of the related PR's my request was to consider the current PR to merge as well, while we hope to have the PR #1331 merge soon once the VS tests pass.

IF the MC LAG feature in current master branch is already broken then we might as well go ahead get this merged... but if it is not, then I prefer to not break the feature for an unknown (potentially long) period of time...

@gechiang gechiang merged commit 82b6bcf into sonic-net:master Aug 4, 2021
@haslersn
Copy link

haslersn commented Aug 5, 2021

Is there a way to download an image (sonic-aboot-broadcom.swi) built from master? Otherwise, where do I get an image for testing MCLAG?

carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
* MCLAG enhacements ICCPd initial code commit
* Resolving the merge conflicts with orighin
* L3 MCLAG Enhancements and Unique IP Changes.
* Addressed review comments

Co-authored-by: Tapash Das <[email protected]>
@akokhan
Copy link
Contributor

akokhan commented Aug 10, 2021

Is there a way to download an image (sonic-aboot-broadcom.swi) built from master? Otherwise, where do I get an image for testing MCLAG?

https://sonic-build.azurewebsites.net/ui/sonic/Pipelines

ICCPD_LOG_DEBUG("ICCP_FDB", "MAC update from mclagsyncd: Update MAC remote to local %s, vlan %d"
" ifname %s, del entry from MCLAG_FDB_TABLE",
mac_addr_to_str(mac_msg->mac_addr), mac_msg->vid, mac_msg->ifname);
del_mac_from_chip(mac_msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we distinguish this ADD is local learn or by "add back" action?

judyjoseph pushed a commit that referenced this pull request Aug 20, 2021
* MCLAG enhacements ICCPd initial code commit
* Resolving the merge conflicts with orighin
* L3 MCLAG Enhancements and Unique IP Changes.
* Addressed review comments

Co-authored-by: Tapash Das <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.