On teammgrd/teamsyncd exits, return EXIT_FAILURE #2230

judyjoseph · 2022-04-15T22:27:15Z

What I did
When teammgrd/teamsyncd exits, return EXIT_FAILURE so that supervisord catches it and the teamd docker is restarted.
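The idea can be illustrated with a minimal sketch. This is not the code changed in this PR: `run_event_loop` and `poll_once` are hypothetical names standing in for the daemon's main loop, and the only point is that falling out of the loop yields `EXIT_FAILURE` instead of 0.

```cpp
#include <cstdlib>
#include <functional>

// Hypothetical stand-in for a daemon's main loop (not the real teamsyncd or
// teammgrd code). poll_once() returns false when the loop should stop.
int run_event_loop(const std::function<bool()>& poll_once)
{
    while (poll_once())
    {
        // A real daemon would dispatch the selected event here.
    }

    // The daemon is expected to run for the lifetime of the container, so
    // leaving the loop is reported as a failure to the supervisor.
    return EXIT_FAILURE;
}
```

A `main` built on this sketch would simply `return run_event_loop(...)`, so the process exit status is nonzero whenever the loop ends.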

Why I did it
Fixes sonic-net/sonic-buildimage#10534

I have seen this in builds from 201911 to master.

How I verified it
Checked by sending SIGTERM to teamsyncd/teammgrd processes


Apr 15 21:57:38.156111 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 21:57:38,155 INFO exited: teamsyncd (exit status 0; expected)

Apr 15 22:20:09.530223 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:20:09,529 INFO exited: teammgrd (exit status 0; expected)

-- with fix

Apr 15 22:24:39.752008 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:24:39,751 INFO exited: teamsyncd (exit status 1; not expected)
and the teamd docker restarts.
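The "expected" / "not expected" wording in the logs above comes from supervisord comparing the exit status with the program's configured `exitcodes`. The sketch below is only a simplified model of that comparison, not supervisord's implementation; `describe_exit` is a hypothetical helper, and the assumption that 0 is the only expected code for these programs is mine.

```cpp
#include <set>
#include <string>

// Simplified model of how a supervisor labels an exit status.
// expected_codes plays the role of supervisord's `exitcodes` setting;
// using {0} for teamsyncd/teammgrd is an assumption.
std::string describe_exit(int status, const std::set<int>& expected_codes)
{
    const bool expected = expected_codes.count(status) > 0;
    return "exit status " + std::to_string(status) + "; " +
           (expected ? "expected" : "not expected");
}
```

With `{0}` as the expected set, status 0 is labeled as expected and status 1 as not expected, which matches the log lines before and after the fix.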

Details if related


nazariig · 2022-04-18T16:24:09Z

@judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed

judyjoseph · 2022-04-27T20:13:59Z

> @judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed

@nazariig this does not pertain to SIGTERM alone; I only used SIGTERM to validate the fix. Whenever teamsyncd/teammgrd drops out of the SELECT loop and exits, for whatever reason, it is good for the teamd container to restart. For example, if teamsyncd exits silently, some of the interface events will be missed.

I see a similar approach of using "exit 1" in other orchagent daemons like portsyncd, fpmsyncd, etc., so that supervisor sees a not-expected exit and restarts the container.
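To illustrate why SIGTERM is only one of several ways to reach the same exit path, here is a hedged sketch. It assumes the daemon stops its loop when a flag is set by a signal handler; `install_sigterm_handler`, `run_until_stopped`, and `wait_for_events` are hypothetical names, not functions from the swss code base.

```cpp
#include <csignal>
#include <cstdlib>
#include <functional>

// Flag set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void on_sigterm(int)
{
    g_stop_requested = 1;
}

void install_sigterm_handler()
{
    std::signal(SIGTERM, on_sigterm);
}

// Hypothetical loop: wait_for_events stands in for the blocking select call.
int run_until_stopped(const std::function<void()>& wait_for_events)
{
    while (!g_stop_requested)
    {
        wait_for_events();
    }

    // Whatever ended the loop (SIGTERM here, an error elsewhere), the
    // process reports failure so the supervisor sees an unexpected exit.
    return EXIT_FAILURE;
}
```

In this sketch a SIGTERM ends the loop cleanly, yet the exit status is still nonzero, which is the behavior verified in the PR description.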

judyjoseph · 2022-04-27T20:14:21Z

/azp run

azure-pipelines · 2022-04-27T20:14:30Z

Azure Pipelines successfully started running 1 pipeline(s).

yozhao101 · 2022-05-25T07:13:26Z

@prsunny Can you please help review this PR? It is related to an ADO: https://msazure.visualstudio.com/One/_workitems/edit/13799016.

nazariig · 2022-07-06T09:13:13Z

> > @judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed
>
> @nazariig this does not pertain to SIGTERM alone; I only used SIGTERM to validate the fix. Whenever teamsyncd/teammgrd drops out of the SELECT loop and exits, for whatever reason, it is good for the teamd container to restart. For example, if teamsyncd exits silently, some of the interface events will be missed.
>
> I see a similar approach of using "exit 1" in other orchagent daemons like portsyncd, fpmsyncd, etc., so that supervisor sees a not-expected exit and restarts the container.

@judyjoseph what is considered to be expected exit here? How are we going to handle graceful shutdown?

Commit f5d152c: When teammgrd/teamsyncd exits -- return FAILURE so that supervisord catch it and teamd docker is restarted.

judyjoseph requested review from prsunny and nazariig April 18, 2022 15:24

qiluo-msft approved these changes Jul 6, 2022
