Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On teammgrd/teamsyncd exits, return EXIT_FAILURE #2230

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

judyjoseph
Copy link
Contributor

@judyjoseph judyjoseph commented Apr 15, 2022

What I did
When teammgrd/teamsyncd exits -- return FAILURE so that supervisord catch it and teamd docker is restarted.

Why I did it
Fixes sonic-net/sonic-buildimage#10534

I have seen this in builds from 201911 to master.

How I verified it
Checked by sending SIGTERM to teamsyncd/teammgrd processes


Apr 15 21:57:38.156111 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 21:57:38,155 INFO exited: teamsyncd (exit status 0; expected)

Apr 15 22:20:09.530223 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:20:09,529 INFO exited: teammgrd (exit status 0; expected)

-- with fix

Apr 15 22:24:39.752008 str-a7280cr3-2 INFO teamd#supervisord 2022-04-15 22:24:39,751 INFO exited: teamsyncd (exit status 1; not expected)
AND teamd docker restarts


Details if related

@judyjoseph judyjoseph requested review from prsunny and nazariig April 18, 2022 15:24
@nazariig
Copy link
Collaborator

@judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed

@judyjoseph
Copy link
Contributor Author

@judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed

@nazariig this is not pertaining to SIGTERM alone - it is just that I used SIGTERM to validate this fix. For any reason teamsyncd/teammgrd comes out of the SELECT loop and exit, it is good for teamd container to restart. For example if teamsyncd exits siliently, some of the interface events will be missed.

A similar approach of using "exit 1" I see in other orchagent daemons like portsyncd, fpmsyncd etc - so that supervisor sees a not-expected exit and restarts the container.

@judyjoseph
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yozhao101
Copy link

@prsunny Can you please help review this PR please? Since it is related to an ADO: https://msazure.visualstudio.com/One/_workitems/edit/13799016.

@nazariig
Copy link
Collaborator

nazariig commented Jul 6, 2022

@judyjoseph IMHO, this is confusing. SIGTERM is a regular way to stop a process in Linux and the return code should be 0 if no errors observed

@nazariig this is not pertaining to SIGTERM alone - it is just that I used SIGTERM to validate this fix. For any reason teamsyncd/teammgrd comes out of the SELECT loop and exit, it is good for teamd container to restart. For example if teamsyncd exits siliently, some of the interface events will be missed.

A similar approach of using "exit 1" I see in other orchagent daemons like portsyncd, fpmsyncd etc - so that supervisor sees a not-expected exit and restarts the container.

@judyjoseph what is considered to be expected exit here? How are we going to handle graceful shutdown?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The teamd docker not getting restarted when teamsyncd gets SIGTERM signal
4 participants