[Monit] Use the string "/usr/bin/syncd\s" to monitor the syncd process #4706

Merged
merged 2 commits into from
Jun 26, 2020

yozhao101

- Why I did it
After discussing with Joe, we decided to use the string "/usr/bin/syncd\s" in the Monit configuration file to monitor the syncd process on Broadcom and Mellanox. Due to my carelessness, I did not catch this bug during the previous testing. If we use the string "/usr/bin/syncd" in the Monit configuration file, Monit cannot detect whether the syncd process is running.

If we run the command `sudo monit procmatch "/usr/bin/syncd"` on Broadcom, three processes in the syncd container match "/usr/bin/syncd": `/bin/bash /usr/bin/syncd.sh wait`, `/usr/bin/dsserve /usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile` and `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`. Monit selects the matching process with the highest uptime (here `/bin/bash /usr/bin/syncd.sh wait`) and does not select `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`.

Similarly, on Mellanox, Monit also selects the process with the highest uptime (here `/bin/bash /usr/bin/syncd.sh wait`) and does not select `/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile`.

That is why Monit is unable to detect whether the syncd process is running if we use the string "/usr/bin/syncd" in the Monit configuration file. The string "/usr/bin/syncd\s" requires a whitespace character right after "syncd", so it no longer matches `/bin/bash /usr/bin/syncd.sh wait`, and Monit can correctly monitor the syncd process.
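
The filtering effect of the trailing `\s` can be sketched outside Monit. The snippet below is only an approximation: it uses Python's `re` module as a stand-in for Monit's own pattern matching, and the helper `matching_commands` is a hypothetical name introduced for illustration. The command lines are the three listed above for the Broadcom syncd container.

```python
import re

# Command lines from the PR description (Broadcom syncd container).
COMMAND_LINES = [
    "/bin/bash /usr/bin/syncd.sh wait",
    "/usr/bin/dsserve /usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile",
    "/usr/bin/syncd --diag -u -p /etc/sai.d/sai.profile",
]


def matching_commands(pattern, command_lines=COMMAND_LINES):
    """Return the command lines in which `pattern` is found anywhere.

    Hypothetical helper: Python's `re` stands in for Monit's matcher here,
    which is an assumption, not a statement about Monit internals.
    """
    regex = re.compile(pattern)
    return [cmd for cmd in command_lines if regex.search(cmd)]


# Old pattern: also matches the wrapper script syncd.sh.
old_matches = matching_commands(r"/usr/bin/syncd")

# New pattern: "syncd" must be followed by whitespace, so "syncd.sh" is excluded.
new_matches = matching_commands(r"/usr/bin/syncd\s")

for cmd in new_matches:
    print(cmd)
```

In this sketch the old pattern matches all three command lines, while the new one drops the `syncd.sh` wrapper. The `dsserve` command line still matches, because it carries `/usr/bin/syncd` followed by a space as an argument.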

- How I did it

- How to verify it


@jleveque jleveque left a comment


This PR takes care of Broadcom and Mellanox syncd containers. What about all the other syncd containers? Can you determine if this applies to them also, and if so, make the same change?

@lguohan

lguohan commented Jun 9, 2020

retest mellanox please

@yozhao101

> This PR takes care of Broadcom and Mellanox syncd containers. What about all the other syncd containers? Can you determine if this applies to them also, and if so, make the same change?

Updated.

@yozhao101 yozhao101 merged commit b8ad0ed into sonic-net:master Jun 26, 2020
abdosi pushed a commit that referenced this pull request Jun 28, 2020
Signed-off-by: Yong Zhao <[email protected]>
pjaipakdee19 pushed a commit to pjaipakdee19/sonic-buildimage that referenced this pull request Jul 7, 2020