[oneimage] Fix race condition in systemd container services #421

Merged (1 commit) on Mar 22, 2017


taoyl-ms (Contributor)

When Type=simple, systemd considers the service activated as soon as the process specified in ExecStart has been started. If a downstream service depends on state prepared by ExecStart, there is a race condition.

Issue #390 is an example. There, database.service calls database.sh, which calls docker run or docker start -a to start the database container. However, systemd considers database.service successfully started as soon as database.sh begins, not after docker run finishes. Because database.service is considered started, bgp.service can be started. The redis database, which the bgp service depends on, may or may not have started by that point.
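
The race described above can be sketched with a pair of unit fragments. This is a minimal illustration, not the actual SONiC unit files: the script path `/usr/bin/database.sh` and the `Requires=`/`After=` lines in bgp.service are assumptions made for the example.

```ini
# database.service (hypothetical fragment, before the fix)
[Service]
# With Type=simple the unit counts as started as soon as the
# ExecStart process has been forked.
Type=simple
# Assumed path. The script calls `docker run` or `docker start -a`
# and stays in the foreground while the container runs.
ExecStart=/usr/bin/database.sh

# bgp.service (hypothetical fragment, a separate unit file)
[Unit]
# After= only waits until database.service is considered started, which
# here can happen before the container (and redis inside it) is up.
Requires=database.service
After=database.service
```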

To fix this issue (while keeping the ability to monitor docker status with systemd), we split the ExecStart process into an ExecStartPre part and an ExecStart part. docker run is split into docker run -d followed by docker attach, while docker start -a is split into docker start followed by docker attach. This ensures that downstream services are blocked until the container has started successfully.
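
One way the split could look in the unit file is sketched below. The `start` and `attach` subcommand names and the script path are assumptions for illustration; the description above only states that the work is divided between ExecStartPre and ExecStart.

```ini
# database.service (hypothetical fragment, after the fix)
[Service]
Type=simple
# Assumed subcommand: calls `docker run -d` or `docker start` and
# returns once the container has been started. systemd does not run
# ExecStart until this command has exited successfully.
ExecStartPre=/usr/bin/database.sh start
# Assumed subcommand: calls `docker attach`, which stays in the
# foreground while the container runs, so systemd can still monitor it.
ExecStart=/usr/bin/database.sh attach
```

Units ordered with After=database.service are held back until the ExecStart process has been started, and ExecStart only runs after ExecStartPre succeeds, so in this sketch bgp.service would not start before the container has been started.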

taoyl-ms requested review from lguohan and stcheng March 22, 2017 03:34
stcheng (Contributor) commented Mar 22, 2017

Thanks for the fix. Is there a way to ensure that this fix is working?
Also, please add the above description to the comment.

taoyl-ms added a commit to taoyl-ms/files that referenced this pull request Mar 22, 2017
taoyl-ms (Contributor, Author)

What do you mean by "ensure the fix is working"? I have tested the fix and the time sequence is correct now:
https://raw.githubusercontent.com/taoyl-ms/files/master/images/systemd_fix.png

taoyl-ms merged this pull request into sonic-net:master Mar 22, 2017
stcheng pushed a commit that referenced this pull request Mar 22, 2017
yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Feb 14, 2019
…ules

PR#2538 cannot merge due to master branch status. It has been tested
against 201811 branch.

Submodule src/sonic-sairedis 21f4a49..d57222a:
  > Add more specific logic for ingress ACL and buffer profile (sonic-net#421)
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#418)
  > Add support for vlan tagged frames in virtual switch (sonic-net#417)

Submodule src/sonic-swss 1590030..584490c:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#786)
  > [vstest]: Potential fix for timing issue in warm_reboot's routing UT (sonic-net#788)

Submodule src/sonic-swss-common 594f4e8..286ef34:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#260)

Submodule src/sonic-utilities c6666e2..b44b462:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABL… (sonic-net#458)
  > [aclshow] output only counters per table/rule (sonic-net#442)

Signed-off-by: Ying Xie <[email protected]>

[PR 2538] Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE

Signed-off-by: Jipan Yang <[email protected]>
yxieca added a commit that referenced this pull request Feb 14, 2019
lguohan added a commit to yxieca/sonic-buildimage that referenced this pull request Feb 16, 2019
swss

* a6d60f2 2019-02-15 | Create egress ACL table group during the PFCWD stats list installment (sonic-net#787) (HEAD, origin/master, origin/HEAD) [Wenda Ni]
* 52de963 2019-02-15 | [fpmsyncd] Add VNET routes support (sonic-net#772) [Wei Bai]
* d27f49e 2019-02-13 | Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#786) [Jipan Yang]
* 6363985 2019-02-08 | [vstest]: Potential fix for timing issue in warm_reboot's routing UT (sonic-net#788) [Rodny Molina]
* 6d5424d 2019-02-07 | VNet/Vxlan delete handling (sonic-net#766) [Prince Sunny]
* d680ce2 2019-02-07 | [neighsyncd] increase neighbor syncd restore timeout to 110 seconds (sonic-net#745) [Ying Xie]
* b78cc8d 2019-02-01 | support 8 lanes for a physical port (sonic-net#778) [lguohan]
* 73b620c 2019-02-01 | Increase the watermark polling interval to 10s (sonic-net#777) [Wenda Ni]
* a2b987b 2019-02-01 | [vstest]: fix test_speed.py (sonic-net#780) [lguohan]
* cef4bd0 2019-02-01 | [vstest]: fix test_port_an_warm.py test (sonic-net#779) [lguohan]
* 9f20eda 2019-02-01 | fix a unstable swss egress acl test (sonic-net#776) [Kebo Liu]
* 316ae6c 2019-01-30 | portsorch ports init done flag should means buffer, autoneg, speed, m… (sonic-net#747) [Jipan Yang]
* 4280036 2019-01-30 | [teammgrd] Fix inconsistent port admin status (sonic-net#755) [Jipan Yang]
* cf12bdf 2019-01-30 | Remove AclTableGroup upon removal of port/lag/vlan (sonic-net#751) [Jipan Yang]
* 5779c1a 2019-01-29 | [aclorch] Remove  L4 port range support limitation on egress ACL table and add new SWSS virtual test. (sonic-net#741) [Kebo Liu]
* 36e85eb 2019-01-29 | On a routing vlan, the neighbor entry in the /31 subnet is not added to hardware (sonic-net#771) [Kiran Kumar Kella]
* 882ccc6 2019-01-24 | [vnetorch] Change logic for adding VNet interface (sonic-net#761) [Marian Pritsak]
* f637557 2019-01-25 | [vrfmgrd] Fix VRF is not set to VRF_TABLE in APP_DB correctly (sonic-net#768) [yorke]
* e84a6ab 2019-01-24 | use sai_stat_id_t for new SAI header file (sonic-net#769) [lguohan]

sairedis

* d685e35 2019-02-15 | Add support for fdb_event MOVE and check fdb event oids (sonic-net#420) (HEAD, origin/master, origin/HEAD) [Kamil Cudnik]
* 2b91013 2019-02-15 | [vslib] add missing port attributes for virtual switch (sonic-net#419) [Stepan Blyshchak]
* dcc8688 2019-02-14 | Add more specific logic for ingress ACL and buffer profile (sonic-net#421) [Kamil Cudnik]
* c0b39ea 2019-02-12 | Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#418) [Jipan Yang]
* ab35dfa 2019-02-11 | Add support for vlan tagged frames in virtual switch (sonic-net#417) [Kamil Cudnik]
* 145ea44 2019-02-02 | [flex counter] handle router interface stats (sonic-net#410) [Mykola F]
* c03d639 2019-02-02 | Add more information on failed map sizes (sonic-net#416) [Kamil Cudnik]
* 29f1e3c 2019-01-31 | Update SAI pointer (sonic-net#414) [Marian Pritsak]
* c0a948d 2019-01-30 | Add WRED specific comparison logic (sonic-net#413) [Kamil Cudnik]
* 1b6a661 2019-01-24 | install SAI extension header files into /usr/include/sai (sonic-net#412) [lguohan]
* 849525a 2019-01-24 | Initialize notification queue pointer before switch create (sonic-net#411) [Kamil Cudnik]
* 02d92f1 2019-01-23 | Add log info for not matching SG/IPG/QUEUES (sonic-net#409) [Kamil Cudnik]
* 8793562 2019-01-18 | Update SAI pointer to latest master (sonic-net#408) [Marian Pritsak]

swss-common

* ec04a5a 2019-02-14 | Add support for WarmStart::setDataCheckState() (sonic-net#242) [Jipan Yang]
* 56bd73f 2019-02-13 | Force only supported commands on consumer table (sonic-net#261) [Kamil Cudnik]
* 414de0f 2019-02-12 | Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (sonic-net#260) [Jipan Yang]
* 88de725 2019-02-05 | [pyext] enable types in stdint.h (sonic-net#259) [Ying Xie]
* f457ae8 2019-02-05 | Optimized ProducerStateTable set/del notification processing to avoid… (sonic-net#257) [Jipan Yang]
* e5286fd 2019-01-30 | [rif counters] Rif counter schema update (sonic-net#256) [Mykola F]

sonic-utilities

* b44b462 2019-02-14 | Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABL… (sonic-net#458) (HEAD, origin/master, origin/HEAD) [Jipan Yang]
* e856b8b 2019-02-11 | [aclshow] output only counters per table/rule (sonic-net#442) [Roman Kachur]

Signed-off-by: Guohan Lu <[email protected]>
lguohan pushed a commit that referenced this pull request Feb 16, 2019
…g Broadcom SAI build (#2488)

* [Broadcom SAI] upgrade Broadcom SAI to 3.3.4.3m-3

This is SAI 3.3.4.3m-3 compiled with SAI header file at commit ID
6ad3382217ec22f64cd268faefcbc2ff7caba4fd of SAI repo.

Signed-off-by: Ying Xie <[email protected]>

* change libsaithrift version to 0.9.4

Signed-off-by: Guohan Lu <[email protected]>

* [submodule]: update swss, sairedis, swss-common, sonic-utilities

Signed-off-by: Guohan Lu <[email protected]>

* [mlnx] update mellanox sai

Signed-off-by: Stepan Blyschak <[email protected]>
dmytroxshevchuk pushed a commit to dmytroxshevchuk/sonic-buildimage that referenced this pull request Aug 31, 2020
…#421)

* Add more specific logic for ingress ACL and buffer profile

* Address comments
tahmed-dev added a commit to tahmed-dev/sonic-buildimage that referenced this pull request Feb 18, 2021
Change in this update:
    b75aab7 [swss-common] Add LINKMGR CFG and MUX LINKMGR state table names (sonic-net#421)
    4a77d1c [ci]: add vstest (sonic-net#459)
    07258a6 [ci]: use build template (sonic-net#457)
    ddcae3e runRedisScript api to process integer returned by script run in the redis (sonic-net#447)
    33d89c7 [systemlag] Schema defs for system lag (sonic-net#448)
    af01f37 spell check fixes (sonic-net#456)
    7afd43d Update to make getNamespaces() API at par with the get_ns_list() swssdk-py API. (sonic-net#455)

signed-off-by: Tamer Ahmed <[email protected]>
tahmed-dev added a commit that referenced this pull request Feb 18, 2021
daall pushed a commit that referenced this pull request Feb 25, 2021
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
mssonicbld added a commit that referenced this pull request Aug 29, 2024
…tomatically (#20068)

#### Why I did it
src/sonic-linux-kernel
```
* 2cfa620 - (HEAD -> 202311, origin/202311) tg3: fix broadcom NIC 57766 staying down issue (#421) (#428) (7 hours ago) [Vasundhara Volam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Sep 1, 2024
…tomatically (#20071)

#### Why I did it
src/sonic-linux-kernel
```
* db34fda - (HEAD -> 202405, origin/202405) tg3: fix broadcom NIC 57766 staying down issue (#421) (4 days ago) [byu343]
```
#### How I did it
#### How to verify it
#### Description for the changelog
DavidZagury pushed a commit to DavidZagury/sonic-buildimage that referenced this pull request Dec 7, 2024
* Add kernel patch to fix 57766 staying down after reset

This is a workaround for an incorrectly detected DMA overflow that
may result in the NIC staying down after reset. The fix is to limit
the address space that can be used.

* Add description and fix bookworm build

* Fix subject in the patch

---------

Co-authored-by: Saikrishna Arcot <[email protected]>