pmon-chassis-design.md


SONiC Chassis Platform Management & Monitoring

Rev 1.0

Table of Contents

Revision

| Rev | Date | Author | Change Description |
|-----|------|--------|--------------------|
| 1.0 |      | Manjunath Prabhu, Sureshkannan Duraisamy, Marty Lok, Marc Snider | Initial version |

About this Manual

This document provides design requirements and interactions between platform drivers and PMON for SONiC on a VOQ chassis with linecard CPUs.

Scope

For the first phase of the design, this document covers the high-level design of platform support and its interactions with PMON in a VOQ chassis environment. Operations like firmware upgrade will be added at a later stage of development. This document assumes that all linecards and control cards (aka supervisors) have a CPU complex running SONiC. It only considers fabric cards without a CPU, or where SONiC is not running on the fabric CPU even if one is available. This document also assumes that linecards cannot be powered on unless the control card is operationally up and running.

Acronyms

PSU - Power Supply Unit

SFM - Switch Fabric Module

Platform Stack - Set of processes, daemons and dockers implementing the functional requirements of a platform and its peripherals (i.e. PMON docker, Thermalctld, Database docker, etc.)

Management Stack - Set of processes, daemons and dockers implementing the management interface of the chassis, linecards and per-asic instances (i.e. CLI, SNMP, DHCP, etc.)

Control Plane Stack - Set of processes, daemons and dockers implementing control plane protocols such as BGP and EVPN, and also providing complete APP/ASIC DB orchestration (OrchAgent).

Datapath Stack - Set of processes, daemons, dockers and APIs implementing datapath ASIC hardware programming via the SAI interface.

1. Modular VOQ Chassis - Reference

The picture below shows a reference high-level hardware architecture of a VOQ chassis. The chassis has 1 or 2 control cards (aka supervisor cards), 1 or more linecards and 1 or more switch fabric cards. It also has 1 or more fan trays, 1 or more PSUs and a midplane ethernet. In general, the control cards manage peripherals like fans, PSUs, the midplane ethernet, etc.

Modular VOQ Chassis

As an example, the Nokia modular VOQ chassis is the IXR-7250, which has control cards (i.e. CPMv1, CPMv2), linecards (i.e. imm36-400g-qsfpdd, imm36-32x100g-4x400g-qsfpdd, etc.) and fabric cards (i.e. SFMv1, SFMv2).

2. SONiC Platform Management & Monitoring

2.1. Functional Requirements

At a functional level of a chassis, SONiC will manage control cards, line cards and all other peripheral devices of the chassis as required by the chassis platform vendor specification. The requirements below capture some of the key areas required to operate a VOQ chassis.

  • Chassis control cards & line cards should be able to boot using ONIE or any vendor-specific boot method.
  • Linecards should be managed via the control card to support operations like power up/down and getting operational status.
  • In a typical chassis, the control card manages the fan speed based on various temperature sensor readings from the linecards and the chassis.
  • The control card monitors the PSUs of the chassis.
  • LEDs and transceivers are present on linecards and can be managed via the linecard's SONiC platform instance.
  • Some of these peripherals are pluggable and hot-swap capable.
  • In general, a VOQ chassis has a midplane ethernet which interconnects linecards and control cards for internal communication. It should be initialized upon platform boot and can be used to provide IP connectivity between control cards and linecards.
  • Each linecard will have a management interface, either directly to the external management network or via the internal midplane ethernet.

2.2. Chassis Platform Stack

In the modular disaggregated SONiC software architecture, each linecard runs an instance of the SONiC platform stack and the control card runs its own instance. Each linecard's resources are managed as an independent fixed platform while also providing all the above functional requirements to operate the chassis. The picture below describes a high-level view of the platform stack.

Chassis Platform Stack

  • Each linecard & control card will have its own ONIE_PLATFORM string to differentiate between cards and their variations.
  • The control card won't run any protocol stack, except the SWSS and SyncD dockers for managing the switch fabric.
  • Each linecard & control card runs one instance of the PMON container.
  • Each linecard & control card runs one instance of the redis server for common platform monitoring (host network) and also uses the per-asic redis for SFP monitoring.
  • Control card & linecards communicate over the midplane ethernet. To provide this IP connectivity between the control & line cards, the midplane ethernet drivers run in the host network namespace.
  • Each linecard & control card gets an IP address (internal network) assigned to this midplane ethernet based on slot information.
  • The control card PMON will have all sensor readings, either by fetching from the linecard redis servers (subscribe to multiple servers) or from a global redis db on the control card (publish to a single server).
  • SONiC on a fixed platform has put together the PMON 2.0 APIs for platform vendors to implement peripheral drivers (kernel or userspace). Most of the existing PMON 2.0 APIs will be used for the chassis, with some key changes and enhancements as detailed below.
  • The control card will provide a driver implementation to obtain linecard status such as present or empty.

3. Detailed Workflow

3.1 Chassis Boot Process

SONiC supports ONIE as a boot method and also allows vendor-specific boot methods. In either case, the control card of the chassis boots first, followed by the linecards. For the first phase of the design, the control card is assumed to be operationally ready before the linecards boot. This is important because some of the sensor and fan settings are managed on the control card, and they have to be set to correct values while linecards are running to keep the chassis healthy and avoid overheating.

3.1.1 Control Card Boot Process

The control card can be booted using the ONIE method. Upon boot, a unique ONIE_PLATFORM string provided by the ONIE firmware differentiates the cards and the services/dockers each starts via the systemd generator. In the case of the control card, dockers like BGP, LLDP, etc. won't be started. This service list is included as part of a platform-specific service list file.

    device/
|-- <VENDOR_NAME>/
|   |-- <ONIE_PLATFORM_STRING>/
|   |   |-- <HARDWARE_SKU>/
|   |   |   |-- port_config.ini
|   |   |   |-- sai.profile
|   |   |   |-- xxx.config.bcm
|   |   |-- default_sku
|   |   |-- fancontrol
|   |   |-- installer.conf
|   |   |-- platform_env.conf
|   |   |-- led_proc_init.soc
|   |   |-- platform_reboot
|   |   |-- pmon_daemon_control.json
|   |   |-- sensors.conf
|   |   |-- asic.conf 
|   |   |-- services.conf [NEWFILE]
sonic-buildimage/device/nokia/x86_64-nokia_ixr7250_36x400g-r0$ cat asic.conf
NUM_ASIC=1
HW_TYPE=IOM
sonic-buildimage/device/nokia/x86_64-nokia_ixr7250_36x400g-r0$
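The asic.conf contents above are simple KEY=VALUE pairs; below is a minimal sketch of how a platform script might read them. The parse_asic_conf helper is illustrative, not an existing SONiC utility:

```python
def parse_asic_conf(text):
    """Parse KEY=VALUE lines from an asic.conf file body into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and malformed lines
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

# The exact contents shown above for the Nokia IOM card
conf = parse_asic_conf("NUM_ASIC=1\nHW_TYPE=IOM")
```

A platform service could then branch on `conf["HW_TYPE"]` to decide which docker set to start.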

3.1.2 Line Card Boot Process

The linecard boot process is very similar to that of the control card; the main difference is that the services started on a linecard include protocol dockers such as BGP, LLDP, etc. Also, the SyncD docker will be started for the VOQ ASIC instead of the SF ASIC.

3.2 Chassis Platform Management

3.2.1 Midplane Ethernet

A typical modern modular chassis includes a midplane ethernet to interconnect the control card & line cards. This is a new component (peripheral) that needs to be added to SONiC. This document proposes treating the midplane ethernet as a managed platform peripheral and captures the design as follows.

  • Upon linecard or control card boot, as part of its initialization, the midplane ethernet gets initialized.
  • The slot number is generally used in assigning an IP address to these interfaces.
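A sketch of the slot-based addressing, using a hypothetical scheme that places all cards in one internal /24 (the 10.0.5.0/24 subnet and helper name are assumptions for illustration, not part of the design):

```python
def midplane_ip_for_slot(slot, subnet="10.0.5"):
    """Derive a midplane IP from the slot number.

    Assumes a single /24 internal subnet where the host octet is the
    slot number itself -- one convention a vendor driver might pick.
    """
    if not 1 <= slot <= 254:
        raise ValueError("slot out of range for a /24 midplane subnet")
    return f"{subnet}.{slot}"
```

With this convention, the control card in slot 16 would get 10.0.5.16 and the linecard in slot 1 would get 10.0.5.1.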

In order to allow direct access to linecards from outside the chassis over the external management network, the chassis midplane ethernet network and the external management network need to be connected to each other. There are a couple of options to consider.

  1. L2 bridging: The control card can create a virtual switch (linux bridge) and add both the midplane ethernet and the external management interface to this bridge. This is the L2 mode of operation, but internal communication and external L2 station traffic will be seen inside the midplane ethernet.
  2. IP routing: The midplane ethernet could be configured with an externally reachable network (announced via a routing protocol). This requires the mgmt interface on the control card to run a routing protocol, which isn't a common deployment.
  3. NAT: Statically assign an externally reachable management IP address per linecard via chassisd and use NAT to map between external and internal midplane IP addresses. In this case, internal midplane ethernet traffic won't be seen on the external management network, and only direct communication is allowed via the NAT rules.

DHCP relay or DHCP client support on the internal midplane ethernet is not considered for the first phase of the design.
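Option 3 (NAT) could be realized by emitting one DNAT/SNAT rule pair per linecard. The following sketch only builds the iptables command strings; both addresses and the helper name are invented examples:

```python
def nat_rules_for_linecard(external_ip, midplane_ip):
    """Return iptables commands mapping an externally reachable management
    IP to a linecard's internal midplane IP (option 3 above)."""
    return [
        # Traffic arriving for the external address is redirected to the card
        f"iptables -t nat -A PREROUTING -d {external_ip} "
        f"-j DNAT --to-destination {midplane_ip}",
        # Replies from the card are rewritten to the external address
        f"iptables -t nat -A POSTROUTING -s {midplane_ip} "
        f"-j SNAT --to-source {external_ip}",
    ]
```

Chassisd could run such rules on the control card whenever a linecard slot comes online.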

3.2.2 Chassis Monitoring & ChassisD

A modular chassis has control-cards, line-cards and fabric-cards along with other peripherals. The different types of cards have to be managed and monitored.

Functional Requirements

  • Identify a central entity that has visibility of the entire chassis.
  • Monitor the status of the line-card, fabric-card etc using new PMON 2.0 APIs. The assumption is that each vendor will have platform-drivers or implementation to detect the status of the cards in the chassis.
  • The status will need to be persisted in REDIS-DB.
  • PMON processes can subscribe to UP/DOWN events of these cards.

Schema

The schema for CHASSIS_CARD_INFO table in State DB is:

key                                   = CHASSIS_CARD <card index> |"state_db"      ; 
; field                               = value
name                                  = STRING                          ; name of the card
slot                                  = 1*2DIGIT                        ; slot number in the chassis
status                                = "Empty" | "Online" | "Offline"  ; status of the card
type                                  = "control"| "line" | "fabric"    ; card-type

Prototype Code

The line-card status update will happen in the main monitoring loop.


In src/sonic-platform-daemons/sonic-chassisd/scripts/chassisd:

class DaemonChassisd(DaemonBase):

    def run(self):
        # Connect to STATE_DB and create linecard/chassis info tables
        state_db = daemon_base.db_connect(swsscommon.STATE_DB)
        linecard_tbl = swsscommon.Table(state_db, LINECARD_INFO_TABLE)

        # Start main loop
        logger.log_info("Start daemon main loop")

        while not self.stop.wait(LINECARD_INFO_UPDATE_PERIOD_SECS):
            linecard_db_update(linecard_tbl, num_linecard)

        logger.log_info("Stop daemon main loop")

A LineCardBase class is introduced for chassis vendors to implement their representation of line-cards in a chassis.

In src/sonic-platform-common/sonic_platform_base/linecard_base.py

class LineCardBase(object):
    """
    Abstract base class for implementing a platform specific class to
    represent a control-card, line-card or fabric-card of a chassis
    """
    _linecard_list = None
    
    def __init__(self):
        self._linecard_list = []
        
    def get_name(self):
    
    def get_description(self):
    
    def get_slot(self):

    def get_status(self):
    
    def reboot_slot(self):
    
    def set_admin_state(self, state): # enable or disable

In src/sonic-platform-common/sonic_platform_base/chassis_base.py

class ChassisBase(device_base.DeviceBase):
    def get_num_linecards(self):
    
    def get_all_linecards(self):
    
    def get_linecard_presence(self, lc_index):
    

An example vendor implementation would be as follows:

In platform/broadcom/<vendor>/sonic_platform/linecard.py

from sonic_platform_base.linecard_base import LineCardBase

class LineCard(LineCardBase):
    def __init__(self, linecard_index):

Show command

The show platform command is enhanced to show chassis information

show platform details

PLATFORM INFO TABLE
-----------------------------------------------------------
| Slot   | Name                              | Status     |
-----------------------------------------------------------
| 16     | cpm2-ixr                          | Online     |
| 1      | imm36-400g-qsfpdd                 | Online     |
| 2      | imm36-400g-qsfpdd                 | Online     |
| 3      | imm36-400g-qsfpdd                 | Online     |
| 4      | Empty                             | Empty      |
| 17     | SFM1                              | Offline    |
| 18     | SFM2                              | Offline    |
| 19     | SFM3                              | Offline    |
| 20     | SFM4                              | Offline    |
| 21     | SFM5                              | Online     |
| 22     | SFM6                              | Offline    |
-----------------------------------------------------------

3.2.3 Chassis Local Sonic Image Hosting Service

In some environments, the control-card and the linecards may not necessarily have reachability to external networks. Linecards without an external USB slot could use the control-card as an image server to download the SONiC image, assuming control-cards have external USB storage or internal storage hosting the images. We propose that chassisd on the control-card can be a placeholder for the SONiC bootable images and run an http-server for image download by the line-cards.

3.2.4 Disaggregated vs Global DB

In a chassis environment, processes monitoring peripherals will need to have a view of the components across multiple cards. The requirement would be to aggregate the data on the control-card. There are 2 options:

  • Disaggregated DB - Each card updates its local REDIS-DB. The monitoring process will pull or subscribe to the table updates of each card.
  • Global DB - Each card will update its state to a line-card table in the Global-DB.
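The disaggregated option can be illustrated with plain dicts standing in for each card's local REDIS tables; the helper name and the `key|slot` layout of the merged view are assumptions:

```python
def aggregate_card_tables(card_tables):
    """Merge per-card table snapshots into one chassis-wide view.

    card_tables: mapping of slot -> {key: fields}, where each inner dict
    stands in for a card's local STATE_DB table in the disaggregated model.
    The control-card monitor pulls each snapshot and tags entries by slot.
    """
    merged = {}
    for slot, table in card_tables.items():
        for key, fields in table.items():
            merged[f"{key}|slot{slot}"] = fields
    return merged

view = aggregate_card_tables({
    1: {"CHASSIS_CARD 1": {"status": "Online"}},
    2: {"CHASSIS_CARD 2": {"status": "Offline"}},
})
```

With the Global-DB option this merge step disappears, at the cost of every card writing to a single redis instance on the control-card.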

Chassis Distributed DB Update

3.3 Peripheral Management

Processes running in the PMON container differ based on the HWSKU. In the chassis, the control-card and line-cards would run a subset of the PMON processes. The existing control file /device/<vendor>/<platform>/<hwsku>/pmon_daemon_control.json is used to start processes in each of the cards. A new template /dockers/docker-platform-monitor/critical_processes.j2 is introduced to dynamically generate the critical_processes list instead of the current statically defined list.

3.3.1 PSUd

PSUd in PMON monitors the PSUs and maintains their state in REDIS-DB. On a chassis, the PSUs are fully managed by the control-card. Currently, the platform exposes APIs for PSUd to periodically query PSU status/presence.

Functional Requirement

One of the functional requirements for the chassis is to manage and monitor power available vs power required. The total number of PSUs required is a function of the number of line-cards, SFMs and FANs.

Proposal

  • PSUd will get the power-capacity of each PSU.
  • PSUd will calculate the total power capacity as the power-capacity of each PSU multiplied by the number of PSUs with valid status.
  • PSUd will get the fixed maximum power requirements of each type of line-card, each SFM and each FAN.
  • PSUd will calculate the total power required as the sum, over each card type, of the number of cards of that type multiplied by the maximum power requirement of that type.
  • PSUd will set a Master-LED state based on power available vs power required.

We do not see a requirement for real-time monitoring of current power usage of each card.
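The budget arithmetic in the proposal above can be sketched as follows. The per-PSU capacity and per-card-type maximum wattages are illustrative placeholders, not vendor figures:

```python
PSU_CAPACITY_W = 3000  # assumed per-PSU capacity
MAX_POWER_W = {"line": 400, "fabric": 250, "fan": 50}  # assumed per-type maximums

def power_budget(psu_statuses, card_counts):
    """Return (capacity, required, led) per the PSUd proposal.

    psu_statuses: list of booleans, True when the PSU status is valid.
    card_counts: mapping of card type -> installed count.
    """
    # Total capacity = per-PSU capacity * number of PSUs with valid status
    capacity = PSU_CAPACITY_W * sum(1 for ok in psu_statuses if ok)
    # Total required = sum over card types of count * max power for that type
    required = sum(MAX_POWER_W[t] * n for t, n in card_counts.items())
    # Master-LED reflects available vs required power
    led = "green" if capacity >= required else "red"
    return capacity, required, led
```

For example, 2 valid PSUs feeding 4 line-cards, 6 SFMs and 3 fans yields 6000 W available against 3250 W required, so the Master-LED stays green.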

PSU

show platform psustatus

admin@sonic:~$ show platform psustatus
PSU    Status
-----  -----------
PSU 1  OK
PSU 2  OK
PSU 3  OK
PSU 4  NOT PRESENT
PSU 5  NOT PRESENT
PSU 6  NOT PRESENT

3.3.2 Thermalctld

Thermalctld monitors temperatures, monitors fan speed and applies policies to control the fan speed.

Functional Requirement

  1. There are multiple temperature sensors that need to be monitored. All these need to be available on the control-card.
    • Temperature sensors are on the control-card
    • Temperature sensors are on the line-card
    • Temperature sensors are on the SFMs.
  2. The FAN control is limited to the control-card

Temperature and Fan Control

Proposal

  1. Thermalctld subscribes to the line-card up/down events notified by chassisd.
  2. All local temperature sensors are recorded on both control and line-cards for monitoring. The control-card monitors the temperature sensors of the SFMs.
  3. Chassisd on the control-card will periodically fetch the summary info from each of the line-cards. Alternately, the thermalctld on the control-card can subscribe to the line-card sensor updates.
  4. The local temperatures of control-card, line-cards and fabric-cards are passed on to the fan-control algorithm.
  5. The fan-control algorithm can be implemented in PMON or in the platform-driver.

The change in thermalctld is to have a TemperatureUpdater class for each line-card. Each updater instance will fetch the values for all temperature sensors of the line-card from the REDIS-DB of that line-card.

In src/sonic-platform-daemons/sonic-thermalctld/scripts/thermalctld:

class TemperatureUpdater():
    def update_per_slot(self, slot):
        # Connect to State-DB of given slot
        # Record all thermal sensor values
        self.chassis._linecard_list[index].set_thermal_info()

class ThermalMonitor(ProcessTaskBase):
    def __init__(self):
        if platform_chassis.get_controlcard_slot() == platform_chassis.get_my_slot():
            for card in platform_chassis.get_all_linecards():
                slot = card.get_slot()
                self.temperature_updater[slot] = TemperatureUpdater(chassis, slot)
        else:
            slot = platform_chassis.get_my_slot()
            self.temperature_updater = TemperatureUpdater(chassis, slot)

    def task_worker(self):
        while not self.task_stopping_event(wait_time):
            # Only on control card
            if platform_chassis.get_controlcard_slot() == platform_chassis.get_my_slot():
                for slot, updater in self.temperature_updater.items():
                    updater.update_per_slot(slot)
            else:
                self.temperature_updater.update_per_slot(slot)

The thermal_infos.py and thermal_actions.py will continue to be vendor specific. In the collect() function, the vendor will have access to all the sensors of the chassis.

In platform/broadcom/<vendor>/sonic_platform/thermal_infos.py

class ThermalInfo(ThermalPolicyInfoBase):
    def collect(self, chassis):
        #Vendor specific calculation from all available sensor values on chassis

In approach-1, the thermal_policy.json can provide an additional action to check if a line-card temperature exceeded its threshold, etc. The thermalctld run_policy() will match the required condition and take the appropriate action to set the fan speed.

In approach-2, the sensor information could be passed to the platform-driver, which can then control the fan speed.
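Approach-1's policy check can be reduced to a threshold comparison over the collected sensor values. The thresholds and speeds below are illustrative only, not taken from any vendor thermal_policy.json:

```python
def fan_speed_for(temps_c, high_threshold_c=68, normal_speed=50, boost_speed=100):
    """Pick a fan speed percentage from chassis-wide sensor readings.

    Boost the fans when any sensor crosses the high threshold, mirroring
    the 'line-card temperature exceeded the threshold' action in approach-1.
    """
    if any(t >= high_threshold_c for t in temps_c):
        return boost_speed
    return normal_speed
```

A real policy would likely add hysteresis and per-sensor thresholds; this sketch only shows where the aggregated line-card temperatures feed into the decision.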

show platform fan

admin@sonic:~$ show platform fan
     FAN    Speed              Direction    Presence    Status          Timestamp
--------  -------  ---------------------  ----------  --------  -----------------
FanTray1      50%  FAN_DIRECTION_EXHAUST     Present        OK  20200429 06:11:16
FanTray2      50%  FAN_DIRECTION_EXHAUST     Present        OK  20200429 06:11:17
FanTray3      50%  FAN_DIRECTION_EXHAUST     Present        OK  20200429 06:11:18

show platform temperature

admin@sonic:~$ show platform temperature
   Sensor    Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning          Timestamp
---------   -------------  ---------  --------  --------------  -------------  ---------  -----------------
Thermal 0        28           50         0             N/A            N/A        False     20200529 01:49:39
Thermal 1        37           50         0             N/A            N/A        False     20200529 01:49:39
Thermal 2        40           68         0             N/A            N/A        False     20200529 01:49:39
Thermal 3        45           68         0             N/A            N/A        False     20200529 01:49:39
Thermal 4        32           68         0             N/A            N/A        False     20200529 01:49:39
Thermal 5        59           68         0             N/A            N/A        False     20200529 01:49:39

3.3.3 Xcvrd/SFP

Requirements

  • Database connections per namespace - Database dockers run per namespace, and PMON processes need to connect to each of these database instances.
  • Update per-namespace port status - The PMON processes will need to run per-asic-specific functionality in a separate thread.

Database Connections

Below is a code snippet introducing a new API, db_unix_connect.

In src/sonic-daemon-base/sonic_daemon_base/daemon_base.py

def db_unix_connect(db, namespace):
    from swsscommon import swsscommon
    return swsscommon.DBConnector(db,
                                  REDIS_UNIX_SOCKET_PATH+str(namespace)+REDIS_UNIX_SOCKET_INFO,
                                  REDIS_TIMEOUT_MSECS)

Below is a code snippet to connect to State-DB.

In src/sonic-platform-daemons/sonic-xcvrd/scripts/xcvrd

 use_unix_sockets = False
 
 # Check if environment is multi-asic
 if check_multiasic():
    use_unix_sockets = True
    
 # Connect to STATE_DB and create transceiver dom/sfp info tables
 if not use_unix_sockets:
    state_db = daemon_base.db_connect(swsscommon.STATE_DB)
 else:
    state_db = daemon_base.db_unix_connect(swsscommon.STATE_DB, namespace)

Multi-thread support

Below is a code snippet to run namespace-specific functionality per thread.

In src/sonic-platform-daemons/sonic-xcvrd/scripts/xcvrd

  # Run daemon
    def run(self):
        logger.log_info("Starting up...")
        
        # Start daemon initialization sequence
        self.init()
        
        if num_asics == 1:
            use_unix_sockets = False
            self.run_per_asic(0)
        else:
            self.xcvrd_thread_list = []
            for i in range(0, num_asics): # Number of ASICs per pmon
                thread = threading.Thread(target=self.run_per_asic, args=(i,))
                thread.setName('Xcvrd Thread '+str(i))
                self.xcvrd_thread_list.append(thread)
                
            for thread in self.xcvrd_thread_list:
                thread.start()
                
            for thread in self.xcvrd_thread_list:
                thread.join()
                
        # Start daemon deinitialization sequence
        self.deinit()


Additional new APIs like set_namespace() and get_namespace() can be provided in chassis_base.py and set by PMON processes. This will enable modules supporting platform 2.0 to be aware of, or query, which namespace they are running in.
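A minimal sketch of the proposed pair, shown on a stripped-down class for illustration (the real ChassisBase derives from DeviceBase and carries many more methods):

```python
class ChassisBase:
    """Stripped-down sketch of the proposed namespace-awareness APIs."""

    def __init__(self):
        # e.g. "asic0"; None on single-asic platforms or before PMON sets it
        self._namespace = None

    def set_namespace(self, namespace):
        """Called by a PMON process to record which namespace it serves."""
        self._namespace = namespace

    def get_namespace(self):
        """Queried by platform 2.0 modules to learn their namespace."""
        return self._namespace
```

A per-asic xcvrd thread, for instance, could call set_namespace("asic0") before invoking platform APIs that need namespace-qualified DB connections.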

Callflow

Transceiver Monitoring

Show commands

show interfaces status

admin@sonic:~$ sudo ip netns exec asic0 show interfaces status
 Interface                    Lanes    Speed    MTU         Alias    Vlan    Oper    Admin                                             Type    Asym PFC
-----------  -----------------------  -------  -----  ------------  ------  ------  -------  -----------------------------------------------  ----------
 Ethernet1    8,9,10,11,12,13,14,15     400G   9100   Ethernet1/1  routed    down     down                                              N/A         N/A
 Ethernet2          0,1,2,3,4,5,6,7     400G   9100   Ethernet1/2  routed    down     down                                              N/A         N/A
 Ethernet3  24,25,26,27,28,29,30,31     400G   9100   Ethernet1/3  routed      up       up  QSFP-DD Double Density 8X Pluggable Transceiver         N/A
 Ethernet4  16,17,18,19,20,21,22,23     400G   9100   Ethernet1/4  routed    down     down                                              N/A         N/A
 Ethernet5  40,41,42,43,44,45,46,47     400G   9100   Ethernet1/5  routed    down     down                                              N/A         N/A
 Ethernet6  32,33,34,35,36,37,38,39     400G   9100   Ethernet1/6  routed    down     down                                              N/A         N/A
 Ethernet7  80,81,82,83,84,85,86,87     400G   9100   Ethernet1/7  routed    down     down  QSFP-DD Double Density 8X Pluggable Transceiver         N/A
 Ethernet8  88,89,90,91,92,93,94,95     400G   9100   Ethernet1/8  routed    down     down                                              N/A         N/A
 Ethernet9  64,65,66,67,68,69,70,71     400G   9100   Ethernet1/9  routed    down     down                                              N/A         N/A
Ethernet10  72,73,74,75,76,77,78,79     400G   9100  Ethernet1/10  routed    down     down                                              N/A         N/A
Ethernet11  48,49,50,51,52,53,54,55     400G   9100  Ethernet1/11  routed    down     down                                              N/A         N/A
Ethernet12  56,57,58,59,60,61,62,63     400G   9100  Ethernet1/12  routed    down     down                                              N/A         N/A

show interfaces transceiver presence

admin@sonic:~$ sudo ip netns exec asic0 show interfaces transceiver presence
Port        Presence
----------  -----------
Ethernet1   Not present
Ethernet2   Not present
Ethernet3   Present
Ethernet4   Not present
Ethernet5   Not present
Ethernet6   Not present
Ethernet7   Present
Ethernet8   Not present
Ethernet9   Not present
Ethernet10  Not present
Ethernet11  Not present
Ethernet12  Not present

3.3.4 LEDd

Functional Requirements

The requirements are similar to Xcvrd

  • Ledd needs to subscribe to the REDIS-DB in each namespace to receive PORT UP/DOWN updates.
  • Ledd needs to be modified to be namespace aware. The LED monitoring tasks run per namespace.

Callflow

Front Panel LED

Show command

show led status

admin@sonic:~$ show led status

FRONT-PANEL INTERFACE STATUS TABLE
----------------------------------------------
| Interface     | Status                     |
----------------------------------------------
| Ethernet1     | state=fast-blink  amber    |
| Ethernet2     | state=fast-blink  amber    |
| Ethernet3     | state=on          green    |
| Ethernet4     | state=fast-blink  amber    |
| Ethernet5     | state=fast-blink  amber    |
| Ethernet6     | state=fast-blink  amber    |
| Ethernet7     | state=fast-blink  amber    |
| Ethernet8     | state=fast-blink  amber    |
| Ethernet9     | state=fast-blink  amber    |
| Ethernet10    | state=fast-blink  amber    |
| Ethernet11    | state=fast-blink  amber    |
| Ethernet12    | state=fast-blink  amber    |
| Ethernet13    | state=fast-blink  amber    |
| Ethernet14    | state=fast-blink  amber    |
| Ethernet15    | state=on          green    |
| Ethernet16    | state=fast-blink  amber    |
| Ethernet17    | state=fast-blink  amber    |
| Ethernet18    | state=fast-blink  amber    |
| Ethernet19    | state=fast-blink  amber    |
| Ethernet20    | state=fast-blink  amber    |
| Ethernet21    | state=fast-blink  amber    |
| Ethernet22    | state=fast-blink  amber    |
| Ethernet23    | state=fast-blink  amber    |
| Ethernet24    | state=fast-blink  amber    |
| Ethernet25    | state=fast-blink  amber    |
| Ethernet26    | state=fast-blink  amber    |
| Ethernet27    | state=fast-blink  amber    |
| Ethernet28    | state=fast-blink  amber    |
| Ethernet29    | state=fast-blink  amber    |
| Ethernet30    | state=fast-blink  amber    |
| Ethernet31    | state=fast-blink  amber    |
| Ethernet32    | state=fast-blink  amber    |
| Ethernet33    | state=fast-blink  amber    |
| Ethernet34    | state=fast-blink  amber    |
| Ethernet35    | state=fast-blink  amber    |
| Ethernet36    | state=fast-blink  amber    |
----------------------------------------------

3.3.5 Syseepromd

Syseepromd will run on control and line-cards independently and monitor for any changes in the syseeprom. The functionality is similar to fixed-platform devices.

3.3.6 Midplane Ethernet

To manage and monitor midplace ethernet, the following vendor-specific PMON 2.0 APIs can be introduced:

  • API to initialize the midplane on both control and line cards - init_midplane_switch()
  • APIs to check midplane connectivity:
    • On line-card to check if control-card is reachable via midplane - is_midplane_controlcard_reachable()
    • On control-card to check if line-card on slot is reachable via midplane - is_midplane_linecard_reachable(slot)
  • APIs to get slot and IP-addresses of control and line cards.

In platform/broadcom/<vendor>/sonic_platform/chassis.py:

def init_midplane_switch():

def is_midplane_controlcard_reachable():

def is_midplane_linecard_reachable(slot):

def get_my_slot():

def get_controlcard_slot():

def get_controlcard_midplane_ip():

def get_linecard_midplane_ip(slot):

The proposal would be to use Chassisd to implement this functionality.

In src/sonic-platform-daemons/sonic-chassisd/scripts/chassisd:

class midplane_monitor_task:
    def task_worker(self): 
        # Create midplane network
        if platform_chassis is not None:
            platform_chassis.init_midplane_switch()
        else:
            sys.exit(NOT_IMPLEMENTED)
            
    
        logger.log_info("Start midplane task loop")
    
        while not self.stop.wait(MIDPLANE_MONITOR_PERIOD_SECS):
            if platform_chassis.get_controlcard_slot() == platform_chassis.get_my_slot():
                for card in platform_chassis.get_all_linecards():
                    platform_chassis.is_midplane_linecard_reachable(card.get_slot())
            else:
                platform_chassis.is_midplane_controlcard_reachable()
                
        logger.log_info("Stop midplane task loop")