[Reclaim buffer][202106] Reclaim unused buffer for dynamic buffer model #1986

Closed
[Reclaim buffer][202106] Reclaim unused buffer for dynamic buffer model
This backports community PR 1910 to the 202106 branch.

**What I did**

Reclaim the reserved buffer of unused (admin-down) ports in both the dynamic and traditional buffer models.
This is done by
- Removing lossless priority groups on unused ports.
- Applying zero buffer profiles on the buffer objects of unused ports.
- In the dynamic buffer model, the zero profiles are loaded from a JSON file and applied to `APPL_DB` when there is at least one admin-down port.
  The default buffer configuration is still generated for all ports; the buffer manager then applies zero profiles to the admin-down ports (a minimal sketch of such a write follows this list).
- In the static buffer model, the zero profiles are loaded by the buffer template.
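
To make the dynamic-model bullet concrete, here is a minimal, hypothetical sketch (not code from this PR) of what applying a zero profile to a buffer object of an admin-down port amounts to at the `APPL_DB` level, using the swss-common producer table API. The port, PG index, and profile name are illustrative assumptions.

```cpp
// Illustrative sketch only: write an assumed zero profile for the lossy
// priority group of an assumed admin-down port into APPL_DB, which is the
// kind of update the buffer manager performs when reclaiming buffer.
#include <string>
#include <vector>

#include "dbconnector.h"
#include "producerstatetable.h"
#include "table.h"

int main()
{
    swss::DBConnector applDb("APPL_DB", 0);
    swss::ProducerStateTable bufferPgTable(&applDb, "BUFFER_PG_TABLE");

    // "Ethernet8" is assumed to be admin-down; "ingress_lossy_zero_profile"
    // is an assumed zero-profile name taken from the JSON file.
    std::vector<swss::FieldValueTuple> fvs = {
        {"profile", "ingress_lossy_zero_profile"},
    };
    bufferPgTable.set("Ethernet8:0", fvs);

    return 0;
}
```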

Signed-off-by: Stephen Sun <[email protected]>

**Why I did it**

**How I verified it**

Regression test and vs test.

**Details if related**
***Static buffer model***

Remove the lossless buffer priority group if the port is admin-down and the buffer profile aligns with the speed and cable length of the port.
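
As an illustration of that check, here is a short sketch (an assumption, not the PR's exact code): the traditional buffer manager derives the expected lossless profile name from the port's speed and cable length, and only reclaims the priority group when the configured profile matches it.

```cpp
#include <string>

// Sketch: decide whether the lossless PG of an admin-down port should be
// reclaimed. Profile names of the form "pg_lossless_<speed>_<cable>_profile"
// follow the usual SONiC naming convention; this helper itself is hypothetical.
static bool shouldReclaimLosslessPg(const std::string &configuredProfile,
                                    const std::string &speed,
                                    const std::string &cableLength,
                                    bool adminDown)
{
    const std::string expected = "pg_lossless_" + speed + "_" + cableLength + "_profile";
    return adminDown && configuredProfile == expected;
}

// Example: shouldReclaimLosslessPg("pg_lossless_100000_5m_profile",
//                                  "100000", "5m", true) returns true.
```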

***Dynamic buffer model***

****Handle zero buffer pools and profiles****

1. buffermgrd: add a CLI option to load the JSON file for zero profiles.
2. Load the zero profiles from the JSON file into the buffer manager's internal data structures
3. Apply them to APPL_DB once there is at least one admin-down port
   - Record each zero profile's name in the pool object it references.
     By doing so, the zero profile lists can be constructed from the normal profile lists; there should be one zero profile for each pool on the ingress/egress side (see the sketch after this list).
   - Then apply the zero profiles to the buffer objects of the port.
   - Unload them from APPL_DB once all ports are admin-up, since the zero pools and profiles are no longer referenced.
     Remove the buffer pool counter ID when a zero pool is removed.
4. Now that a pool can be removed from the system, its watermark counter is removed before the pool itself is removed.
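
A hedged sketch of the bookkeeping described in step 3 above (the data structures and names are assumptions for illustration, not the buffer manager's actual internals): each pool remembers the zero profile that references it, so a zero profile list can be built by substituting, pool by pool, the zero profile for each profile in the normal list.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed bookkeeping: the name of the zero profile referencing each pool.
struct BufferPoolInfo
{
    std::string zeroProfile;   // e.g. "ingress_lossless_zero_profile" (assumed name)
};

// For every profile in the normal profile list, look up the pool it references
// and substitute that pool's zero profile, yielding one zero profile per pool.
std::vector<std::string> buildZeroProfileList(
    const std::vector<std::string> &normalProfileList,
    const std::map<std::string, std::string> &profileToPool,
    const std::map<std::string, BufferPoolInfo> &pools)
{
    std::vector<std::string> zeroList;
    for (const auto &profile : normalProfileList)
    {
        auto poolIt = profileToPool.find(profile);
        if (poolIt == profileToPool.end())
            continue;                              // unknown profile: skip it
        auto zeroIt = pools.find(poolIt->second);
        if (zeroIt != pools.end() && !zeroIt->second.zeroProfile.empty())
            zeroList.push_back(zeroIt->second.zeroProfile);
    }
    return zeroList;
}
```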

****Handle port admin status change****

1. Currently, there is logic for removing the buffer priority groups of admin-down ports. This logic is reused and extended to all buffer objects, including `BUFFER_QUEUE`, `BUFFER_PORT_INGRESS_PROFILE_LIST`, and `BUFFER_PORT_EGRESS_PROFILE_LIST`.
   - When the port is admin down,
     - The normal profiles are removed from the buffer objects of the port
     - The zero profiles, if provided, are applied to the port
   - When the port is admin up,
     - The zero profiles, if applied, are removed from the port
     - The normal profiles are applied to the port.
2. The ports orchagent exposes the number of queues and priority groups of each port to STATE_DB.
   The buffer manager uses these values to apply zero profiles to all the priority groups and queues of admin-down ports.
   If a platform does not need zero profiles on all priority groups or queues, `ids_to_reclaim` can be customized in the JSON file.
3. Handle all buffer tables, including `BUFFER_PG`, `BUFFER_QUEUE`, `BUFFER_PORT_INGRESS_PROFILE_LIST` and `BUFFER_PORT_EGRESS_PROFILE_LIST`
   - Originally, only the `BUFFER_PG` table was cached in the dynamic buffer manager.
   - Now, all tables are cached in order to apply zero profiles when a port is admin down and apply normal profiles when it's up.
   - The key of such tables can reference a single port or a list of ports, like `BUFFER_PG|Ethernet0|3-4` or `BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4`. Originally, there was logic to handle such keys for the `BUFFER_PG` table only; it is now reused and extended to handle all the tables (a parsing sketch follows this list).
4. [Mellanox] Plugin to calculate buffer pool size:
   - Originally, buffers for queues, buffer profile lists, etc. were not reclaimed for admin-down ports, so they were reserved for all ports.
   - Now they are reserved for admin-up ports only.
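
For illustration, a minimal sketch of the key handling mentioned in item 3 (the helper and its exact behavior are assumptions, not the PR's code). The table-name prefix is assumed to be stripped already, so the key seen here is, for example, `Ethernet0,Ethernet4,Ethernet8|3-4` or `Ethernet0|3-4`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a buffer-object key such as "Ethernet0,Ethernet4,Ethernet8|3-4"
// into its list of ports and its ID map ("3-4"). A key with a single port,
// e.g. "Ethernet0|3-4", is handled the same way; keys without an ID map
// (profile list tables) yield an empty idMap.
static void parseBufferObjectKey(const std::string &key,
                                 std::vector<std::string> &ports,
                                 std::string &idMap)
{
    const size_t sep = key.find('|');
    const std::string portList = (sep == std::string::npos) ? key : key.substr(0, sep);
    idMap = (sep == std::string::npos) ? "" : key.substr(sep + 1);

    std::istringstream iss(portList);
    std::string port;
    while (std::getline(iss, port, ','))
    {
        if (!port.empty())
        {
            ports.push_back(port);
        }
    }
}
```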

****Accelerate the progress of applying buffer tables to APPL_DB****

This is an optimization on top of buffer reclamation.

1. Don't apply buffer profiles or buffer objects to `APPL_DB` before the buffer pools are applied while the system is starting.
   This applies the items in order from referenced items to referencing items and avoids buffer orchagent retries caused by missing referenced table items (a sketch of the idea follows this list).
   It is still possible for referencing items to be handled before referenced items; in that case, no error message should be emitted.
2. [Mellanox] Plugin to calculate buffer pool size:
   Return the buffer pool sizes currently in APPL_DB if the new sizes cannot be calculated because some information is missing, which typically happens at system start.
   This accelerates pushing the tables to APPL_DB.
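
A hedged sketch of the ordering idea in item 1 (the class and its names are illustrative, not the buffer manager's actual structure): writes of referencing items are held back until the pools have been applied, then flushed in one pass.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Defer buffer-object writes until the buffer pools have reached APPL_DB, so
// referenced items (pools) land before referencing items (profiles, PGs,
// queues, profile lists) and orchagent does not have to retry.
class DeferredApplier
{
public:
    void queueObjectWrite(std::function<void()> write)
    {
        if (m_poolsApplied)
            write();                                // pools already present: write through
        else
            m_pending.push_back(std::move(write));  // hold until pools are applied
    }

    void onPoolsApplied()
    {
        m_poolsApplied = true;
        for (auto &write : m_pending)
            write();
        m_pending.clear();
    }

private:
    bool m_poolsApplied = false;
    std::deque<std::function<void()>> m_pending;
};
```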
stephenxs committed Dec 3, 2021
commit 2dfe25b8a4e9be7a4a132b0cfe3b8537c6017be2
130 changes: 122 additions & 8 deletions cfgmgr/buffer_pool_mellanox.lua
@@ -28,11 +28,23 @@ local port_set_8lanes = {}
local lossless_port_count = 0

local function iterate_all_items(all_items, check_lossless)
    -- Iterates all items in all_items, check the buffer profile each item referencing, and update reference count accordingly
    -- Arguments:
    --     all_items is a list, holding all keys in BUFFER_PORT_INGRESS_PROFILE_LIST or BUFFER_PORT_EGRESS_PROFILE_LIST table
    --     format of keys: <port name>|<ID map>, like Ethernet0|3-4
    -- Return:
    --     0 successful
    --     1 failure, typically caused by the items just updated are still pended in orchagent's queue
    table.sort(all_items)
    local lossless_ports = {}
    local port
    local fvpairs
    for i = 1, #all_items, 1 do
        -- XXX_TABLE_KEY_SET or XXX_TABLE_DEL_SET existing means the orchagent hasn't handled all updates
        -- In this case, the pool sizes are not calculated for now and will retry later
        if string.sub(all_items[i], -4, -1) == "_SET" then
            return 1
        end
        -- Count the number of priorities or queues in each BUFFER_PG or BUFFER_QUEUE item
        -- For example, there are:
        --     3 queues in 'BUFFER_QUEUE_TABLE:Ethernet0:0-2'
@@ -73,6 +85,83 @@ local function iterate_all_items(all_items, check_lossless)
    return 0
end

local function iterate_profile_list(all_items)
    -- Iterates all items in all_items, check the buffer profiles each item referencing, and update reference count accordingly
    -- Arguments:
    --     all_items is a list, holding all keys in BUFFER_PORT_INGRESS_PROFILE_LIST or BUFFER_PORT_EGRESS_PROFILE_LIST table
    --     format of keys: <port name>
    -- Return:
    --     0 successful
    --     1 failure, typically caused by the items just updated are still pended in orchagent's queue
    local port
    for i = 1, #all_items, 1 do
        -- XXX_TABLE_KEY_SET or XXX_TABLE_DEL_SET existing means the orchagent hasn't handled all updates
        -- In this case, the pool sizes are not calculated for now and will retry later
        if string.sub(all_items[i], -4, -1) == "_SET" then
            return 1
        end
        port = string.match(all_items[i], "Ethernet%d+")
        local profile_list = redis.call('HGET', all_items[i], 'profile_list')
        if not profile_list then
            return 0
        end
        for profile_name in string.gmatch(profile_list, "([^,]+)") do
            -- The format of profile_list is profile_name,profile_name
            -- We need to handle each of the profile in the list
            -- The ingress_lossy_profile is shared by both BUFFER_PG|<port>|0 and BUFFER_PORT_INGRESS_PROFILE_LIST
            -- It occupies buffers in BUFFER_PG but not in BUFFER_PORT_INGRESS_PROFILE_LIST
            -- To distinguish both cases, a new name "ingress_lossy_profile_list" is introduced to indicate
            -- the profile is used by the profile list where its size should be zero.
            profile_name = string.sub(profile_name, 2, -2)
            if profile_name == 'BUFFER_PROFILE_TABLE:ingress_lossy_profile' then
                profile_name = profile_name .. '_list'
                if profiles[profile_name] == nil then
                    profiles[profile_name] = 0
                end
            end
            local profile_ref_count = profiles[profile_name]
            if profile_ref_count == nil then
                return 1
            end
            profiles[profile_name] = profile_ref_count + 1
        end
    end

    return 0
end

local function fetch_buffer_pool_size_from_appldb()
    local buffer_pools = {}
    redis.call('SELECT', config_db)
    local buffer_pool_keys = redis.call('KEYS', 'BUFFER_POOL|*')
    local pool_name
    for i = 1, #buffer_pool_keys, 1 do
        local size = redis.call('HGET', buffer_pool_keys[i], 'size')
        if not size then
            pool_name = string.match(buffer_pool_keys[i], "BUFFER_POOL|([^%s]+)$")
            table.insert(buffer_pools, pool_name)
        end
    end

    redis.call('SELECT', appl_db)
    buffer_pool_keys = redis.call('KEYS', 'BUFFER_POOL_TABLE:*')
    local size
    local xoff
    local output
    for i = 1, #buffer_pools, 1 do
        size = redis.call('HGET', 'BUFFER_POOL_TABLE:' .. buffer_pools[i], 'size')
        if not size then
            size = "0"
        end
        xoff = redis.call('HGET', 'BUFFER_POOL_TABLE:' .. buffer_pools[i], 'xoff')
        if not xoff then
            table.insert(result, buffer_pools[i] .. ':' .. size)
        else
            table.insert(result, buffer_pools[i] .. ':' .. size .. ':' .. xoff)
        end
    end
end

-- Connect to CONFIG_DB
redis.call('SELECT', config_db)

@@ -82,7 +171,10 @@ total_port = #ports_table

-- Initialize the port_set_8lanes set
local lanes
local number_of_lanes
local number_of_lanes = 0
local admin_status
local admin_up_port = 0
local admin_up_8lanes_port = 0
local port
for i = 1, total_port, 1 do
    -- Load lanes from PORT table
@@ -99,13 +191,26 @@ for i = 1, total_port, 1 do
            port_set_8lanes[port] = false
        end
    end
    admin_status = redis.call('HGET', ports_table[i], 'admin_status')
    if admin_status == 'up' then
        admin_up_port = admin_up_port + 1
        if (number_of_lanes == 8) then
            admin_up_8lanes_port = admin_up_8lanes_port + 1
        end
    end
    number_of_lanes = 0
end

local egress_lossless_pool_size = redis.call('HGET', 'BUFFER_POOL|egress_lossless_pool', 'size')

-- Whether shared headroom pool is enabled?
local default_lossless_param_keys = redis.call('KEYS', 'DEFAULT_LOSSLESS_BUFFER_PARAMETER*')
local over_subscribe_ratio = tonumber(redis.call('HGET', default_lossless_param_keys[1], 'over_subscribe_ratio'))
local over_subscribe_ratio
if #default_lossless_param_keys > 0 then
    over_subscribe_ratio = tonumber(redis.call('HGET', default_lossless_param_keys[1], 'over_subscribe_ratio'))
else
    over_subscribe_ratio = 0
end

-- Fetch the shared headroom pool size
local shp_size = tonumber(redis.call('HGET', 'BUFFER_POOL|ingress_lossless_pool', 'xoff'))
@@ -161,7 +266,18 @@ local fail_count = 0
fail_count = fail_count + iterate_all_items(all_pgs, true)
fail_count = fail_count + iterate_all_items(all_tcs, false)
if fail_count > 0 then
    return {}
    fetch_buffer_pool_size_from_appldb()
    return result
end

local all_ingress_profile_lists = redis.call('KEYS', 'BUFFER_PORT_INGRESS_PROFILE_LIST*')
local all_egress_profile_lists = redis.call('KEYS', 'BUFFER_PORT_EGRESS_PROFILE_LIST*')

fail_count = fail_count + iterate_profile_list(all_ingress_profile_lists)
fail_count = fail_count + iterate_profile_list(all_egress_profile_lists)
if fail_count > 0 then
    fetch_buffer_pool_size_from_appldb()
    return result
end

local statistics = {}
@@ -177,9 +293,6 @@ for name in pairs(profiles) do
    if name == "BUFFER_PROFILE_TABLE:ingress_lossy_profile" then
        size = size + lossypg_reserved
    end
    if name == "BUFFER_PROFILE_TABLE:egress_lossy_profile" then
        profiles[name] = total_port
    end
    if size ~= 0 then
        if shp_enabled and shp_size == 0 then
            local xon = tonumber(redis.call('HGET', name, 'xon'))
@@ -211,11 +324,11 @@ if shp_enabled then
end

-- Accumulate sizes for management PGs
local accumulative_management_pg = (total_port - port_count_8lanes) * lossypg_reserved + port_count_8lanes * lossypg_reserved_8lanes
local accumulative_management_pg = (admin_up_port - admin_up_8lanes_port) * lossypg_reserved + admin_up_8lanes_port * lossypg_reserved_8lanes
accumulative_occupied_buffer = accumulative_occupied_buffer + accumulative_management_pg

-- Accumulate sizes for egress mirror and management pool
local accumulative_egress_mirror_overhead = total_port * egress_mirror_headroom
local accumulative_egress_mirror_overhead = admin_up_port * egress_mirror_headroom
accumulative_occupied_buffer = accumulative_occupied_buffer + accumulative_egress_mirror_overhead + mgmt_pool_size

-- Switch to CONFIG_DB
@@ -295,5 +408,6 @@ table.insert(result, "debug:egress_mirror:" .. accumulative_egress_mirror_overhe
table.insert(result, "debug:shp_enabled:" .. tostring(shp_enabled))
table.insert(result, "debug:shp_size:" .. shp_size)
table.insert(result, "debug:total port:" .. total_port .. " ports with 8 lanes:" .. port_count_8lanes)
table.insert(result, "debug:admin up port:" .. admin_up_port .. " admin up ports with 8 lanes:" .. admin_up_8lanes_port)

return result
4 changes: 4 additions & 0 deletions cfgmgr/buffermgrd.cpp
@@ -12,6 +12,7 @@
#include <iostream>
#include "json.h"
#include "json.hpp"
#include "warm_restart.h"

using namespace std;
using namespace swss;
@@ -170,6 +171,9 @@ int main(int argc, char **argv)

        if (dynamicMode)
        {
            WarmStart::initialize("buffermgrd", "swss");
            WarmStart::checkWarmStart("buffermgrd", "swss");

            vector<TableConnector> buffer_table_connectors = {
                TableConnector(&cfgDb, CFG_PORT_TABLE_NAME),
                TableConnector(&cfgDb, CFG_PORT_CABLE_LEN_TABLE_NAME),