Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the telemetry-atomic extension to BGP policy statements #867

Closed
wants to merge 2 commits into from

Conversation

wenovus
Copy link
Contributor

@wenovus wenovus commented May 2, 2023

Change Scope

  Add the telemetry-atomic extension to the BGP policy statements
  container to reflect that policies are configured as a whole rather than
  as individual statements. This change also enables scalar-based telemetry
  to support the ordering of this 'ordered-by user' list.

This change is backwards-incompatible for configuration and telemetry below the telemetry-atomic node, and for any telemetry on BGP policy statements on the server side.

This change fixes an ordering issue when configuring or streaming telemetry using scalar TypedValue types in gNMI. The issue is that it is currently not possible to indicate policy statement ordering when data is not streamed together as part of a single SubscribeResponse.

When clients and implementors provide scalar values in gNMI for this container, it is expected that when examining the streamed paths in order, that the ordering of every new path key reflects the order in the configuration.

wenovus added 2 commits May 2, 2023 12:32
      Add the telemetry-atomic extension to the BGP policy statements
      container to reflect that policies are configured as a whole rather than
      as individual statements. This change also enables scalar-based telemetry
      to support the ordering of this 'ordered-by user' list.
@wenovus wenovus requested a review from a team as a code owner May 2, 2023 19:40
@OpenConfigBot
Copy link

Major YANG version changes in commit d32f827:

@OpenConfigBot
Copy link

Compatibility Report for commit d32f827:
yanglint@SO 1.10.17

@rgwilton
Copy link

rgwilton commented May 3, 2023

Hi,

  1. Is this a generic issue for all "ordered-by user" lists in gNMI, or otherwise I'm struggling to understand how this particular list is different?
  2. I think that using a "telemetry-atomic" extension to modify the behavior of configuration (which is not telemetry) is going to be very confusing.

Regards,
Rob

@wenovus
Copy link
Contributor Author

wenovus commented May 3, 2023

Hi,

  1. Is this a generic issue for all "ordered-by user" lists in gNMI, or otherwise I'm struggling to understand how this particular list is different?
  2. I think that using a "telemetry-atomic" extension to modify the behavior of configuration (which is not telemetry) is going to be very confusing.

Regards, Rob

  1. Yes this is a generic issue for all ordered-by user lists in gNMI. There is one other instance of this is DNS server configuration (/system/dns/servers/server), I can add that if we agree this PR is the right approach.
  2. We can add a new extension config-atomic.

@dplore dplore self-assigned this May 3, 2023
@wenovus
Copy link
Contributor Author

wenovus commented May 5, 2023

Currently thinking about placing atomic at the list element instead of the surrounding container and then adding the metadata state leaves: /version-number, /sequence-number, /total-number or /end-of-sequence-marker. This way we limit the size of each update while giving clients a way to order correctly and to know whether the information is complete.

@rgwilton @jhaas-pfrc wondering if you have any initial thoughts on this, I'm just brainstorming here.

@jsterne
Copy link

jsterne commented May 5, 2023

I'm pretty doubtful about an "atomic" extension here for config. That implies some expected server behavior and I'm not sure we want to introduce that type of concept. Maybe we should describe this in the "description" statement for the list though (i.e. that it is like an atomic 'program' and can't be updated piecemeal, i.e. any sort of "eventual consistency" approach isn't applicable to the entire list).

I'd advocate for addressing this at the gNMI protocol level. i.e. adding "insert after" type fields to gNMI (in Set, Get and Notifications).

@wenovus
Copy link
Contributor Author

wenovus commented May 5, 2023

I'm pretty doubtful about an "atomic" extension here for config. That implies some expected server behavior and I'm not sure we want to introduce that type of concept. Maybe we should describe this in the "description" statement for the list though (i.e. that it is like an atomic 'program' and can't be updated piecemeal, i.e. any sort of "eventual consistency" approach isn't applicable to the entire list).

I'd advocate for addressing this at the gNMI protocol level. i.e. adding "insert after" type fields to gNMI (in Set, Get and Notifications).

I think of this concept as being separable into a config requirement and a telemetry requirement. Focusing on the telemetry requirement, would you say there is a difference between config and state data when you're querying for telemetry? Would this concept impose more difficulties compared to the telemetry-atomic already used for AFTs (state data) where the extension is also applied at the list element level? #860

Addressing this at the gNMI protocol level might be a more disruptive change. Even assuming it is not, then we must design a solution that allows for any client to answer the questions,

  1. What is the ordering when you're only receiving leaf updates?
  2. When do you know you have a complete snapshot?

In order to know 2 without an atomic flag, we need to add enough information to gNMI to tell the user that all leaf updates have been received. I believe the current preferred solution is to use atomic, as it doesn't run into a scale issue when it's only at the list element level, and furthermore there are no changes required for gNMI.

@wenovus
Copy link
Contributor Author

wenovus commented May 11, 2023

So one clarification here is that statements/statement correspond to a single BGP policy whereas policy-definitions/policy-definition refers to all policies, which import/export policies then refer to by using a leaf-list to these policy-definitions. Would that resolve the scale concern?

leaf-list import-policy {
type leafref {
path "/oc-rpol:routing-policy/oc-rpol:policy-definitions/" +
"oc-rpol:policy-definition/oc-rpol:name";
//TODO: require-instance should be added when it's
//supported in YANG 1.1
//require-instance true;
}
ordered-by user;
description
"list of policy names in sequence to be applied on
receiving a routing update in the current context, e.g.,
for the current peer group, neighbor, address family,
etc.";
}

Note that ordered-by user leaf-lists don't have an ordering problem because gNMI TypedValues support them in the form of ScalarArrays, such that they're effectively atomic.

@wenovus
Copy link
Contributor Author

wenovus commented May 15, 2023

@jhaas-pfrc @rgwilton @jsterne @hellt @rolandphung @earies

I've summarized the problem for easier consumption, but essentially if you understand the PR then just need to read the Request for Comments and "Important Note" sections below:

Request for Comments

What is the estimated worst-case size of a single BGP policy in terms of the serialized gNMI SubscribeResponse for the /routing-policy/policy-definitions/policy-definition/statements container in scalar TypedValue format. Specifically, will it exceed ~1MB and as a result might run into the default scale issue (currently 4MB by default)? If so, what is that size and how close does it get to the 2GB protocol buffer limit?

Introduction

Current OpenConfig models contain two instances of ordered-by user lists (or “ordered lists” for short):

  • BGP policy configuration: /routing-policy/policy-definitions/policy-definition/statements/statement
    • NOTE: Each statement list refers to a single BGP policy as opposed to the entire set of BGP policies defined on a device.
  • DNS server configuration: /system/dns/servers/server

Note that

  • YANG requires ordered lists to be config true.
  • config true lists must have keys, and hence these cannot be modelled as unkeyed lists.

This means that when this data is streamed as configuration or telemetry information, that the ordering of list elements must be maintained in order for the view to be meaningful. This is in order to satisfy two properties:

  1. Correct Order.
  2. Completeness — meaning that a client knows when its snapshot is complete rather than partial.

While it is mostly straightforward for tools such as ygot to support configuring sequential list elements and receiving device updates that respect update ordering, leaf-based gNMI telemetry systems (e.g. gnmi cache which depends on scalar TypedValue) may not support the ordering of map elements (lists are represented as Go maps and thus iteration order is random, and gnmi cache’s coalescing queue may furthermore reorder updates). As a result, it is currently impossible for clients to such a network management system to see meaningful snapshots of, for example, a single BGP policy configuration.

This document contains a request for comments on using telemetry-atomic, or a similar extension to model a single BGP policy’s set of ordered statements is a way of solving this issue (modelling is described in detail in the sections below). In this solution, the entire set of leaves under the /routing-policy/policy-definitions/policy-definition/statements container subtree will be serialized into a single Notification message using a common prefix with the atomic field marked true.

Important Note

The whole set of policies as defined by /routing-policy/policy-definitions is an unordered list (they are referenced by import/export/call policies via leaf-lists of leafrefs) and is therefore not subject to this problem.

Document Goal

Without using a behavior similar to atomic, then metadata must be added to indicate ordering or completeness of the ordered list data. Since it is preferable from a management tooling perspective to avoid having to process metadata, and changes to gnmi.proto itself can be very disruptive, it is this document’s goal to understand the practical implications of this solution before seriously considering an alternative approach.

Proposed Model

@@ -1022,8 +1031,17 @@ module openconfig-routing-policy {
       "Top-level grouping for the policy statements list";
 
     container statements {
+      oc-ext:telemetry-atomic;
       description
-        "Enclosing container for policy statements";
+        "Enclosing container for policy statements.
+
+        Note: in order to support scalar-based telemetry, policy statements are
+        treated as a whole instead of individually. Per the telemetry-atomic
+        extension, this means both configuration and telemetry must be sent in
+        whole and never as single policy statements. When scalar values are
+        provided, it is expected that when examining the streamed paths in
+        order, that the ordering of every new path key reflects the order in
+        the configuration.";
 
       list statement {
         key "name";

Pros

  • No changes to YANG model or gnmi.proto required.
  • No metadata required.
  • Tooling changes for supporting this are relatively straightforward.

Cons

  • Will not scale if there exists a BGP policy whose list of statements is very large. Note that by default a 4MB limit has been used for a single SubscribeResponse message. Protocol buffers has a hard limit of 2GB per message.

@jhaas-pfrc
Copy link

A few items were mentioned at the Thursday call for May:

  • Policies may be very long. In particular, for policy statements, the total size of statements may be tens of thousands of lines for extremely large scale.
  • Referenced policy elements themselves may be gigantic. Prefix-lists from IRR may be hundreds of megabytes in pathological cases.
  • The integrity of policy is important for consumers, if consumers actually care about this stuff.

To give an easy example, you can update a single policy statement to change a referenced prefix-set in one gnmi operation.

If the prefix-set itself includes "add 2M of prefixes", having an atomic operation on the policy statements itself isn't helpful. What you want is that the totality of the policy from the perspective of the receiver to be whole. This means all indirectly referenced elements.

From a logical standpoint, this would imply something like an "atomic" around all of policy - if we could guarantee that everything that's referenced is present. However, that's unlikely to happen.

Fundamentally, the issue becomes that streaming telemetry in the current form of gNMI is probably the Wrong Tool for monitoring policy, or really anything requiring high levels of consistency to achieve a consistent view of a set of related objects. Small things like individual routes can achieve this through dependency graphs and deeper control of on the wire ordering of operations, but gNMI doesn't provide strong guidance about this as best I understand things.

In IETF, there's the concept of "yang patch". Effectively, a flavor of this is what is desired here:

  • The ability to indicate that a set of related changes are about to be sent
  • The ability to send that set of changes
  • And then indicate that the relevant data for consistency has been conveyed so a receiver can use them.

For very large change sets, streamed changes for the diff may be larger than simply retrieving the entire set of covered data from scratch.

Personally, I would suggest the following things:

  1. For things like policy, you either get change sets via a diff mechanism or an indication that the policy has changed where the client can simply do a get of the entire state at the same time without using on-change.
  2. Evaluate WHY you're asking for this stuff in a on-change format in the first place. "Because you can" is a terrible answer.
    2a. If you're using it for change control notification... sure, you could do that, but there are better mechanisms.
    2b. If you're trying to use it to remotely model policy, do what's efficient and respects integrity and wholeness of the data. As it is, if you're receiving a route stream and you have a stream of on-change updates to your policy... YOU'RE WRONG. Policy is run with integrity on the routes and you're too far behind as a consumer to be able to fully model it.

The better answer here in the absence of a diff/patch mechanism is "don't do this".

@wenovus
Copy link
Contributor Author

wenovus commented Feb 7, 2024

Just want to update this PR since it's been pending to save time for future readers:

  • Practically, this will only impact telemetry for /routing-policy/policy-definitions/policy-definition/statements/statement/... paths that use the OpenConfig gNMI reference implementation, which does not stream paths in order, either for ONCE or STREAM modes (the latter is at least due to the coalescing algorithm).
  • Configuration is not impacted since tooling (e.g. ygot, ygnmi) already supports ordered-by user.

@github-actions github-actions bot added the Stale label Aug 21, 2024
@github-actions github-actions bot closed this Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants