Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A79: Non-per-call Metrics Architecture #421

Merged
merged 24 commits into from
Apr 8, 2024

Conversation

yashykt
Copy link
Member

@yashykt yashykt commented Feb 23, 2024

No description provided.

@yashykt yashykt marked this pull request as ready for review February 23, 2024 12:49
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Comments are mostly cosmetic things.

Please let me know if you have any questions. Thanks!

A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
Copy link
Member

@yijiem yijiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yash!

A79-non-per-call-metrics-architecture.md Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one substantive comment remaining. The rest are cosmetic.

Please let me know if you have any questions. Thanks!

A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
Comment on lines +316 to +321
* Metrics from load-balancing policies should be nested under `grpc.lb.`, e.g.
`grpc.lb.pick_first.`.
* Metrics from resolvers should be nested under `grpc.resolver.`, e.g.
`grpc.resolver.dns.`.
* Metrics from transports should be nested under `grpc.transport.`, e.g.,
`grpc.transport.http2.`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly by this you mean "Metrics from first-party gRPC ___s". Any guidance for third-party plugins for these components?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only recommendation from gRPC for third-party plugins is to NOT use the grpc. namespace. In general, the same recommendations from OpenTelemetry (https://opentelemetry.io/docs/specs/semconv/general/metrics/#general-guidelines) should be followed

A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Java APIs for gRPC Non Per Call Metric Architecture
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I have, before I get on a flight.

}

interface MetricDescriptor {
long index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DNVindhya, for the actual implementation, uf this has data, it can't be an interface. But conceptually this is fine. For the actual implementation we'd probably also move name, description, and other common aspects of the descriptors here. But that's all invisible to users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Overlooked the part of data member being final and static.
How about I add common methods for descriptors like getIndex(), getName() in an interface and common data members like name, description and unit in an abstract class. Then LongCounterDescriptor can implement the interface and extend the abstract class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an interface with the methods is fine. We can still have an abstract class, but it can be an implementation detail (not public).

A79-non-per-call-metrics-architecture.md Outdated Show resolved Hide resolved
public List<Object> getMetricsMeasures();

/** Records a value for a long counter measure. */
default void recordLongCounter(MetricDescriptor counterDescriptor, Long value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DNVindhya, should the first argument be LongCounterDescriptor? Similar for the next method.

Should value be a long, not the boxed Long?

New metrics should start as experimental. Once they are stable, they can
optionally be made on-by-default. The distinction between on-by-default and
off-by-default metrics allows gRPC to support metrics that may be desirable in
some cases but too expensive to collect by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume "too expensive to collect" is about CPU cost. But there is a storage cost argument as well, after it is . Are we only concerned with CPU cost here? (There could be memory costs, but in general I assume we won't have metrics that are too expensive in memory.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we are mostly worried about collection costs here which I assume would be mostly CPU, maybe memory as well, if, for example, the metric now requires to store some information for each RPC.

* Enables metrics specified in the set along with metrics that are enabled
* by default.
*/
public Builder enableMetrics(Set<String> enableMetrics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DNVindhya, users won't have a set for this already. We'll probably want to use Collection or Iterable instead.

/** Disable metrics specified in the set. */
public Builder disableMetrics(Set<String> disableMetrics);

/** Disable all metrics that are enabled by default. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DNVindhya, Does this only disable metrics disabled by default, or if someone had called enabledMetrics, would it remove those as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked C++, this will disable all metrics even the ones added by enableMetrics. I will update the comment.

Comment on lines +268 to +272
In the second approach, stats plugins that are to be scoped to a channel are
registered with the corresponding channel directly, instead of the global stats
plugin registry. On channel creation time, stats plugins from the registry are
combined with the list of stats plugins registered directly on the channel to
form the complete list of stats plugins for this channel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have plans to enable the second approach for Core? If not can we mention that Core only support first approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that we will support it in the future.

@yashykt yashykt merged commit ea55525 into grpc:master Apr 8, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants