feat: Introduce OpenTelemetry Metrics Recording #2500

Merged: 28 commits, merged on Mar 14, 2024
Changes from 18 commits

Commits (28):
21407aa
feat: Implement OpentelemetryMetricsRecorder
lqiu96 Feb 26, 2024
f757811
chore: Update CONTRIBUTING.md
lqiu96 Feb 26, 2024
58fc09c
chore: Fix lint issues
lqiu96 Feb 26, 2024
d29ad1a
chore: Clean up ITOTelMetrics test
lqiu96 Feb 26, 2024
df7c611
chore: Clean up ITOTelMetrics test
lqiu96 Feb 26, 2024
9a3fc3c
chore: Ignore failing Otel tests
lqiu96 Feb 27, 2024
9121cb6
chore: Update OpentelemetryMetricsRecorderTest to use InMemoryMetricE…
lqiu96 Feb 27, 2024
9d099f6
chore: Update showcase tests
lqiu96 Feb 27, 2024
03acef3
chore: Add missing otel dependencies.properties values
lqiu96 Feb 27, 2024
a2d979b
chore: Add javadocs for OpentelemetryMetricsRecorder
lqiu96 Feb 27, 2024
d46fef7
Merge branch 'main' into implement-OpentelemetryMetricsRecorder
lqiu96 Feb 27, 2024
4edded1
chore: Use otel v1.34.1
lqiu96 Feb 27, 2024
78cc64a
chore: Ignore the HttpJson Otel tests
lqiu96 Feb 28, 2024
b620ffc
chore: Fix showcase tests
lqiu96 Mar 4, 2024
06a6bba
Merge branch 'main' into implement-OpentelemetryMetricsRecorder
lqiu96 Mar 4, 2024
07eeb3b
chore: Refactor showcase tests
lqiu96 Mar 4, 2024
6df9640
chore: Address PR comments
lqiu96 Mar 5, 2024
edd7cea
chore: Throw exception if operation completion call has been invoked …
lqiu96 Mar 5, 2024
32ee2fe
chore: Address PR comments
lqiu96 Mar 5, 2024
825175e
chore: Add otel bom to gapic-generator-java bom
lqiu96 Mar 5, 2024
09ea967
chore: Address PR comments
lqiu96 Mar 5, 2024
ea5a776
chore: Fix showcase tests
lqiu96 Mar 5, 2024
998eb68
Update gax-java/dependencies.properties
lqiu96 Mar 6, 2024
1453461
Merge branch 'main' into implement-OpentelemetryMetricsRecorder
lqiu96 Mar 6, 2024
6c7bca3
chore: Use gax-java as instrument scope name
lqiu96 Mar 6, 2024
fad8094
Update gax-java/gax/src/main/java/com/google/api/gax/tracing/OpenTele…
blakeli0 Mar 14, 2024
41bbf57
Merge branch 'main' into implement-OpentelemetryMetricsRecorder
blakeli0 Mar 14, 2024
a79e4e7
Update gax-java/gax/src/main/java/com/google/api/gax/tracing/OpenTele…
blakeli0 Mar 14, 2024
5 changes: 5 additions & 0 deletions gax-java/dependencies.properties
@@ -39,6 +39,11 @@ maven.com_google_api_grpc_proto_google_common_protos=com.google.api.grpc:proto-g
maven.com_google_api_grpc_grpc_google_common_protos=com.google.api.grpc:grpc-google-common-protos:2.33.0
maven.com_google_auth_google_auth_library_oauth2_http=com.google.auth:google-auth-library-oauth2-http:1.23.0
maven.com_google_auth_google_auth_library_credentials=com.google.auth:google-auth-library-credentials:1.23.0
maven.io_opentelemetry_opentelemetry_api=io.opentelemetry:opentelemetry-api:1.34.1
maven.io_opentelemetry_opentelemetry_sdk=io.opentelemetry:opentelemetry-sdk:1.34.1
maven.io_opentelemetry_opentelemetry_sdk_common=io.opentelemetry:opentelemetry-sdk-common:1.34.1
maven.io_opentelemetry_opentelemetry-sdk-metrics=io.opentelemetry:opentelemetry-sdk-metrics:1.34.1
maven.io_opentelemetry_opentelemetry-sdk-testing=io.opentelemetry:opentelemetry-sdk-testing:1.34.1
maven.io_opencensus_opencensus_api=io.opencensus:opencensus-api:0.31.1
maven.io_opencensus_opencensus_contrib_grpc_metrics=io.opencensus:opencensus-contrib-grpc-metrics:0.31.1
maven.io_opencensus_opencensus_contrib_http_util=io.opencensus:opencensus-contrib-http-util:0.31.1
5 changes: 5 additions & 0 deletions gax-java/gax/BUILD.bazel
@@ -18,6 +18,8 @@ _COMPILE_DEPS = [
"@com_google_code_findbugs_jsr305//jar",
"@com_google_errorprone_error_prone_annotations//jar",
"@com_google_guava_guava//jar",
"@io_opentelemetry_opentelemetry_api//jar",
"@io_opentelemetry_opentelemetry-sdk-metrics//jar",
"@io_opencensus_opencensus_api//jar",
"@io_opencensus_opencensus_contrib_http_util//jar",
"@io_grpc_grpc_java//context:context",
@@ -38,6 +40,9 @@ _TEST_COMPILE_DEPS = [
"@net_bytebuddy_byte_buddy//jar",
"@org_objenesis_objenesis//jar",
"@com_googlecode_java_diff_utils_diffutils//jar",
"@io_opentelemetry_opentelemetry_sdk//jar",
"@io_opentelemetry_opentelemetry-sdk-testing//jar",
"@io_opentelemetry_opentelemetry_sdk_common//jar"
]

java_library(
9 changes: 9 additions & 0 deletions gax-java/gax/pom.xml
@@ -57,6 +57,15 @@
<artifactId>graal-sdk</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-sdk-testing</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

<build>
@@ -31,8 +31,10 @@

import com.google.api.core.ApiClock;
import com.google.api.core.ApiFunction;
import com.google.api.core.BetaApi;
import com.google.api.gax.core.CredentialsProvider;
import com.google.api.gax.core.ExecutorProvider;
import com.google.api.gax.tracing.ApiTracerFactory;
import com.google.common.base.MoreObjects;
import java.io.IOException;
import java.util.concurrent.Executor;
@@ -120,6 +122,16 @@ public final String getGdchApiAudience() {
return stubSettings.getGdchApiAudience();
}

/**
* Gets the configured {@link ApiTracerFactory} that will be used to generate traces for
* operations.
*/
@BetaApi("The surface for tracing is not stable yet and may change in the future.")
@Nonnull
public ApiTracerFactory getTracerFactory() {
return stubSettings.getTracerFactory();
}

public String toString() {
return MoreObjects.toStringHelper(this)
.add("executorProvider", getExecutorProvider())
@@ -284,6 +296,16 @@ public B setGdchApiAudience(@Nullable String gdchApiAudience) {
return self();
}

/**
* Sets the {@link ApiTracerFactory} for the client instance. To enable the default metrics,
* create a {@code MetricsRecorder} instance, pass it to a {@code MetricsTracerFactory}, and set
* that factory here.
*/
public B setTracerFactory(@Nullable ApiTracerFactory tracerFactory) {
stubSettings.setTracerFactory(tracerFactory);
return self();
}

/**
* Gets the ExecutorProvider that was previously set on this Builder. This ExecutorProvider is
* to use for running asynchronous API call logic (such as retries and long-running operations),
@@ -40,40 +40,50 @@
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;
import org.threeten.bp.Duration;

/**
* This class computes generic metrics that can be observed in the lifecycle of an RPC operation.
* Recording the metrics is delegated to {@link MetricsRecorder}, so this class has no knowledge of
* the observability framework used to record them. The method_name and language attributes are
* populated automatically; language defaults to "Java".
*/
@BetaApi
@InternalApi
public class MetricsTracer implements ApiTracer {

private static final String STATUS_ATTRIBUTE = "status";

public static final String METHOD_NAME_ATTRIBUTE = "method_name";
public static final String LANGUAGE_ATTRIBUTE = "language";
public static final String STATUS_ATTRIBUTE = "status";
public static final String DEFAULT_LANGUAGE = "Java";
private static final String OPERATION_FINISHED_STATUS_MESSAGE =
"Operation has already been completed";
private Stopwatch attemptTimer;

private final Stopwatch operationTimer = Stopwatch.createStarted();

private final Map<String, String> attributes = new HashMap<>();

private MetricsRecorder metricsRecorder;
private final MetricsRecorder metricsRecorder;
private final AtomicBoolean operationFinished;

public MetricsTracer(MethodName methodName, MetricsRecorder metricsRecorder) {
this.attributes.put("method_name", methodName.toString());
this.attributes.put(METHOD_NAME_ATTRIBUTE, methodName.toString());
this.attributes.put(LANGUAGE_ATTRIBUTE, DEFAULT_LANGUAGE);
this.metricsRecorder = metricsRecorder;
this.operationFinished = new AtomicBoolean();
}

/**
* Signals that the overall operation has finished successfully. The tracer is now considered
* closed and should no longer be used. A successful operation sets the status attribute to "OK".
*
* @throws IllegalStateException if an operation completion call has already been invoked
*/
@Override
public void operationSucceeded() {
if (operationFinished.getAndSet(true)) {
throw new IllegalStateException(OPERATION_FINISHED_STATUS_MESSAGE);
}
attributes.put(STATUS_ATTRIBUTE, StatusCode.Code.OK.toString());
metricsRecorder.recordOperationLatency(
operationTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
@@ -84,9 +94,14 @@ public void operationSucceeded() {
* Signals that the operation was cancelled by the user. The tracer is now considered closed and
* should no longer be used. A cancelled operation sets the status attribute to "CANCELLED".
*
* @throws IllegalStateException if an operation completion call has already been invoked
*/
@Override
public void operationCancelled() {
if (operationFinished.getAndSet(true)) {
throw new IllegalStateException(OPERATION_FINISHED_STATUS_MESSAGE);
}
attributes.put(STATUS_ATTRIBUTE, StatusCode.Code.CANCELLED.toString());
metricsRecorder.recordOperationLatency(
operationTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
@@ -97,9 +112,14 @@ public void operationCancelled() {
* Signals that the operation failed. The tracer is now considered closed and should no longer be
* used. A failed operation sets the status attribute to the status extracted from the error.
*
* @throws IllegalStateException if an operation completion call has already been invoked
*/
@Override
public void operationFailed(Throwable error) {
if (operationFinished.getAndSet(true)) {
throw new IllegalStateException(OPERATION_FINISHED_STATUS_MESSAGE);
}
attributes.put(STATUS_ATTRIBUTE, extractStatus(error));
metricsRecorder.recordOperationLatency(
operationTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
@@ -126,7 +146,6 @@ public void attemptStarted(Object request, int attemptNumber) {
*/
@Override
public void attemptSucceeded() {

attributes.put(STATUS_ATTRIBUTE, StatusCode.Code.OK.toString());
metricsRecorder.recordAttemptLatency(attemptTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
metricsRecorder.recordAttemptCount(1, attributes);
@@ -138,7 +157,6 @@ public void attemptSucceeded() {
*/
@Override
public void attemptCancelled() {

attributes.put(STATUS_ATTRIBUTE, StatusCode.Code.CANCELLED.toString());
metricsRecorder.recordAttemptLatency(attemptTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
metricsRecorder.recordAttemptCount(1, attributes);
@@ -154,7 +172,6 @@ public void attemptCancelled() {
*/
@Override
public void attemptFailed(Throwable error, Duration delay) {

attributes.put(STATUS_ATTRIBUTE, extractStatus(error));
metricsRecorder.recordAttemptLatency(attemptTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
metricsRecorder.recordAttemptCount(1, attributes);
@@ -169,7 +186,6 @@ public void attemptFailed(Throwable error, Duration delay) {
*/
@Override
public void attemptFailedRetriesExhausted(Throwable error) {

attributes.put(STATUS_ATTRIBUTE, extractStatus(error));
metricsRecorder.recordAttemptLatency(attemptTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
metricsRecorder.recordAttemptCount(1, attributes);
@@ -184,7 +200,6 @@ public void attemptFailedRetriesExhausted(Throwable error) {
*/
@Override
public void attemptPermanentFailure(Throwable error) {

attributes.put(STATUS_ATTRIBUTE, extractStatus(error));
metricsRecorder.recordAttemptLatency(attemptTimer.elapsed(TimeUnit.MILLISECONDS), attributes);
metricsRecorder.recordAttemptCount(1, attributes);
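The completion guard added to `MetricsTracer` relies on `AtomicBoolean.getAndSet(true)`, which atomically returns the previous value, so only the first of `operationSucceeded()`, `operationCancelled()` or `operationFailed(...)` gets past the check and every later call throws. The stdlib-only sketch below isolates that pattern; `CompletionGuardDemo` and `markFinished` are hypothetical names, not part of gax, and the latency recording that follows the check in the real tracer is omitted.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of gax.
public class CompletionGuardDemo {
  private static final String OPERATION_FINISHED_STATUS_MESSAGE =
      "Operation has already been completed";

  // Starts as false; the first completion call flips it to true.
  private final AtomicBoolean operationFinished = new AtomicBoolean();

  // Hypothetical helper mirroring the check at the top of each operation* method.
  void markFinished() {
    // getAndSet returns the previous value, so only the first caller observes false.
    if (operationFinished.getAndSet(true)) {
      throw new IllegalStateException(OPERATION_FINISHED_STATUS_MESSAGE);
    }
    // In MetricsTracer, the status attribute and operation latency are recorded here.
  }

  public static void main(String[] args) {
    CompletionGuardDemo tracer = new CompletionGuardDemo();
    tracer.markFinished(); // first completion call passes the guard
    try {
      tracer.markFinished(); // second completion call is rejected
      System.out.println("no exception");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the check and the update happen in one atomic step, two threads racing to complete the same tracer cannot both record an operation latency.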
@@ -0,0 +1,161 @@
/*
* Copyright 2024 Google LLC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google LLC nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

package com.google.api.gax.tracing;

import com.google.api.gax.core.GaxProperties;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.common.AttributesBuilder;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import java.util.Map;

/**
* OpenTelemetry implementation of recording metrics. This implementation collects the
* measurements related to the lifecycle of an RPC.
*
* <p>For the Otel implementation, an attempt is a single RPC invocation and an operation is the
* collection of all the attempts made before a response is returned (either as a success or an
* error). A single call (e.g. {@code EchoClient.echo()}) should have an operation_count of 1 and
* may have an attempt_count of 1 or more, depending on the retry configuration.
*/
public class OpentelemetryMetricsRecorder implements MetricsRecorder {
private final DoubleHistogram attemptLatencyRecorder;
private final DoubleHistogram operationLatencyRecorder;
private final LongCounter operationCountRecorder;
private final LongCounter attemptCountRecorder;

/**
* Creates an instrument for each of the following metrics:
*
* <ul>
* <li>Attempt Latency: Histogram
* <li>Operation Latency: Histogram
* <li>Attempt Count: Counter
* <li>Operation Count: Counter
* </ul>
*
* @param openTelemetry OpenTelemetry instance used to create the meter
* @param serviceName service name, used as the prefix of each metric name
*/
public OpentelemetryMetricsRecorder(OpenTelemetry openTelemetry, String serviceName) {
Meter meter =
openTelemetry
.meterBuilder("Gax-OtelMetrics")
.setInstrumentationVersion(GaxProperties.getGaxVersion())
.build();
this.attemptLatencyRecorder =
meter
.histogramBuilder(serviceName + "/attempt_latency")
.setDescription("Time an individual attempt took")
.setUnit("ms")
.build();
this.operationLatencyRecorder =
meter
.histogramBuilder(serviceName + "/operation_latency")
.setDescription(
"Total time until final operation success or failure, including retries and backoff.")
.setUnit("ms")
.build();
this.attemptCountRecorder =
meter
.counterBuilder(serviceName + "/attempt_count")
.setDescription("Number of Attempts")
.setUnit("1")
.build();
this.operationCountRecorder =
meter
.counterBuilder(serviceName + "/operation_count")
.setDescription("Number of Operations")
.setUnit("1")
.build();
}

/**
* Record the latency for an individual attempt. Data is stored in a Histogram.
*
* @param attemptLatency Attempt Latency in ms
* @param attributes Map of the attributes to store
*/
@Override
public void recordAttemptLatency(double attemptLatency, Map<String, String> attributes) {
attemptLatencyRecorder.record(attemptLatency, toOtelAttributes(attributes));
}

/**
* Record an attempt made. The attempt count is stored in a LongCounter.
*
* <p>The count should be 1 on every invocation (once per attempt, including retries).
*
* @param count The number of attempts made
* @param attributes Map of the attributes to store
*/
@Override
public void recordAttemptCount(long count, Map<String, String> attributes) {
attemptCountRecorder.add(count, toOtelAttributes(attributes));
}

/**
* Record the latency for the entire operation. This is the latency for the entire RPC, including
* all retry attempts.
*
* @param operationLatency Operation Latency in ms
* @param attributes Map of the attributes to store
*/
@Override
public void recordOperationLatency(double operationLatency, Map<String, String> attributes) {
operationLatencyRecorder.record(operationLatency, toOtelAttributes(attributes));
}

/**
* Record an operation made. The operation count is stored in a LongCounter.
*
* <p>This should be invoked exactly once per operation, with a count of 1.
*
* @param count The number of operations made
* @param attributes Map of the attributes to store
*/
@Override
public void recordOperationCount(long count, Map<String, String> attributes) {
operationCountRecorder.add(count, toOtelAttributes(attributes));
}
Contributor Author:

Both Operation record calls should only occur once. Do we want to validate this and throw an exception if recordOperation(...) is invoked more than once?

Collaborator blakeli0 (Mar 5, 2024):

I think we should rely on the existing framework. If there is an existing bug, it should not also be the recorder's responsibility to validate it; we should probably validate it in the MetricsTracer.


@VisibleForTesting
Attributes toOtelAttributes(Map<String, String> attributes) {
Preconditions.checkNotNull(attributes, "Attributes map cannot be null");
AttributesBuilder attributesBuilder = Attributes.builder();
attributes.forEach(attributesBuilder::put);
return attributesBuilder.build();
}
}
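The class Javadoc distinguishes operations from attempts: one call should yield an operation_count of 1 and an attempt_count equal to the number of tries. The sketch below replays that bookkeeping against a plain in-memory map instead of the OpenTelemetry SDK, so it runs with the JDK alone. `InMemoryRecorder` and `callWithRetries` are hypothetical stand-ins for `MetricsRecorder` and a retrying callable; the metric names follow the `serviceName + "/attempt_count"` convention used by the constructor, the attribute keys mirror the `method_name`, `language` and `status` keys that `MetricsTracer` populates, and the `"echo"` service name, `"EchoClient.echo"` method name and `"UNAVAILABLE"` failure status are illustrative values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo; not the gax or OpenTelemetry API.
public class OperationVsAttemptDemo {

  // Hypothetical stand-in for MetricsRecorder: sums counts per "metric{status=...}" key.
  static class InMemoryRecorder {
    final Map<String, Long> counters = new TreeMap<>();
    private final String serviceName;

    InMemoryRecorder(String serviceName) {
      this.serviceName = serviceName;
    }

    void recordAttemptCount(long count, Map<String, String> attributes) {
      add("/attempt_count", count, attributes);
    }

    void recordOperationCount(long count, Map<String, String> attributes) {
      add("/operation_count", count, attributes);
    }

    private void add(String suffix, long count, Map<String, String> attributes) {
      // Metric name is prefixed with the service name, as in the OTel recorder.
      String key = serviceName + suffix + "{status=" + attributes.get("status") + "}";
      counters.merge(key, count, Long::sum);
    }
  }

  // Hypothetical retrying call: the first (attemptsUntilSuccess - 1) attempts fail.
  static void callWithRetries(InMemoryRecorder recorder, int attemptsUntilSuccess) {
    Map<String, String> attributes = new HashMap<>();
    attributes.put("method_name", "EchoClient.echo");
    attributes.put("language", "Java");
    for (int attempt = 1; attempt <= attemptsUntilSuccess; attempt++) {
      boolean succeeded = attempt == attemptsUntilSuccess;
      attributes.put("status", succeeded ? "OK" : "UNAVAILABLE");
      recorder.recordAttemptCount(1, attributes); // once per attempt, retries included
    }
    recorder.recordOperationCount(1, attributes); // exactly once per operation
  }

  public static void main(String[] args) {
    InMemoryRecorder recorder = new InMemoryRecorder("echo");
    callWithRetries(recorder, 3);
    recorder.counters.forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

With three tries, the attempt counter splits into two UNAVAILABLE attempts and one OK attempt, while the operation counter records a single OK operation, which is the 1-versus-1+ relationship the Javadoc describes.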