-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/opencensus: Add OpenCensus metrics support #5923
Conversation
9210455
to
bc6f8d4
Compare
codes/codes.go
Outdated
@@ -195,6 +195,27 @@ const ( | |||
_maxCode = 17 | |||
) | |||
|
|||
// CodeToStr maps from the code type to strings with only uppercase letters. | |||
var CodeToStr = map[Code]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not export this as a map, and make it a permanent part of our API.
Options? One easy one is putting it in internal
instead. (Sadly we decided to output a Go-looking name from String()
, and changing it to report the canonical name instead would probably break too many tests.)
Maybe a CanonicalString
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Plumbed through internal with a CanonicalString method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Whoops forgot to delete sorry.
stats/opencensus/client_metrics.go
Outdated
keyClientStatus = tag.MustNewKey("grpc_client_status") | ||
) | ||
|
||
// Measures, which are recorded by client stats handler (and interceptor?): Note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and interceptor?)
Let's not be unsure about what our code does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has question mark at the end of it because I don't know if it's worth noting if it hasn't happened yet (see comment about Yashes per call latency stuff elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no current measures being recorded at the interceptor level, I simply deleted this comment.
stats/opencensus/client_metrics.go
Outdated
} | ||
// ClientServerLatencyView is the distribution of server latency in | ||
// milliseconds per RPC, keyed on method. | ||
ClientServerLatencyView = &view.View{ // TODO: Same recording point as round trip latency, perhaps I should just delete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the legacy metric?
Is this the per-attempt one and the other one is per-call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the contexts are still scoped to per attempt, as that is the scope of the TagRPC/HandleRPC calls. The per-call will need to be added later in the interceptor. I plan on adding the per call metrics and all the new tracing stuff at the per call level needed nearer GA date. I decided to delete this metric. It's not used in o11y currently and not defined in Yashes GA metrics, and it doesn't seem useful, as it adds no value.
stats/opencensus/e2e_test.go
Outdated
if vi == nil && vi2 == nil { | ||
return true | ||
} | ||
if (vi != nil) != (vi2 != nil) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional, nit: if vi == nil || vi2 == nil { return false }
is equivalent and will save a comp and allow short circuiting sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming short circuiting means to exit early from the first comparison since it's an or (and the comparison we can do because of passing the check above). Switched here and elsewhere. (need to add this to my traces PR).
d := &rpcData{ | ||
startTime: time.Now(), | ||
method: info.FullMethodName, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this get moved into setRPCData
since it's common between client & server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure...I have a bit of hesitancy due to the possibility that these fork in logic needed (doesn't seem like it for GA), and even for retry delay per call (in a45 if we ever get to parity) it seems like start time is still scoped correctly as these calls happen per attempt client side. Switched to setRPCData.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I tried switching it, and I realized this needs to read info.methodName. I could add it to setRPCData if you'd like and declare an inline struct &rpcData{ startTime: time.Now(), method: string passed in} but keeping it as is seems better from a maintainability perspective. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just thought you'd also pass info
into setRPCData
-- does that not work? Not a big deal, just a minor optional suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I like it as is now. Passing a stats tag rpc struct into a context helper which to me should be to stick a piece of information in the context feels off. The info from this stats tag rpc struct gets converted into rpc data struct, the subset of information from tag rpc + additional (timestamp) needed to successfully recorded measurements throughout the RPC lifecycle. Thus coupling these two types of data where you pass in one to context setter and it converts automatically feels off. Plus, same comment about client/server partition/maintainability over time.
stats/opencensus/stats.go
Outdated
d, ok := ctx.Value(rpcDataKey{}).(*rpcData) | ||
if !ok { | ||
// Shouldn't happen, as gRPC calls TagRPC which populates the rpcData in | ||
// context. | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving this into recordRPCData
so it only needs to happen in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swear I had this thought and I made a mental TODO. Switched over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I remember why I didn't do this now. Context is used as an arg in ocstats.RecordWithOptions. However, you can simply read the pointer in recordRPCData, pass just the pointer to the other functions, and pass the pointer plus context to recordDataBegin and recordDataEnd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pass :)!
stats/opencensus/e2e_test.go
Outdated
if vi == nil && vi2 == nil { | ||
return true | ||
} | ||
if (vi != nil) != (vi2 != nil) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming short circuiting means to exit early from the first comparison since it's an or (and the comparison we can do because of passing the check above). Switched here and elsewhere. (need to add this to my traces PR).
ctx = tag.NewContext(ctx, tags) | ||
} | ||
} | ||
ctx, _ = tag.New(ctx, tag.Upsert(keyServerMethod, removeLeadingSlash(info.FullMethodName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function errors if the tag map in the context is populated with an invalid key or value path or the mutator fails to work. Both of these values are set by tag.Decode earlier (and it not erroring), and tag.Upsert, both from their tag library, and thus the error shouldn't hit. If the error does hit, it means that the tag map does not have the correct KeyServerMethod. This isn't used for any metrics, and is really just populated for the caller to pull out for their own use. This package will continue to record measurements correctly if this error occurs. However, since this won't really ever happen, and this package will continue to record measurements correctly, I think it's best to ignore the error, and return the context with rpc data irregardless if this tag got populated correctly. This code block is logically equialent to what is currently present:
var newContext context.Context
var err error
if newContext, err = tag.New(ctx, tag.Upsert(keyServerMethod, removeLeadingSlash(info.FullMethodName))); err != nil {
return setRPCData(ctx, d)
}
return setRPCData(newContext, d)
stats/opencensus/client_metrics.go
Outdated
} | ||
// ClientServerLatencyView is the distribution of server latency in | ||
// milliseconds per RPC, keyed on method. | ||
ClientServerLatencyView = &view.View{ // TODO: Same recording point as round trip latency, perhaps I should just delete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the contexts are still scoped to per attempt, as that is the scope of the TagRPC/HandleRPC calls. The per-call will need to be added later in the interceptor. I plan on adding the per call metrics and all the new tracing stuff at the per call level needed nearer GA date. I decided to delete this metric. It's not used in o11y currently and not defined in Yashes GA metrics, and it doesn't seem useful, as it adds no value.
stats/opencensus/stats.go
Outdated
bytesDistributionBounds = []float64{1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296} | ||
bytesDistribution = view.Distribution(bytesDistributionBounds...) | ||
millisecondsDistribution = view.Distribution(0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000) | ||
countDistributionBounds = []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536} // float64 correct here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#measures this just lists distributions. As you can see, milliseconds distribution is the one with fractions of integers. Since the others bounds are discrete intervals, and their counts are discrete intervals (that are all positive), I'm going to switch this and bytes distribution to uint64 type.
stats/opencensus/stats.go
Outdated
bytesDistributionBounds = []float64{1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296} | ||
bytesDistribution = view.Distribution(bytesDistributionBounds...) | ||
millisecondsDistribution = view.Distribution(0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000) | ||
countDistributionBounds = []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536} // float64 correct here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, sorry, all their functions (view.Distribution), and the bounds they pass in Export View are all of float64 type, thus I must leave as such.
codes/codes.go
Outdated
@@ -195,6 +195,27 @@ const ( | |||
_maxCode = 17 | |||
) | |||
|
|||
// CodeToStr maps from the code type to strings with only uppercase letters. | |||
var CodeToStr = map[Code]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this now
stats/opencensus/stats.go
Outdated
s, _ := status.FromError(e.Error) // ignore second argument because codes.Unknown is fine | ||
st = codes.CodeToStr[s.Code()] | ||
s, _ := status.FromError(e.Error) | ||
st = internal.CanonicalString.(func(codes.Code) string)(s.Code()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do this cast once globally:
var canonicalString = internal.CanonicalString.(func(codes.Code) string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Curious: why do you want this cast globally? I have similar typecasts in the gcp/observability module (GlobalDialOptions/ServerOptions plumbed through internal/), and I use them more than once, but you never mentioned it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming it's so if typecast fails, fails at init time rather than runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the big reason. Also, to avoid doing the cast every time you call it, since it isn't free.
ctx = tag.NewContext(ctx, tags) | ||
} | ||
} | ||
ctx, _ = tag.New(ctx, tag.Upsert(keyServerMethod, removeLeadingSlash(info.FullMethodName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the error does hit, it means that the tag map does not have the correct KeyServerMethod. This isn't used for any metrics, and is really just populated for the caller to pull out for their own use
If it returns an error, is the context it returns valid? Or does it return nil
? If the latter, then we need to check the error just in case.
d := &rpcData{ | ||
startTime: time.Now(), | ||
method: info.FullMethodName, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just thought you'd also pass info
into setRPCData
-- does that not work? Not a big deal, just a minor optional suggestion.
stats/opencensus/e2e_test.go
Outdated
for _, test := range tests { | ||
t.Run(test.name, func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel that strongly, but you're running the same test code multiple times and just checking different results, which seems strange to me. If it fails, you know what was wrong by what was different.
stats/opencensus/stats.go
Outdated
bytesDistributionBounds = []float64{1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296} | ||
bytesDistribution = view.Distribution(bytesDistributionBounds...) | ||
millisecondsDistribution = view.Distribution(0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000) | ||
countDistributionBounds = []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536} // float64 correct here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG; I was just wanting you to follow-up on your question/comment and delete it. :)
This PR adds OpenCensus metrics to gRPC Go. The e2e tests all test at the exporter level and expect certain metrics emissions from RPC's being performed. Based on #5919.
RELEASE NOTES: