-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Jaeger's GRPC handler to flow the tenant from an HTTP header #3688
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3688 +/- ##
==========================================
- Coverage 97.49% 97.45% -0.05%
==========================================
Files 269 271 +2
Lines 15934 16014 +80
==========================================
+ Hits 15535 15606 +71
- Misses 315 321 +6
- Partials 84 87 +3
Continue to review full report at Codecov.
|
} | ||
md, ok := metadata.FromIncomingContext(ctx) | ||
if !ok { | ||
return "", status.Errorf(codes.Internal, "missing metadata") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an error to be missing metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Google's GRPC implementation there is always metadata, and I was thinking this is a 500/unexpected situation.
This code path is hit when Jaeger has been deployed with a requirement for a tenant header, and GRPC hasn't supplied metadata.
There is no security-related reason to treat missing metadata as different from valid metadata lacking the tenant header. Shall I rework this to combine missing-metadata and missing-header handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, but then I would not treat it as codes.Internal
, but similar to other ones and I would alter the error message, e.g. "missing metadata, tenant header is required"
cmd/collector/app/builder_flags.go
Outdated
@@ -40,6 +40,7 @@ const ( | |||
collectorGRPCMaxReceiveMessageLength = "collector.grpc-server.max-message-size" | |||
collectorMaxConnectionAge = "collector.grpc-server.max-connection-age" | |||
collectorMaxConnectionAgeGrace = "collector.grpc-server.max-connection-age-grace" | |||
collectorTenantHeader = "collector.tenant-header" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gRPC is not the only way to submit data to collector. Is the intention to require the same header name across transports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro I'll need your advice on this. I have only used the GRPC transport. I had assumed that most users only use a single transport. If some users enable multiple transports, and wish to use different tenancy headers, I am happy to change it.
I'm expecting to implement flowing the header from HTTP to the internal tenant for GRPC only, as that is the only transport I use and know how to test. I hope to leave the other transports as a follow-on for others who desire tenancy on the other transports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware of any existing header conventions for the tenant? Is x-tenant
suitable to both HTTP and gRPC?
My suggestion:
- keep a single header for now, we can extend this in the future if needed
- I would like to have a choke point in the pipeline that would know that a tenancy is required and would verify that the Context contains a valid tenant. This way, even if we don't implement tenancy in HTTP headers immediately, the system would still be otherwise "safe", i.e. it would not accept data that is missing the tenant value.
The last point requires a configuration, which I would suggest to consolidate, e.g. something like
multi_tenancy:
enabled: true
header: "x-tenant" # this could be the default value instead of always requiring it
tenants: ["a", "b", "c"] # required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to specify tenants at the deployment time? I wanted to avoid restarting Jaeger (and changing deployments) when a new tenant is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration can be extended to accept tenants in other forms, like watching a file or by integrating with another source. This would be a good opportunity to start introducing a control plane where tenants can be provided externally via api call, such that the integration with the source can be decoupled from Jaeger code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose tackling the tenant restrictions in a separate PR.
More sophisticated deployments should do tenant validation in the auth layer (proxy). Which makes me think do we even need Jaeger to do this validation? A valid requirement for predefined tenants might be if the storage implementation needs to do some initialization per tenant. In the case of ES it should not be required, but maybe other storages like C* need to create a new DB that cannot be done in-flight of the first store call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation I'm most concerned about is not whether tenant is valid, but whether it is sent at all. Without that we open a hole that could allow other tenants to read your data.
I agree that there needs to be a stronger authentication somewhere other than a header, which is trivial to spoof.
@jkowall any experience to share here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will relax the handling of valid tenants so that if multi_tenancy.tenants
is undefined any tenant will pass. This will allow the spanwriter to reject spans. In the future we can provide a way for the writer to reject earlier.
Storage might have rules about what a valid tenant can be that we can't express on a list. For example, some storage might reject tenants that include non-ASCII characters. Or some storage might want to hot-enable/disable a tenant.
cec2799
to
c6db258
Compare
pkg/config/tenancy/flags.go
Outdated
validTenants = "multi_tenancy.tenants" | ||
) | ||
|
||
// TenancyFlagsConfig describes which CLI flags for TLS client should be generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type does not seem necessary, we could just have global functions. Or, at least, it should be the same type as Options, that's the pattern we used, i.e. the config/options type exposes functions to initialize itself via CLI.
pkg/config/tenancy/tenancy.go
Outdated
tenants map[string]bool | ||
} | ||
|
||
func (tl tenantList) Valid(candidate string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure you want by-value receiver here? There's going to be a heap allocation anyway when the struct is converted into Guard interface.
var tenant string | ||
if g.tenancyConfig.Enabled { | ||
var err error | ||
tenant, err = g.validateTenant(ctx) | ||
if err != nil { | ||
g.logger.Error("rejecting spans (tenancy)", zap.Error(err)) | ||
return nil, err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest moving this to validateTenant
var tenant string | |
if g.tenancyConfig.Enabled { | |
var err error | |
tenant, err = g.validateTenant(ctx) | |
if err != nil { | |
g.logger.Error("rejecting spans (tenancy)", zap.Error(err)) | |
return nil, err | |
} | |
} | |
tenant, err := g.validateTenant(ctx) | |
if err != nil { | |
g.logger.Error("failed to validate tenant, rejecting spans", zap.Error(err)) | |
return nil, err | |
} |
pkg/config/tenancy/tenancy.go
Outdated
type TenancyConfig struct { | ||
Enabled bool | ||
Header string | ||
Valid Guard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks weird in the usage: config.Valid.Valid()
. How about:
type TenancyConfig struct {
...
guard Guard
}
func (c *TenancyConfig) Valid(v string) bool {
return c.guard.Valid()
}
especially since the current Valid variable is force-assigned in the constructor & cannot be user-supplied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
9532548
to
ecba97d
Compare
pkg/config/tenancy/flags.go
Outdated
func AddFlags(flags *flag.FlagSet) { | ||
flags.Bool(tenancyEnabled, false, "Enable tenancy header when receiving or querying") | ||
flags.String(tenancyHeader, "x-tenant", "HTTP header carrying tenant") | ||
flags.String(validTenants, "", "Acceptable tenants") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should mention it is a comma-separated list. What if the flag is not defined?
}, | ||
} | ||
for _, test := range tests { | ||
_, err := client.PostSpans(test.ctx, &api_v2.PostSpansRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the individual test cases should be executed separately via t.Run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the format of TestPostSpans()
. No tests in cmd/collector/app/handler use t.Run()
.
Anyway, I made the changes.
Tenants: validTenants, | ||
})) | ||
for _, test := range tests { | ||
tenant, err := handler.validateTenant(test.ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same here, test cases should be run via t.Run
// AddFlags adds flags for tenancy to the FlagSet. | ||
func AddFlags(flags *flag.FlagSet) { | ||
flags.Bool(tenancyEnabled, false, "Enable tenancy header when receiving or querying") | ||
flags.String(tenancyHeader, "x-tenant", "HTTP header carrying tenant") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a first version, we could have a single flag tenant-header
which would enable tenancy if it is defined.
Internally we will most likely not use the predefined list of tenants (e.g. in the observatorium project) and if the storage layer does not need to have access to the tenant names I find this configuration redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to define flags more forward looking than just what works with the first version. Using .enabled is already an established pattern in Jaeger flags.
Signed-off-by: Ed Snible <[email protected]>
0c79158
to
e982edf
Compare
Signed-off-by: Ed Snible <[email protected]>
I struggled with re-basing after #3707 . Instead I copied the code changes and tests and manually applied them to main. |
batch := &r.Batch | ||
err := g.batchConsumer.consume(batch) | ||
err = g.batchConsumer.consume(batch, tenant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest changing the signature to consume(context, batch)
and handling tenancy inside. This way it can be reused in OTLP receiver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ok to do in another PR if you prefer)
…aegertracing#3688) * Flow tenant from GRPC PostSpans header through processors Signed-off-by: Ed Snible <[email protected]> * Restore accidentally deleted test case Signed-off-by: Ed Snible <[email protected]> Signed-off-by: Albert Teoh <[email protected]>
Signed-off-by: Ed Snible [email protected]
If Jaeger is started with for example
--collector.tenant-header=x-tenant
, flow the value of the x-tenant header to the tenant parameter for span processing.If Jaeger is not started with
--collector.tenant-header
, this is essentially a no-op.cc @pavolloffay