Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tenancy for queries #3791

Merged
merged 20 commits into from
Jul 21, 2022
Merged

Conversation

esnible
Copy link
Contributor

@esnible esnible commented Jul 1, 2022

Signed-off-by: Ed Snible [email protected]

This PR replaces #3735

  • This PR flows the tenant header through the query APIs.
  • This PR blocks requests with invalid tenant or missing tenant header from reaching the query APIs.
  • This PR does NOT do authorization. It is assumed authorization is handled by a reverse proxy or other non-Jaeger component at this time.
  • This PR requires actual distinguishing of the tenant to be done by the storage layer. The purpose of this PR is to be sure the storage layer receives a tenant that is valid when Jaeger is started with --multi_tenant.enabled=true.

cc @pavolloffay

@esnible esnible requested a review from a team as a code owner July 1, 2022 15:46
@esnible esnible requested a review from yurishkuro July 1, 2022 15:46
"github.com/jaegertracing/jaeger/pkg/config/tlscfg"
"github.com/jaegertracing/jaeger/proto-gen/api_v3"
)

// RegisterGRPCGateway registers api_v3 endpoints into provided mux.
func RegisterGRPCGateway(ctx context.Context, logger *zap.Logger, r *mux.Router, basePath string, grpcEndpoint string, grpcTLS tlscfg.Options) error {
return RegisterGRPCGatewayWithTenancy(ctx, logger, r, basePath, grpcEndpoint, grpcTLS,
tenancy.Options{
Enabled: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this hardcoded? It deserves a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavolloffay I thought it might make sense to introduce a new method rather than a new parameter. Would it be better if RegisterGRPCGateway() took a tenancy.Options param?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making this API more configurable seems better. The function has also a lot of parameters, maybe having a params struct would simplify it as well.

@codecov
Copy link

codecov bot commented Jul 1, 2022

Codecov Report

Merging #3791 (0d02721) into main (3f71e0c) will increase coverage by 0.03%.
The diff coverage is 97.24%.

@@            Coverage Diff             @@
##             main    #3791      +/-   ##
==========================================
+ Coverage   97.56%   97.59%   +0.03%     
==========================================
  Files         289      291       +2     
  Lines       16800    16913     +113     
==========================================
+ Hits        16391    16507     +116     
+ Misses        323      321       -2     
+ Partials       86       85       -1     
Impacted Files Coverage Δ
cmd/collector/app/flags/flags.go 96.80% <ø> (ø)
pkg/tenancy/context.go 100.00% <ø> (ø)
pkg/tenancy/flags.go 100.00% <ø> (ø)
cmd/query/app/flags.go 95.38% <40.00%> (-4.62%) ⬇️
cmd/query/app/server.go 94.61% <92.30%> (+0.17%) ⬆️
cmd/collector/app/collector.go 87.90% <100.00%> (+0.19%) ⬆️
cmd/collector/app/handler/grpc_handler.go 100.00% <100.00%> (ø)
cmd/collector/app/handler/otlp_receiver.go 100.00% <100.00%> (ø)
cmd/collector/app/span_handler_builder.go 100.00% <100.00%> (ø)
cmd/collector/app/span_processor.go 100.00% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f71e0c...0d02721. Read the comment docs.

@esnible esnible force-pushed the flow-tenant-from-query-v4 branch from 9304764 to 543259c Compare July 5, 2022 13:29
if len(tenants) < 1 {
return "", status.Errorf(codes.PermissionDenied, "missing tenant header")
} else if len(tenants) > 1 {
return "", status.Errorf(codes.PermissionDenied, "extra tenant header")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sohuld we be more explicit and put the value into the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What values are you thinking? Typically PermissionDenied messages are cryptic, as they often get sent to the client, who is only supposed to know enough to fix the problem, but not enough to guess how to get around the security.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put there the list of tenants

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a security no-no. We can't say 'You identified as tenant "dummy", to use this site you must represent yourself as "ibm", "redhat", or "google".' If we give any information as part of PermissionDenied we will fail any kind of security audit or pen test.

if tenant == "" {
return emptyMD
}
return metadata.New(map[string]string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with MetadataAnnotator and therefore I am not sure if this is correct or the map should contain other initialized values.

How often is this called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /v3 HTTP API is implemented by calling the gRPC implementation. I believe you did this work. MetadataAnnotator() is a new middleware that will be called once per v3 query.

I believe github.com/grpc-ecosystem/grpc-gateway, or possibly Go's GRPC itself, takes care of merging all gRPC metadata together, so there shouldn't be a concern about other values.

If you are concerned about a map allocation per query these could be cached by the middleware.

@esnible esnible force-pushed the flow-tenant-from-query-v4 branch from 543259c to d7e2528 Compare July 5, 2022 18:57
pavolloffay
pavolloffay previously approved these changes Jul 7, 2022
return tss.context
}

// NewGuardingStreamInterceptor blocks handling of streams whose tenancy header isn't doesn't meet tenancy requirements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// NewGuardingStreamInterceptor blocks handling of streams whose tenancy header isn't doesn't meet tenancy requirements
// NewGuardingStreamInterceptor blocks handling of streams whose tenancy header doesn't meet tenancy requirements.

@@ -35,9 +35,14 @@ type Options struct {

// NewTenancyConfig creates a tenancy configuration for tenancy Options
func NewTenancyConfig(options *Options) *TenancyConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file would be better named config.go

return func(ctx context.Context, req *http.Request) metadata.MD {
tenant := req.Header.Get(tc.Header)
if tenant == "" {
return emptyMD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels dangerous, isn't it possible to have some other interceptor etc. in the chain that will try to modify metadata, such that the singleton won't be appropriate?

is it possible to simply return nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't pass nil because grpc-gateway's runtime.WithMetadata() expects a function that takes an MD, not a *MD. With the current implementation of grpc-gateway returning a singleton isn't a problem, but there is no guarantee of that, and I will pass along a new MD. This is only called if tenancy is enabled, and it will be a rare case that tenancy is enabled and clients aren't sending tenancy information.

@@ -0,0 +1,73 @@
// Copyright (c) 2022 The Jaeger Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's weird that we have interceptors and other functionality within package config/*. If config is only one of tenancy-related functions, then I would suggest:

pkg/
    tenancy/
        config.go (for TenancyConfig and validators)
        context.go (instead of being under storage/)
        grpc.go (for grpc-related pieces)
        http.go (for http-related pieces)

// PropagationHandler returns a http.Handler containing the logic to extract
// the tenancy header of the http.Request and insert the tenant into request.Context
// for propagation. The token can be accessed via storage.GetTenant().
func (tc *TenancyConfig) PropagationHandler(logger *zap.Logger, h http.Handler) http.Handler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see much value in this function having TC as a receiver - the functionality is not related to "config".

Suggestion:

  • move to http.go
  • name ExtractTenantHTTPHandler

runtime.WithMarshalerOption(runtime.MIMEWildcard, jsonpb),
)
}
tc := tenancy.NewTenancyConfig(&tenancyOptions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we transform options to config here instead of doing once elsewhere and the reusing config across different handlers?

@@ -160,7 +161,8 @@ func createHTTPServer(querySvc *querysvc.QueryService, metricsQuerySvc querysvc.
}

ctx, closeGRPCGateway := context.WithCancel(context.Background())
if err := apiv3.RegisterGRPCGateway(ctx, logger, r, queryOpts.BasePath, queryOpts.GRPCHostPort, queryOpts.TLSGRPC); err != nil {
tc := tenancy.NewTenancyConfig(&queryOpts.Tenancy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the API handler above, I assume it should also be wrapped in ExtractTenant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. This comment was extremely helpful and identified several flaws.

I have fixed the problems, but please don't merge yet! ;-)

The problems were not detected for two reasons. I was testing with GetTrace, a streaming RPC, and ignoring tenancy for GetService, a unary RPC. This PR needs unary RPC tests. I have added one but need to increase coverage.
The other problem is that the tests set up the server using newGRPCServer() in cmd/query/app/grpc_handler_test.go, but the actual Jaeger uses createGRPCServer() which is nearly the same in implementation but was not tested for tenancy and didn't work.

I want to do at least one more commit of additional tests.

cc @pavolloffay

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with newGRPCServer nor createGRPCServer but if there is a way to unify these APIs we should consider doing it (e.g. in a follow up PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tc := tenancy.NewTenancyConfig(&queryOpts.Tenancy)

there are way too many places doing this conversion - we should only create tenancy config once (in the future it could be "live" object that self-updates).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is a design issue here.

  • Tenancy options and tenancy config are semantically similar concepts. We're using config not as a config, but as a "service", like "tenancy manager" or "tenancy checker"
  • This service is a runtime dependency of various API handlers, and in a clean design handlers should never be creating their dependencies, especially when the dependency is shared across so many handlers - it should be created once, close to the main function, and passed to constructors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro. I want to get this right.

I was using TenancyOptions for something human-readable, for Viper, and TenancyConfig for something machine readable, optimized for fast checking, following the pattern of tlscfg.Options and tls.Config. Would it make sense to change tenency.TenancyConfig to tenancy.TenancyManager?

Currently the PR converts tenancy.Options to TenancyConfig in 4 non-test places. Twice in createGRPCServer() -- I will fix that, once in createHTTPServer(), and once in BuildHandlers() (already merged; for trace ingestion). Currently TenancyConfig is lightweight, and I didn't want to change the signatures to the create*Server() functions. If we think of Tenancy is a "service" then I should create it once. The next commit will address this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TenancyConfig is different from tls.Config in that the former is an object with behavior (an internal "service"), but the latter is just data passed elsewhere. I think from the naming it would be clearer if it was called tenancy.TenancyManager.

Currently TenancyConfig is lightweight

It's only lightweight in the current implementation. Because it's a service, it can become arbitrarily complicated in the future, e.g. it could start refreshing the list of tenants from some storage, etc. By instantiating it in many places we introduce technical debt from the start that will need to be cleaned up later, so I'd rather fix it now and create this service once.


var response structuredResponse
err := getJSON(ts.server.URL+`/api/traces?traceID=1&traceID=2`, &response)
assert.Error(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I usually prefer more explicit check like

require.Error(t, err)
assert.Contains(t, err.Error(), "some text specific to tenancy error")

@esnible
Copy link
Contributor Author

esnible commented Jul 11, 2022

@pavolloffay I have added extensive tests. Please re-review.

pavolloffay
pavolloffay previously approved these changes Jul 12, 2022
@@ -219,7 +228,7 @@ func initializeTestServerGRPCWithOptions(t *testing.T, options ...testOption) *g
logger := zap.NewNop()
tracer := opentracing.NoopTracer{}

server, addr := newGRPCServer(t, q, tqs.metricsQueryService, logger, tracer)
server, addr := newGRPCServer(t, q, tqs.metricsQueryService, logger, tracer, &tenancy.TenancyConfig{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be tenancyConfig passed instead of creating a new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now fixed, but of course that dismissed your review....

pavolloffay
pavolloffay previously approved these changes Jul 12, 2022
@esnible
Copy link
Contributor Author

esnible commented Jul 12, 2022

@Pavol Rebased, this dismissed your positive review...

pavolloffay
pavolloffay previously approved these changes Jul 12, 2022
// PropagationHandler returns a http.Handler containing the logic to extract
// the tenancy header of the http.Request and insert the tenant into request.Context
// for propagation. The token can be accessed via tenancy.GetTenant().
func ExtractTenantHTTPHandler(tc *TenancyConfig, h http.Handler, logger *zap.Logger) http.Handler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logger is not used, let's remove it

var handler http.Handler
handler = http.HandlerFunc(f)
if aH.tenancyConfig.Enabled {
handler = tenancy.ExtractTenantHTTPHandler(aH.tenancyConfig, handler, zap.NewNop())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
handler = tenancy.ExtractTenantHTTPHandler(aH.tenancyConfig, handler, zap.NewNop())
handler = tenancy.ExtractTenantHTTPHandler(aH.tenancyConfig, handler, ah.logger)

@@ -61,6 +62,13 @@ func (handlerOptions) QueryLookbackDuration(queryLookbackDuration time.Duration)
}
}

// Tracer creates a HandlerOption that initializes tenancy
func (handlerOptions) Tenancy(options *tenancy.Options) HandlerOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pass options instead of tenancy.Config?

func NewGuardingUnaryInterceptor(tc *TenancyConfig) grpc.UnaryServerInterceptor {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
// Handle case where tenant is directly in the context
tenant := GetTenant(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code below is identical to the previous function, please move to a single helper function like

tenant, err := GetAndValidateTenant(ctx)
if err != nil { return err }
if tenant == "" {
  return handler()
} else {
  return wrappedHandler()
}

esnible added 18 commits July 15, 2022 16:30
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Ed Snible <[email protected]>
@esnible esnible force-pushed the flow-tenant-from-query-v4 branch from b5458ad to 156008a Compare July 15, 2022 20:30
@esnible
Copy link
Contributor Author

esnible commented Jul 20, 2022

@pavolloffay Anything more needed on this PR?

@@ -94,6 +95,7 @@ func (c *Collector) Start(options *flags.CollectorOptions) error {
CollectorOpts: options,
Logger: c.logger,
MetricsFactory: c.metricsFactory,
TenancyMgr: tenancy.NewTenancyManager(&options.GRPC.Tenancy),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when collector is instantiated from all-in-one, will this cause multiple tenancy managers to exist?

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is mostly ok to merge, but I would still prefer it is cleaned up to only instantiate tenancy manager in main's. I see at least two places where it's being created adhoc in the components that should instead be receiving it as a dependency.

@esnible
Copy link
Contributor Author

esnible commented Jul 20, 2022

@yurishkuro Thank you for your patience. I was timid. With the latest commit, only the main.go and test cases create TenancyManager. Thus there will now only be one of them even in the all-in-one case.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@yurishkuro yurishkuro enabled auto-merge (squash) July 21, 2022 00:32
@yurishkuro yurishkuro merged commit ddca3c8 into jaegertracing:main Jul 21, 2022
locmai pushed a commit to locmai/jaeger that referenced this pull request Jul 25, 2022
* Tenancy for queries

Signed-off-by: Ed Snible <[email protected]>

* New parameter for RegisterGRPCGateway() test function

Signed-off-by: Ed Snible <[email protected]>

* More tests that are local to package itself

Signed-off-by: Ed Snible <[email protected]>

* Additional test cases to raise test coverage

Signed-off-by: Ed Snible <[email protected]>

* Fix test

Signed-off-by: Ed Snible <[email protected]>

* spelling

Signed-off-by: Ed Snible <[email protected]>

* Rename file

Signed-off-by: Ed Snible <[email protected]>

* Refactor tenancy packages

Signed-off-by: Ed Snible <[email protected]>

* restore empty_test.go as part of refactoring tenancy out of storage

Signed-off-by: Ed Snible <[email protected]>

* lint/gofumpt

Signed-off-by: Ed Snible <[email protected]>

* Enforce tenancy on non-streaming gRPC and add additional tests

Signed-off-by: Ed Snible <[email protected]>

* Test for tenant flow to storage for both streaming and unary RPC

Signed-off-by: Ed Snible <[email protected]>

* HTTP tenancy test

Signed-off-by: Ed Snible <[email protected]>

* Unit test for unary tenancy handler

Signed-off-by: Ed Snible <[email protected]>

* Factor out rendundent test function

Signed-off-by: Ed Snible <[email protected]>

* Address review comments

Signed-off-by: Ed Snible <[email protected]>

* Error name

Signed-off-by: Ed Snible <[email protected]>

* Refactor TenancyConfig to TenancyManager

Signed-off-by: Ed Snible <[email protected]>

* Address review comments

Signed-off-by: Ed Snible <[email protected]>

* Refactor so that NewTenancyManager() only called from main and tests

Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Loc Mai <[email protected]>
albertteoh pushed a commit to albertteoh/jaeger that referenced this pull request Aug 8, 2022
* Tenancy for queries

Signed-off-by: Ed Snible <[email protected]>

* New parameter for RegisterGRPCGateway() test function

Signed-off-by: Ed Snible <[email protected]>

* More tests that are local to package itself

Signed-off-by: Ed Snible <[email protected]>

* Additional test cases to raise test coverage

Signed-off-by: Ed Snible <[email protected]>

* Fix test

Signed-off-by: Ed Snible <[email protected]>

* spelling

Signed-off-by: Ed Snible <[email protected]>

* Rename file

Signed-off-by: Ed Snible <[email protected]>

* Refactor tenancy packages

Signed-off-by: Ed Snible <[email protected]>

* restore empty_test.go as part of refactoring tenancy out of storage

Signed-off-by: Ed Snible <[email protected]>

* lint/gofumpt

Signed-off-by: Ed Snible <[email protected]>

* Enforce tenancy on non-streaming gRPC and add additional tests

Signed-off-by: Ed Snible <[email protected]>

* Test for tenant flow to storage for both streaming and unary RPC

Signed-off-by: Ed Snible <[email protected]>

* HTTP tenancy test

Signed-off-by: Ed Snible <[email protected]>

* Unit test for unary tenancy handler

Signed-off-by: Ed Snible <[email protected]>

* Factor out rendundent test function

Signed-off-by: Ed Snible <[email protected]>

* Address review comments

Signed-off-by: Ed Snible <[email protected]>

* Error name

Signed-off-by: Ed Snible <[email protected]>

* Refactor TenancyConfig to TenancyManager

Signed-off-by: Ed Snible <[email protected]>

* Address review comments

Signed-off-by: Ed Snible <[email protected]>

* Refactor so that NewTenancyManager() only called from main and tests

Signed-off-by: Ed Snible <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants